Tue Jun 22 17:38:17 PDT 2004
There shouldn't be any reason not to include the change you suggest -
if someone implements it, of course.
If I understand your proposal right, you'd use an array for
the first 256 values and a hashtable or something like that
for the rest. I don't think there would be a problem with changing
it so that it would serialize both an array and a hashtable.
One or two objects in *.ser shouldn't make a difference.
You could even stick a flag at the beginning in case the array doesn't
pay off for some encodings.
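
To make the array-plus-hashtable idea concrete, here is a minimal sketch
of what such a lookup could look like. It is not the existing kaffe
converter code: the class name HybridCharMap, its methods, and the use of
U+FFFF as an "unmapped" marker are all assumptions made for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and layout are illustrative, not kaffe's.
final class HybridCharMap implements Serializable {
    private static final long serialVersionUID = 1L;

    // U+FFFF is a Unicode noncharacter, so it can mark "no mapping" (assumption).
    private static final char UNMAPPED = '\uFFFF';

    // Direct-indexed table for codes 0..255.
    private final char[] low = new char[256];
    // Fallback table for every code outside 0..255.
    private final Map<Integer, Character> high = new HashMap<>();

    HybridCharMap() {
        Arrays.fill(low, UNMAPPED);
    }

    // Record a mapping, choosing the array or the hashtable by code value.
    void put(int code, char c) {
        if (code >= 0 && code < 256) {
            low[code] = c;
        } else {
            high.put(code, c);
        }
    }

    // Look up a code; returns fallback when no mapping was recorded.
    char get(int code, char fallback) {
        if (code >= 0 && code < 256) {
            char c = low[code];
            return c == UNMAPPED ? fallback : c;
        }
        Character c = high.get(code);
        return c == null ? fallback : c;
    }
}
```

Both fields are serializable, so one object like this would end up in the
.ser file as an array plus a hashtable. For a pure single-byte encoding
the hashtable simply stays empty.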
One would have to see what the actual sizes of the .ser files would be;
keeping those small is certainly desirable. From what I understand,
they're more compact than any Java code representation.
Edouard would know more since he wrote that code, I think.
On a related note, this whole conversion thing stinks.
Why can't people stick to 7-bit ASCII?
For instance, the JVM98 jack benchmark calls PrintStream.print
a whopping 296218 times in a single run. Every call results in a new
converter object being instantiated, just to convert a bunch of bytes.
(The new converter was one of the changes done to make the
charset conversion thread-safe.) This is one of the reasons
why we're some 7 or 8 times slower than IBM on this test.
And that's not even using any of the serialized converters, just
the default one (which is written in JNI).
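
For illustration only, one common way to stay thread-safe without
allocating a converter on every call is to keep one converter per thread.
The sketch below is not kaffe's converter code and is not something
proposed in this thread: it uses the standard java.nio.charset API, and
the class name PerThreadEncoder and the choice of ISO-8859-1 are
assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one reusable encoder per thread instead of one per call.
final class PerThreadEncoder {
    // Encoders are stateful and must not be shared between threads,
    // but each thread can safely reuse its own instance.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(() -> StandardCharsets.ISO_8859_1.newEncoder());

    static byte[] encode(String s) {
        CharsetEncoder enc = ENCODER.get();
        try {
            // encode(CharBuffer) resets the encoder, encodes, and flushes.
            ByteBuffer out = enc.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Thrown for characters the charset cannot represent.
            throw new IllegalArgumentException("cannot encode: " + s, e);
        }
    }
}
```

Whether something like this would help on jack depends on how much of the
time actually goes into allocation rather than the conversion itself.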
> I wrote a simple program to show a Java charmap (
> something like Encode.java in developers directory).
> It essentially creates a byte array with size 1, and
> creates a string with the appropriate Unicode char
> using the encoding in question for every value a byte
> can take.
> When displaying a serialized converter like 8859_2,
> the performance is very bad. Comparing current kaffe
> from CVS running on SuSE Linux 6.4 with jit3 and IBM's
> JRE 1.3 running in interpreted mode, kaffe is about 10
> times slower.
> While I consider the idea to use serialized encoders
> based on hashtables a great one, it is very
> inefficient for ISO-8859-X and similar byte to char
> encodings. These encodings use most of the 256
> possible values a byte can take to encode characters,
> so I tried using an array instead. I achieved
> comparable running times to JRE 1.3.
> Why was the hashtable based conversion chosen over
> alternatives (switch based lookup, array based
> "Success means never having to wear a suit"