large Class.forName() patch

Godmar Back gback at cs.utah.edu
Sun Feb 6 21:53:38 PST 2000


> On Feb  4, 2000, Mo DeJong <mdejong at cygnus.com> wrote:
> 
> > The current Kaffe implementation means that a \0 embedded in the
> > class name will screw up code that expects a \0 at the end of a
> > string.
> 
> AFAIK, there can't be \0s embedded in class names.  

Utf8strings can have \u0 characters in them, and the JVM spec says that 
the class name is a CONSTANT_Utf8_info, so it can have a \u0 in it as well.
Why should it not?  Limitations on the characters in identifiers make
sense at the Java source level, but they're not necessary at the 
bytecode level.

> And, in any case,
> utf8 strings won't contain a \0 char unless they actually contain \u0.
> 

Not so.  On the contrary, utf8strings encode \u0 as something other
than \0 (the two-byte sequence 0xC0 0x80).  The convenience of utf8 in
the JVM spec lies in that it can encode every unicode character without
using a \0 byte.  Hence, utf8strings can be represented as
zero-terminated strings, which is what we do.
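
To make the encoding point concrete, here is a small sketch.  It is not
Kaffe code: the class and method names are made up for illustration, and
it leans on java.io.DataOutputStream.writeUTF, which is documented to
emit the same modified utf8 format, preceded by a two-byte length that
the sketch strips off.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative only: ModifiedUtf8Demo and its methods are hypothetical names.
public class ModifiedUtf8Demo {

    // Encode a Java string as modified utf8 bytes, without the length prefix.
    public static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // writes a 2-byte length, then the encoded chars
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // Drop the 2-byte length prefix; it is not part of the encoded text.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // True if any encoded byte is zero.
    public static boolean containsZeroByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The string holds 'a', the character U+0000, and 'b'.
        byte[] bytes = encode("a\u0000b");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());
        System.out.println("zero byte present: " + containsZeroByte(bytes));
    }
}
```

For the string above the encoded bytes are 61 c0 80 62, so a trailing \0
can still serve as an unambiguous terminator.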

The problem comes in when we convert either 16bit unicode java strings
or utf8strings to zero-terminated 8bit C strings.  Then we can't represent
the \u0 character (among others), and problems arise.  If we converted
directly from unicode java strings to utf8strings, there would be no
problem.  Thinking about it, it might even be worth implementing Strings
internally as utf8strings rather than as char[] arrays.  This would save
space for all ASCII strings, and it would allow us to use a literal's
utf8 representation directly.  Since strings are immutable, the
internal char[] array is never directly exposed anyway.
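
A rough sketch of the conversion problem described above, simulated in
Java.  None of this is Kaffe's actual conversion code: narrowToCString
is a hypothetical stand-in for a naive 16bit-to-8bit conversion, and
cStrlen mimics what a C strlen() would see.  The utf8 variant again
relies on DataOutputStream.writeUTF for the modified utf8 encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: all names here are hypothetical.
public class CStringTruncationDemo {

    // Naive conversion: keep the low 8 bits of each char, then terminate.
    public static byte[] narrowToCString(String s) {
        byte[] out = new byte[s.length() + 1];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        out[s.length()] = 0; // C-style terminator
        return out;
    }

    // Modified utf8 encoding followed by a zero terminator.
    public static byte[] utf8ToCString(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // 2-byte length prefix, then the encoded chars
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // Copy the payload; the extra last byte stays 0 and is the terminator.
        byte[] result = new byte[all.length - 2 + 1];
        System.arraycopy(all, 2, result, 0, all.length - 2);
        return result;
    }

    // Length as C code would compute it: bytes before the first zero byte.
    public static int cStrlen(byte[] buf) {
        int n = 0;
        while (n < buf.length && buf[n] != 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A 7-character name with the character U+0000 in the middle.
        String name = "Foo\u0000Bar";
        System.out.println("narrowed: " + cStrlen(narrowToCString(name)));
        System.out.println("utf8:     " + cStrlen(utf8ToCString(name)));
    }
}
```

With the narrowed string, C code sees only "Foo" (length 3).  With the
utf8 form it sees all 8 encoded bytes, since the embedded character takes
two non-zero bytes.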

	- Godmar


