large Class.forName() patch

Godmar Back gback at cs.utah.edu
Mon Feb 7 09:42:05 PST 2000


> 
> 
> Godmar Back wrote:
> 
> > Thinking about it, maybe it would even be worth thinking about
> > not implementing Strings as char[] arrays internally, but as utf8strings.
> > This would save space for all ASCII strings, and it would allow us to directly
> > use a literal's utf8 representation.  Since strings are immutable, the
> > internal char[] array is never directly exposed anyway.
> 
> It is done this way in gcj I think. But I do not suppose that it is a
> good thing for java only implementation - main aim for Strings should be
> efficiency. I think that String.charAt is method called often enough to
> cause major performance drop in entire java app with such strings.
> 

We may be talking about different things here, but I don't think gcj
stores strings as utf8strings in the way I propose.
Look at their implementation of charAt(), which is mapped to
"(jchar*)((char*) str->data + str->boffset);" via JvGetStringChars.

Saying that efficiency should be the main aim for Java raises the question
of what kind of efficiency: you imply runtime efficiency, but space
efficiency can sometimes be more important.
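
To make the space side of the tradeoff concrete, here is a rough sketch that
compares the payload size of a char[] against a UTF-8 encoding of the same
string.  The class and method names are made up for illustration, object and
array headers are ignored, and it uses standard UTF-8 rather than the
modified UTF-8 found in class files, so take the numbers as an approximation.

```java
// Rough space comparison: UTF-16 char[] payload versus UTF-8 byte[] payload.
// Object headers and array length fields are ignored; only the character
// data is counted. StringSpace is a hypothetical name, not Kaffe code.
final class StringSpace {
    // Bytes needed if the string is held as a char[] (2 bytes per char).
    static int charArrayBytes(String s) {
        return 2 * s.length();
    }

    // Bytes needed if the string is held as standard UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // An ASCII name, a Latin-1 word, and a CJK word.
        String[] samples = { "java.lang.String", "na\u00efve", "\u65e5\u672c\u8a9e" };
        for (String s : samples) {
            System.out.println(s + ": char[] " + charArrayBytes(s)
                    + " bytes, UTF-8 " + utf8Bytes(s) + " bytes");
        }
    }
}
```

ASCII strings halve in size, while strings made mostly of CJK characters
grow, so the answer again depends on what the application handles.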

Even for runtime efficiency, you need to look at profiling data and 
decide whether it's worth it.  Is charAt() really the most frequent
operation on java.lang.String?  For which application or set of 
applications?

Also, I think that charAt() can be sped up for "regular" strings
that contain only ASCII chars (<= 127), because those can be represented in
a byte array.  So you have to ask what strings your application is
dealing with.  That may depend on what i18n is used, etc.
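
A minimal sketch of what I mean, written in Java for readability.  The class
CompactString and its fields are hypothetical; this is not how Kaffe or gcj
lay out strings today.  It keeps one byte per char when every char is ASCII
and falls back to a char[] otherwise, so charAt() stays constant-time in
both cases.

```java
// Hypothetical string representation: one byte per char when every char is
// ASCII, otherwise a plain char[]. Not Kaffe's or gcj's actual layout.
final class CompactString {
    private final byte[] ascii;  // non-null only for all-ASCII content
    private final char[] wide;   // non-null only otherwise

    CompactString(String s) {
        boolean allAscii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                allAscii = false;
                break;
            }
        }
        if (allAscii) {
            byte[] b = new byte[s.length()];
            for (int i = 0; i < b.length; i++) {
                b[i] = (byte) s.charAt(i);  // safe: value is in 0..127
            }
            ascii = b;
            wide = null;
        } else {
            ascii = null;
            wide = s.toCharArray();
        }
    }

    boolean isCompact() {
        return ascii != null;
    }

    int length() {
        return ascii != null ? ascii.length : wide.length;
    }

    // Constant-time in both representations: one branch plus an array load.
    char charAt(int index) {
        return ascii != null ? (char) ascii[index] : wide[index];
    }
}
```

The cost is the extra branch in charAt(); whether that is cheaper than the
memory traffic saved is exactly the kind of thing that needs measuring.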

I don't think one should rush to conclusions without data to back them up.
It may or may not be a benefit, and most likely it will be one in some
cases but not in others.

	- Godmar

ps: I think Alexandre is right about class names not being able to
contain a \u0000:  while they're represented as a CONSTANT_Utf8_info, the
spec also says:

    Class names that appear in class file structures are always represented 
    in a fully qualified form (§2.7.9).  These class names are always 
    represented as CONSTANT_Utf8_info (§4.4.7) structures, and they are
    referenced from those CONSTANT_NameAndType_info (§4.4.6) structures 
    that have class names as part of their descriptor (§4.3), as well as 
    from all CONSTANT_Class_info (§4.4.1) structures. 

    For historical reasons the exact syntax of fully qualified class names 
    that appear in class file structures differs from the familiar Java fully 
    qualified class name documented in §2.7.9. In the internal form, the
    ASCII periods ('.') that normally separate the identifiers (§2.2) 
    that make up the fully qualified name are replaced by ASCII forward slashes 
    ('/'). For example, the normal fully qualified name of class Thread is
    java.lang.Thread. In the form used in descriptors in class files, a 
    reference to the name of class Thread is implemented using a 
    CONSTANT_Utf8_info structure representing the string "java/lang/Thread"

And if you follow the link for identifiers (§2.2), it says that an identifier
is a sequence of isJavaLetterOrDigit chars, which are letters, digits, _ and $.

That said, it doesn't solve the problem that we must throw an exception
if the user looks up "java.lang.String\0" or the like.
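
One way to handle that is to validate the name before it is ever converted
to its internal form.  The sketch below is only an illustration: the class
ClassNameCheck and its methods are made-up names, not part of the patch.  It
uses Character.isJavaIdentifierStart/isJavaIdentifierPart, the current names
for the isJavaLetter family, and it explicitly rejects "ignorable" chars,
since isJavaIdentifierPart() accepts '\0'.  It deliberately ignores array
names such as "[Ljava.lang.String;" and supplementary characters.

```java
// Hypothetical pre-check for names passed to a Class.forName()-style lookup.
final class ClassNameCheck {
    // True if name is a non-empty, dot-separated sequence of Java identifiers.
    static boolean isValidClassName(String name) {
        boolean atStart = true;  // expecting the first char of an identifier
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.') {
                if (atStart) {
                    return false;  // empty component, e.g. "a..b" or ".a"
                }
                atStart = true;
            } else if (Character.isIdentifierIgnorable(c)) {
                return false;  // rejects '\0' and other ignorable controls
            } else if (atStart ? !Character.isJavaIdentifierStart(c)
                               : !Character.isJavaIdentifierPart(c)) {
                return false;
            } else {
                atStart = false;
            }
        }
        return !atStart;  // rejects the empty string and a trailing '.'
    }

    // Converts "java.lang.Thread" to "java/lang/Thread", or throws if the
    // name could never denote a class.
    static String toInternalForm(String name) throws ClassNotFoundException {
        if (!isValidClassName(name)) {
            throw new ClassNotFoundException("malformed class name");
        }
        return name.replace('.', '/');
    }
}
```

With this, toInternalForm("java.lang.String\0") throws a
ClassNotFoundException instead of silently matching java.lang.String.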

