[kaffe] question about characters set.

Wed Nov 12 10:10:03 PST 2003

Hi jsona,

jsona laio wrote:
> hi mavens,
> i used to use sun java's vm to develop my project.
> however, lately i want to participate in a project
> which involves an encoding like CCCII (a CJK-based
> character set for asian characters, which defines more
> characters than unicode supports). however, as far as i
> know, the java vm is based upon unicode (is kaffe based on
> unicode, too?). so i'd like to know: "is it possible
> to switch its base encoding?", for i'm afraid that code
> values may get lost when exchanging data between the two
> character sets. or is there a better way to avoid such
> problems?
> i appreciate any suggestions, sincerely.

If I understand you correctly, you'd like to exchange the Unicode core 
of a VM for another encoding. The trouble is that it wouldn't be a VM 
for Java anymore, as the spec mandates the use of Unicode for Java 
programs [1]. The spec explicitly says:

§2.4.1
[...]
The integral types are byte, short, int, and long, whose values are 
8-bit, 16-bit, 32-bit, and 64-bit signed two's-complement integers, 
respectively, and char, whose values are 16-bit unsigned integers 
representing Unicode characters (§2.1).
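
To make the quoted definition concrete, here is a minimal sketch that only 
uses the standard java.lang.Character class. The class and method names 
(CharRangeDemo, charBits, wrap, unitsFor) are made up for illustration and 
are not part of kaffe or the spec. It shows that a char holds 16 unsigned 
bits, and that a code point above U+FFFF needs two char values.

```java
// Illustrative sketch only; the class and method names are hypothetical.
final class CharRangeDemo {
    // Width of a char in bits, as reported by the standard library.
    static int charBits() {
        return Character.SIZE;
    }

    // Largest value a char can hold, widened to int.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    // Narrowing an int to char keeps the low 16 bits and treats them as unsigned.
    static int wrap(int value) {
        return (char) value;
    }

    // Number of char values needed to store one Unicode code point.
    static int unitsFor(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        System.out.println("bits per char: " + charBits());
        System.out.println("max char value: " + maxCharValue());
        System.out.println("(char) -1 as int: " + wrap(-1));
        System.out.println("chars for U+4E2D: " + unitsFor(0x4E2D));
        System.out.println("chars for U+20000: " + unitsFor(0x20000));
    }
}
```

Whatever encoding the data arrives in, it ends up as these 16-bit values 
once it is inside a String or a char.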

If complying with the JVM spec doesn't bother you much, feel free to fork 
kaffe, rip out the Unicode handling code, and replace it with CCCII.

If you're looking for a more portable solution to character conversion 
problems, you may want to take a look at the ICU4J [2] project from IBM.
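
As a rough illustration of the "code values may get lost" worry, here is a 
sketch that checks whether a string survives a trip through another 
character set. It deliberately uses the JDK's own java.nio.charset classes 
rather than ICU4J, and US-ASCII stands in for the target encoding, because 
I don't know of a CCCII converter in the standard class library. The class 
and method names (RoundTripCheck, isRepresentable, survivesRoundTrip) are 
made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class and method names are hypothetical.
final class RoundTripCheck {
    // True if every character of text has a mapping in the target charset.
    static boolean isRepresentable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        return encoder.canEncode(text);
    }

    // Encode to bytes and decode again; unmappable characters are replaced
    // during encoding, so the result differs from the input when data is lost.
    static boolean survivesRoundTrip(String text, Charset target) {
        byte[] bytes = text.getBytes(target);
        return text.equals(new String(bytes, target));
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d\u6587"; // two CJK ideographs
        System.out.println("UTF-8 ok: " + survivesRoundTrip(cjk, StandardCharsets.UTF_8));
        System.out.println("US-ASCII ok: " + survivesRoundTrip(cjk, StandardCharsets.US_ASCII));
        System.out.println("US-ASCII representable: "
                + isRepresentable(cjk, StandardCharsets.US_ASCII));
    }
}
```

Running a check like this over your real data before converting would tell 
you which characters have no counterpart on the other side.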

best regards,
dalibor topic

[1]
http://java.sun.com/docs/books/vmspec/2nd-edition/html/Concepts.doc.html#25310
[2] http://oss.software.ibm.com/icu4j/