[kaffe] Bug report (java.io.StreamTokenizer)

Fri Jun 27 19:54:01 PDT 2003

Hi Hermanni,

In message "[kaffe] Bug report"
    on 03/06/27, Hermanni Hyytiälä <hemppah at cc.jyu.fi> writes:

> Token: # (type: -3)

> The tokenizer is initialized in
> sandStorm.main.SandstormConfig$configSection (starts from line 610) like
> this:

> tok = new StreamTokenizer(in);
> tok.resetSyntax();
> tok.wordChars((char)0, (char)255);
> tok.whitespaceChars('\u0000', '\u0020');
> tok.commentChar('#');
> tok.eolIsSignificant(true);

This initialization makes every character from 0 to 255 a word
character.

So '#' is both a word character and a comment character.
(Sun's API documentation says, "Each character can have zero or more
of these attributes.")
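To reproduce the report outside Sandstorm, the tokenizer setup quoted
above can be driven from a small standalone program. This is a sketch:
the class name HashCommentRepro, the helpers configure() and
tokenize(), and the sample input are hypothetical. On current OpenJDK
builds it is expected to print [WORD:key, WORD:value, EOL, EOF], so '#'
starts a comment there. The report shows Kaffe returning '#' as a
TT_WORD (-3) token instead.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class HashCommentRepro {
    // Same syntax setup as the quoted SandstormConfig code.
    static StreamTokenizer configure(StreamTokenizer tok) {
        tok.resetSyntax();
        tok.wordChars((char) 0, (char) 255);
        tok.whitespaceChars(0x00, 0x20); // NUL through space, as in the report
        tok.commentChar('#');
        tok.eolIsSignificant(true);
        return tok;
    }

    // Hypothetical helper: returns a readable list of the tokens produced for input.
    static List<String> tokenize(String input) {
        StreamTokenizer tok = configure(new StreamTokenizer(new StringReader(input)));
        List<String> out = new ArrayList<>();
        try {
            while (true) {
                int type = tok.nextToken();
                switch (type) {
                    case StreamTokenizer.TT_EOF:
                        out.add("EOF");
                        return out;
                    case StreamTokenizer.TT_EOL:
                        out.add("EOL");
                        break;
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + tok.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER:" + tok.nval);
                        break;
                    default:
                        // Ordinary character returned as its own token.
                        out.add("CHAR:" + (char) type);
                }
            }
        } catch (IOException e) {
            // StringReader does not fail, but nextToken() declares IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("key value # comment\n"));
    }
}
```

Running the same program on Kaffe before and after the patch makes it
easy to see whether '#' still comes back as a word token.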

Kaffe's java.io.StreamTokenizer checks each character in the
following order:

  isWhitespace
  isNumeric
  isAlphabetic
  chr=='/' && CPlusPlusComments && parseCPlusPlusCommentChars()
  chr=='/' && CComments && parseCCommentChars()
  isComment
  isStringQuote

So '#' matches isAlphabetic and is parsed as a word before the
isComment test is ever reached.

I do not think Sun's API documentation clearly defines the order in
which character types should be checked, so one could argue that
treating '#' as a word character is not a bug but is permitted by the
specification.

But to bring the behavior of Kaffe's java.io.StreamTokenizer closer to
Sun's, I suggest changing the checking order as follows (the more
specific the check, the earlier it comes):

  isWhitespace
  chr=='/' && CPlusPlusComments && parseCPlusPlusCommentChars()
  chr=='/' && CComments && parseCCommentChars()
  isComment
  isStringQuote
  isNumeric
  isAlphabetic
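The effect of the reordering can be modeled without the rest of the
tokenizer. Treat each character as a set of attributes and take the
first check, in order, that it matches. This is a hypothetical
simplification, not Kaffe's code: the names DispatchOrderModel, Attr
and classify are made up, and the '/'-based C and C++ comment checks
are left out. It only assumes, as described above, that '#' carries
both the alphabetic and the comment attribute.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class DispatchOrderModel {
    // Hypothetical per-character attributes (C/C++ comment checks omitted).
    enum Attr { WHITESPACE, NUMERIC, ALPHABETIC, COMMENT, STRING_QUOTE }

    // Order of checks in the current Kaffe nextToken(), as listed above.
    static final List<Attr> CURRENT_ORDER = Arrays.asList(
        Attr.WHITESPACE, Attr.NUMERIC, Attr.ALPHABETIC, Attr.COMMENT, Attr.STRING_QUOTE);

    // Proposed order: more specific attributes are tested first.
    static final List<Attr> PROPOSED_ORDER = Arrays.asList(
        Attr.WHITESPACE, Attr.COMMENT, Attr.STRING_QUOTE, Attr.NUMERIC, Attr.ALPHABETIC);

    // Returns the first attribute in 'order' that the character has,
    // or null if it has none (an ordinary character returned as itself).
    static Attr classify(Set<Attr> attrs, List<Attr> order) {
        for (Attr a : order) {
            if (attrs.contains(a)) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // '#' after wordChars(0, 255) and commentChar('#'), assuming both attributes are kept.
        Set<Attr> hash = EnumSet.of(Attr.ALPHABETIC, Attr.COMMENT);
        System.out.println("current:  " + classify(hash, CURRENT_ORDER));  // ALPHABETIC
        System.out.println("proposed: " + classify(hash, PROPOSED_ORDER)); // COMMENT
    }
}
```

A character with only one attribute classifies the same way under
either order. Only characters that carry several attributes, like '#'
here, are affected. Because the broad word and number categories come
last, such a character falls through to them only when no narrower
attribute applies.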

Please try this patch.

--- java/io/StreamTokenizer.java.orig	Tue Feb 19 09:47:49 2002
+++ java/io/StreamTokenizer.java	Sat Jun 28 11:48:50 2003
@@ -116,14 +116,6 @@
 		/* Skip whitespace and return nextTokenType */
 		parseWhitespaceChars(chr);
 	}
-	else if (e.isNumeric) {
-		/* Parse the number and return */
-		parseNumericChars(chr);
-	}
-	else if (e.isAlphabetic) {
-		/* Parse the word and return */
-		parseAlphabeticChars(chr);
-	}
 	/* Contrary to the description in JLS 1.ed,
 	   C & C++ comments seem to be checked
 	   before other comments. That actually
@@ -145,6 +137,14 @@
 	else if (e.isStringQuote) {
 	        /* Parse string and return word */
 	        parseStringQuoteChars(chr);
+	}
+	else if (e.isNumeric) {
+		/* Parse the number and return */
+		parseNumericChars(chr);
+	}
+	else if (e.isAlphabetic) {
+		/* Parse the word and return */
+		parseAlphabeticChars(chr);
 	}
 	else {
 		/* Just return it as a token */