[kaffe] Bug report (java.io.StreamTokenizer)

Mon Jun 30 07:05:02 PDT 2003

Hi Hermanni, hello Kiyo,

--- Hermanni Hyytiälä <hemppah at cc.jyu.fi> wrote:
> On Mon, 2003-06-30 at 13:06, Ito Kazumitsu wrote:
> > Hi,
> > 
> > In message "Re: [kaffe] Bug report (java.io.StreamTokenizer)"
> >     on 03/06/30, Hermanni Hyytiälä <hemppah at cc.jyu.fi> writes:
> > 
> > > According to the JLS (first edition), the nextToken-method of
> > > java.io.StreamTokenizer class has the following lexical order:
> > > 
> > > whitespace
> > > numeric character
> > > alphabetic character
> > > comment character
> > > string quote character
> > > comment //
> > > comment /*
> > 
> > I see.  But this rule seems to have been ignored even by
> > Sun's implementation.
> > 
> > (1) Kaffe's java.io.StreamTokenizer.java has this comment:
> > 
> >         /* Contrary to the description in JLS 1.ed,
> >            C & C++ comments seem to be checked
> >            before other comments. That actually
> >            makes sense, since the default comment
> >            character is '/'.
> >         */
> 
> 
> Do you know what this comment is based on?

Being the guy who rewrote the class a few years ago, I guess I should chime in.

I had discovered a case where Kaffe deviated from Sun's implementation, and
fixed it by checking for C and C++ style comments earlier, which seemed to be
what Sun's implementation does. To see it for yourself, change Ito's test to
look like this:

import java.io.*;

public class StreamTokenizerTest {
  public static void main(String[] args) throws Exception {
    StreamTokenizer tok = new StreamTokenizer(
        new BufferedReader(new InputStreamReader(System.in)));
    if (args.length > 0 && args[0].equals("NBIO")) {
      // NBIO setup: everything is a word character except whitespace
      // (0..32) and '/', which is a comment character.
      tok.resetSyntax();
      tok.wordChars(0, 255);
      tok.whitespaceChars(0, ' ');
      tok.commentChar('/');
      tok.slashStarComments(true);
      tok.eolIsSignificant(true);
    }
    System.out.println("TT_WORD = " + StreamTokenizer.TT_WORD);
    System.out.println("TT_NUMBER = " + StreamTokenizer.TT_NUMBER);
    System.out.println("TT_EOL = " + StreamTokenizer.TT_EOL);
    System.out.println("TT_EOF = " + StreamTokenizer.TT_EOF);
    while (true) {
      int t = tok.nextToken();
      // sval is only set for words and quoted strings, null otherwise
      System.out.println(tok.sval + ": " + t);
      if (t == StreamTokenizer.TT_EOF) break;
    }
  }
}

echo "/* */ unparsed" | java StreamTokenizerTest NBIO

results in "unparsed" actually being returned as a word token. According to the
precedence order given in the JLS, the whole line should have been skipped,
since / is a comment character. That's why I added the comment above.
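For a regression test that does not depend on piping input through stdin, the
same NBIO configuration can be fed from a StringReader and the tokens collected
into a list. This is only a sketch: the class SlashStarRegression and its
tokens() helper are hypothetical names, not part of Kaffe's test suite. The
comments in main() contrast the result expected with Sun's ordering (/* checked
before the single-character comment) against the JLS 1st edition ordering.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical regression check: the NBIO setup, fed from a string. */
public class SlashStarRegression {
    static List<String> tokens(String input) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.resetSyntax();
        tok.wordChars(0, 255);
        tok.whitespaceChars(0, ' '); // 0..32 become whitespace only
        tok.commentChar('/');        // '/' becomes a comment character only
        tok.slashStarComments(true);
        tok.eolIsSignificant(true);
        List<String> out = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tok.ttype) {
                    case StreamTokenizer.TT_WORD: out.add("WORD:" + tok.sval); break;
                    case StreamTokenizer.TT_EOL:  out.add("EOL"); break;
                    default:                      out.add("CHAR:" + (char) tok.ttype); break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sun's ordering (/* first):             [WORD:unparsed, EOL]
        // JLS 1st ed. ordering ('/' first):      [EOL]
        System.out.println(tokens("/* */ unparsed\n"));
    }
}
```

A lone "/" that does not start "/*" still falls back to the single-character
comment, so "/ unparsed\n" should yield just [EOL] under both orderings.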

> > (2) Comment characters must be checked before alphabetic characters
> >     and Sun's java.io.StreamTokenizer seems to do so.  Otherwise,
> >     NBIO you mentioned cannot run properly.
> > 
> 
> Hm, have you tested this, and if so, how? Or does the literature mention
> it? After a quick thought, I don't see any problem with the lexical
> order defined for the nextToken method of java.io.StreamTokenizer
> (JLS, 1st edition).

If we want full compatibility with Sun's implementation (rather than with their
spec; note that StreamTokenizer's behaviour is not well specified in the API
documentation, only in a superseded version of the JLS), then what we are
interested in is the precedence between the different parsing operations. A
character can have 2**5 = 32 attribute states (any combination of the white
space, alphabetic, numeric, string quote, and comment character attributes),
and the tokenizer can have 2**4 = 16 flag states (eolIsSignificant,
lowerCaseMode, slashSlashComments, slashStarComments). That's 512 combinations
per character, which is a lot of testing ;)
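The four flags are independent booleans, so their 16 states can be walked with
a 4-bit mask. A minimal sketch, assuming the default syntax table (in which '/'
is already a comment character) and a hypothetical FlagStates class. It prints
the first token for each state; the input uses "Unparsed" with a capital U only
so that lowerCaseMode has a visible effect.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: first token for each of the 2**4 flag states. */
public class FlagStates {
    // Hypothetical mask bits, one per boolean flag setter
    static final int EOL = 1, LOWER_CASE = 2, SLASH_SLASH = 4, SLASH_STAR = 8;

    static String firstToken(String input, int flags) {
        // Default syntax table: '/' is a comment character
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.eolIsSignificant((flags & EOL) != 0);
        tok.lowerCaseMode((flags & LOWER_CASE) != 0);
        tok.slashSlashComments((flags & SLASH_SLASH) != 0);
        tok.slashStarComments((flags & SLASH_STAR) != 0);
        try {
            switch (tok.nextToken()) {
                case StreamTokenizer.TT_EOF:    return "EOF";
                case StreamTokenizer.TT_EOL:    return "EOL";
                case StreamTokenizer.TT_WORD:   return "WORD:" + tok.sval;
                case StreamTokenizer.TT_NUMBER: return "NUMBER:" + tok.nval;
                default:                        return "CHAR:" + (char) tok.ttype;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int flags = 0; flags < 16; flags++) {
            System.out.printf("flags=%2d -> %s%n", flags, firstToken("/* */ Unparsed\n", flags));
        }
    }
}
```

Multiplying this loop by the 32 attribute states per character is where the
512 comes from.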

A quick (i.e. not exhaustive) test would be to take a fresh tokenizer, reset it
so that all characters are ordinary, then pick a single character, and give it
different attribute combinations to see what precedence exists between them.
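The quick test could look roughly like this. It is a sketch with hypothetical
names (PrecedenceProbe, firstTokenType, the mask bits). Two API details shape
it. First, parseNumbers() is the only public way to give a character the
numeric attribute, and it only affects '0'-'9', '.' and '-', so '-' is used as
the probe character. Second, as far as I can tell from Sun's sources,
whitespaceChars(), quoteChar() and commentChar() replace a character's
attributes while wordChars() and parseNumbers() add to them, so the 32 masks
collapse into fewer effective combinations depending on call order.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical harness: which attribute wins when one character has several? */
public class PrecedenceProbe {
    // Hypothetical mask bits, one per attribute-setting call
    static final int WHITESPACE = 1, WORD = 2, NUMBER = 4, QUOTE = 8, COMMENT = 16;

    static int firstTokenType(char ch, int mask, String input) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
        tok.resetSyntax(); // start with every character ordinary
        // These calls overwrite the character's attributes: the last one applied wins
        if ((mask & WHITESPACE) != 0) tok.whitespaceChars(ch, ch);
        if ((mask & QUOTE) != 0) tok.quoteChar(ch);
        if ((mask & COMMENT) != 0) tok.commentChar(ch);
        // These calls add an attribute to whatever is already there
        if ((mask & WORD) != 0) tok.wordChars(ch, ch);
        if ((mask & NUMBER) != 0) tok.parseNumbers(); // marks '0'-'9', '.', '-' numeric
        try {
            return tok.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String describe(int ttype) {
        switch (ttype) {
            case StreamTokenizer.TT_EOF:    return "TT_EOF";
            case StreamTokenizer.TT_EOL:    return "TT_EOL";
            case StreamTokenizer.TT_WORD:   return "TT_WORD";
            case StreamTokenizer.TT_NUMBER: return "TT_NUMBER";
            default:                        return "char '" + (char) ttype + "'";
        }
    }

    public static void main(String[] args) {
        char ch = '-';
        String probe = "-1"; // the probe character followed by a digit
        for (int mask = 0; mask < 32; mask++) {
            System.out.printf("mask=%2d -> %s%n", mask, describe(firstTokenType(ch, mask, probe)));
        }
    }
}
```

Running the same harness against Kaffe and Sun's JDK and diffing the output
would show where the precedence differs.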

cheers,
dalibor topic
