Bytecode verification

Tue Apr 29 19:11:01 PDT 1997

 > So I've finally settled down to write a bytecode verifier for Kaffe
 > (which'll please a lot of people).  Now while I was doing this I came
 > across one aspect of the verification which doesn't seem to be well
 > documented in the JVM book.  Here's the problem: What types should the
 > local variables be given at the start of an exception handler?  Should
 > is be a merge of all possible locals from anywhere in the exception
 > block?  This seems rather heavy handed to me, but perhas there's an
 > alternative?

Not that I can see.  (I presume, however, that this "merge" you refer
to is the elementwise merge of the vector of all locals, rather than
literally the merge of all elements of the vector into a single value,
which is unnecessarily conservative).  

It's not really all that heavy-handed.  The merge (join) operation
yields a value in the lattice consisting of the reference type
hierarchy, the type int, the type returnAddress, and top (which I
believe you can conflate with int for verification purposes).  It's
not an expensive operation, even when extended to a vector of locals.
It's not as if the values grow with the number of possible paths that
supply them.  

The only safe simplification (in the sense of not letting invalid
programs through the BCV) is to replace more of the values with `top'
than you really have to.  However, you have to be careful with that,
since there are well-defined circumstances in which variables are
supposed to have valid values.  For example, consider

	// Point 1
	try {
	    /* assignments to variable x (statically type-correct) */ 
	    // Point 1.5
	} catch (E0 e0) { // Point 2
	    /* uses of variable x (statically type-correct) */ 
	} catch (E1 e1) { ... } ...

This is valid Java if x is definitely assigned at Point 1, which
means that the BCV can't get away with just taking the values
of all the local slots to be top at Point 2.  And, of course, it can't
simply take them to have their abstract values from Point 1, either;
it does have to look for assignments of weird stuff in the try block.

But still, what's the big deal?  The abstract local variable vector at
Point 2 is the merge of the vectors at Point 1.5 (which I'll take to
include all normal exit statements) with those just before all method
calls that can throw an exception that is an instance of E0, and (if
E0 is a subclass of Error or RuntimeException) with those at each
other kind of statement that can throw an exception that is an
instance of E0.  While in principle that may sound bad, it sounds
simple enough to implement, and I suspect you'll find that in practice
it is not terribly time-consuming at BCV time.