Speed comparison between Kaffe/jdk on Linux/Windows

Sun Apr 11 06:51:04 PDT 1999

Artur Biesiadowski wrote:
> 
> Constantin Teodorescu wrote:
> 
> > I am confused by these results. Should I understand that under Linux, jdk
> > 1.1.7 is using a JIT ?!?!?
> > Or, the interpreter of jdk 1.1.7 is as fast as Kaffe JIT ?
> 
> Basically yes, in many cases Sun's interpreter is as fast as kaffe
> jitted code. Why the kaffe jit is slow is quite obvious - it does not
> perform any optimizations except putting some stack slots in
> registers. But even with that it should be faster (tya uses an even
> dumber translation scheme and is faster for most apps). Does anybody know
> which part is so slow in kaffe ? Method dispatching ?
> BTW, somebody should do even simple method inlining for kaffe - the speed
> increase can be tremendous in some cases.

Here are a few reasons why the kaffe jit is so slow:

1. Kaffe does _no_ optimization, not even simple peephole
optimization.
2. Kaffe puts some stack slots in registers, but it
has no allocation strategy for this. They are
allocated "as is", and together with the lack of
peephole optimization, using registers does not
help speed up execution.
3. Kaffe uses the C calling convention, while Java requires
the Pascal calling convention (i.e. first argument pushed
first). So, for calls with several arguments, the values
need to be stored in temp slots after they are
calculated, so that they can be pushed in reverse order.
4. Kaffe does not have method inlining for
small methods, and it does not even have simplified frames
for methods that do simple things, such as just returning
the value of a field.
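
To illustrate what points 1 and 2 are missing, here is a minimal
sketch of a peephole pass in Python. The tuple-based instruction
format and the register/slot names are hypothetical and are not
taken from kaffe; the pass only removes a reload of a slot that was
just stored from the same register.

```python
# Hypothetical toy instruction format: (opcode, operand, operand).
# This is an illustration only, not kaffe's internal representation.

def peephole(code):
    """Drop a load that re-reads a slot just stored from the same register."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        if (prev is not None
                and prev[0] == "store" and ins[0] == "load"
                and prev[1:] == ins[1:]):
            # The register still holds the stored value, so the
            # reload is redundant and can be skipped.
            continue
        out.append(ins)
    return out


naive = [
    ("add", "r0", "r1"),
    ("store", "r0", "slot2"),
    ("load", "r0", "slot2"),   # redundant reload
    ("mul", "r0", "r3"),
]
optimized = peephole(naive)
print(optimized)
```

Even a window of two instructions like this removes some of the
memory traffic that a naive one-instruction-at-a-time translation
produces.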

On the other hand, it looks like the
jdk uses an assembler loop over bytecode instructions,
and it follows the Java calling conventions and so on.
The TYA jit also uses a very simple jitting
algorithm, but it has carefully optimized assembler
code for complex bytecode instructions and
makes very good choices when allocating (top) stack
values in registers.
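
The effect of keeping operand stack values in registers can be
sketched with a small counting model in Python. This is an assumed,
simplified static mapping (stack position i lives in a register when
i < num_regs), not TYA's actual scheme; the op names "push" and
"binop" are hypothetical.

```python
def memory_traffic(ops, num_regs):
    """Count memory accesses when stack position i is in a register if i < num_regs."""
    depth = 0      # current operand stack depth
    traffic = 0    # number of memory reads and writes
    for op in ops:
        if op == "push":
            if depth >= num_regs:
                traffic += 1          # the value is written to a memory slot
            depth += 1
        elif op == "binop":
            # Reads the two top positions, writes the result to the lower one.
            for pos in (depth - 1, depth - 2, depth - 2):
                if pos >= num_regs:
                    traffic += 1
            depth -= 1
        else:
            raise ValueError(op)
    return traffic


# (a + b) * (c + d) as stack code: push a, push b, add, push c, push d, add, mul
ops = ["push", "push", "binop", "push", "push", "binop", "binop"]
for regs in (0, 2, 3):
    print(regs, memory_traffic(ops, regs))
```

In this model the expression needs 13 memory accesses with no
registers, 2 with two registers and none with three, which is why
the choice of how many and which stack values go to registers
matters so much for a simple jit.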

IMHO, there are two ways of rewriting the jit.
The first is to continue with the RTL approach it
uses now, but it must not immediately generate
code for each instruction; it must have
an intermediate optimization stage.
Those optimizations should be modularized, and
it should be easy to add new optimization
procedures.
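
A modular intermediate stage of this kind could look roughly like
the following Python sketch. Everything here is hypothetical (the
PASSES registry, the tuple IR, the pass names, the emit function);
it only shows that passes run over the whole intermediate code
before any code is emitted, and that adding a pass is one new
function.

```python
PASSES = []   # hypothetical registry of optimization passes, run in order


def optimization(fn):
    """Register fn as an optimization pass over the intermediate code."""
    PASSES.append(fn)
    return fn


@optimization
def fold_constants(ir):
    # Replace an add of two integer constants with a single constant load.
    out = []
    for ins in ir:
        if ins[0] == "add" and isinstance(ins[2], int) and isinstance(ins[3], int):
            out.append(("const", ins[1], ins[2] + ins[3]))
        else:
            out.append(ins)
    return out


@optimization
def drop_self_moves(ir):
    # A move from a location to itself does nothing.
    return [ins for ins in ir if not (ins[0] == "move" and ins[1] == ins[2])]


def emit(ins):
    # Stand-in for real machine code generation.
    return " ".join(str(x) for x in ins)


def compile_method(ir):
    for run_pass in PASSES:       # the intermediate optimization stage
        ir = run_pass(ir)
    return [emit(ins) for ins in ir]


ir = [("add", "t0", 2, 3), ("move", "t1", "t1"), ("move", "t2", "t0")]
print(compile_method(ir))
```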

The second is to switch to the TYA approach, but with
a slot<->register allocation strategy selected
for the particular CPU. Also, a two-pass jit would
allow more optimizations
(TYA is a one-pass jit compiler).

The TYA approach can give us a _very_ fast and simple
jit with an acceptable level of optimization.
The RTL approach will give us a slow, but potentially
highly-optimized jit.

Of course, there is also the gcj compiler. If integration
with it is possible (IMHO it would be a very hard
task, if possible at all), then the best approach is
to choose a TYA-like simple and fast jit and recommend
that users use gcj to pre-compile often-used
classes.

Also, the RTL jit approach may become more attractive
if it is combined with hashing of jit-compiled
code for classes.
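
One possible reading of this idea is a cache of compiled code keyed
by a hash of the class bytes, so that a slow optimizing jit pays its
cost only once per class. The Python sketch below assumes that
reading; CodeCache and fake_jit are hypothetical names, and the
cache is only in memory here (it would have to be stored on disk to
help across runs).

```python
import hashlib


class CodeCache:
    """Hypothetical cache of jit output keyed by a hash of the class bytes."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}
        self.compilations = 0     # how many times the jit really ran

    def get(self, class_bytes):
        key = hashlib.sha256(class_bytes).hexdigest()
        if key not in self._store:
            # Cache miss: run the (expensive) jit and remember the result.
            self._store[key] = self._compile(class_bytes)
            self.compilations += 1
        return self._store[key]


def fake_jit(class_bytes):
    # Stand-in for a slow optimizing compiler.
    return b"native:" + class_bytes[::-1]


cache = CodeCache(fake_jit)
first = cache.get(b"\xca\xfe\xba\xbeFoo")
second = cache.get(b"\xca\xfe\xba\xbeFoo")   # served from the cache
print(cache.compilations)
```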

Regards
  Maxim Kizub