Benchmarking kaffe (Was: Re: SPECjvm98)

Tue Mar 26 05:07:32 PST 2002

On Mon, 25 Mar 2002, Dalibor Topic wrote:
> I agree that environments don't matter as long as one is comparing the results to 
> one's earlier runs. I doubt that comparing one's results with those of others 
> will lead to anything beyond wild speculation.

If you think environments don't matter, then you're not agreeing with me
;) The bare minimum for the benchmarks to be useful is that we should know
the environment used. Let it be stressed that I thought of this more in
terms of "recommendation" than "requirement", i.e. "The benchmarks would be
most useful if produced using such and such compiler and such and such
library, but if you prefer something else, state so with the benchmark
results". To put it in crude terms, this could for example prevent people
from re-implementing an optimization already provided in some compiler; or
conversely, breaking some compiler-based optimization.
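
To make "state so with the benchmark results" concrete, here is a minimal
sketch of a result record that carries its environment along with it. The
function names, the choice of gcc as the default compiler, and the
"date benchmark seconds" line format are all assumptions made for
illustration; nothing like this exists in kaffe as far as this mail goes.

```python
import platform
import shutil
import subprocess


def tool_version(command):
    """Return the first line of `command --version`, or a placeholder."""
    if shutil.which(command) is None:
        return "not found"
    try:
        result = subprocess.run([command, "--version"],
                                capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0].strip() if lines else "unknown"


def describe_environment(compiler="gcc"):
    """Collect the facts a reader needs to interpret a benchmark run."""
    return {
        "os": platform.system() + " " + platform.release(),
        "machine": platform.machine(),
        # libc_ver() returns empty strings where it cannot tell.
        "libc": " ".join(platform.libc_ver()).strip() or "unknown",
        "compiler": tool_version(compiler),
    }


def format_result(date, benchmark, seconds, environment):
    """Render the environment as '#' comment lines above the result line."""
    lines = ["# %s: %s" % (key, environment[key])
             for key in sorted(environment)]
    lines.append("%s %s %.3f" % (date, benchmark, seconds))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical benchmark name and timing, for illustration only.
    print(format_result("2002-03-26", "example_benchmark", 12.345,
                        describe_environment()))
```

Keeping the environment in comment lines means the result line itself stays
trivially parseable, while a changed compiler or library shows up right next
to the number it may have affected.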

> Sticking to a standard environment would just limit the number of
> people able to contribute results.

That's one of the things I'm afraid of. The last thing we want is people
upgrading their compiler/libraries along the way, and forgetting to mention
it in the benchmarks, leading everybody to think they've broken something
terribly, or found a new optimization. While certainly we can't, and
shouldn't, stop people from providing any benchmarks they feel they
should, there's no particular advantage to having a high number of people
contributing them. A couple of motivated people would be sufficient; and
if they're not interested enough to set up a separate toolchain to ensure
all the benchmarks are built with the same environment, I'd rather not
rely on the data they provide for anything significant.

> What kind of contribution process would be suitable for such an
> effort? Emails to a specific mailing list? Web forms?

Well, I was initially thinking of having both a gnuplot graph of the
development of the benchmark performance over time, as well as a textual
log of the specific results. In the simplest case, this would only
require an e-mail notification of the location of the graphs to this list,
and the URL could then be added to the official web-page if deemed
useful/reliable enough. If enough data is provided, it might be worth
writing a script on the web-site machine that would gather the
benchmark logs and collate combined graphs from them.
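
A rough sketch of what such a collating script might look like, assuming
each contributor's log holds one "date benchmark seconds" entry per line
with ISO dates and optional '#' comment lines. That log format and all the
names below are hypothetical; only the idea of merging logs into something
gnuplot can read comes from the paragraph above.

```python
from collections import defaultdict


def parse_log(text):
    """Yield (date, benchmark, seconds) tuples from one contributor's log."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and environment comments
        fields = line.split()
        if len(fields) != 3:
            continue  # does not match the assumed format
        date, benchmark, seconds = fields
        try:
            value = float(seconds)
        except ValueError:
            continue  # timing is not a number
        yield date, benchmark, value


def collate(logs):
    """Merge {contributor: log text} into {benchmark: rows sorted by date}."""
    table = defaultdict(list)
    for contributor, text in logs.items():
        for date, benchmark, seconds in parse_log(text):
            table[benchmark].append((date, contributor, seconds))
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return {name: sorted(rows) for name, rows in table.items()}


def gnuplot_data(rows):
    """Render rows as whitespace-separated columns: date, contributor, seconds."""
    return "\n".join("%s %s %.3f" % row for row in rows)
```

Writing the output of gnuplot_data() to one file per benchmark gives gnuplot
a date in column 1 and a timing in column 3, which it can plot over time
after `set xdata time` and `set timefmt "%Y-%m-%d"`. Contributor names are
assumed to contain no whitespace, since that would shift the columns.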

But, as implied, if we're aiming for just "any benchmark", for posterity
and some pretend-comparisons between system performance, then all bets
are off, and we should probably have some sort of web-form for users to
enter something like "Herez the rezultz I gotz from running my own
number-calculation benchmark, calculating how many numbers there are from
1 to 1000 while playing Doom in another Window. This is OBIVIOUSLY what
everybody else will be doing with the VM's, so I think this counts. I'm
not sure I even have a compiler." ;)

 -Jukka Santala