[kaffe] string memory usage

Wed May 11 07:44:21 PDT 2005

Hi,

Pursuant to mjw's request, I'm sending this here:

I'm writing an apache logfile parser (yes, yes, re-inventing the wheel and
all that, but this brings up a more general problem anyway), and ran into
the problem that its memory consumption was going through the roof despite
storing data to a circular buffer and thus eventually throwing out old data.

The core of the parsing had been something to the effect of

while((line = log.readLine()) != null)
	addDataToBuffer(parseLine(line));

where parseLine would pull apart sub-pieces of the String line, such as the
IP address, the referer, etc. In many cases integers were parsed out using
an Integer.parseInt(line.substring(...)) strategy.
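
For illustration, a minimal sketch of what such a String-based parseLine helper might look like. It is not the code from the attachment: the class name, the method names, and the assumption that lines follow Apache's combined log format are all hypothetical.

```java
// Sketch only: assumes Apache's combined log format. The class and method
// names are hypothetical and are not taken from MemoryExample.java.
class StringLineParser {
    // The client IP is everything before the first space.
    static String parseIp(String line) {
        return line.substring(0, line.indexOf(' '));
    }

    // The status code is the field right after the quoted request string.
    static int parseStatus(String line) {
        int firstQuote = line.indexOf('"');
        int requestEnd = line.indexOf('"', firstQuote + 1);
        int start = requestEnd + 2;            // skip the quote and one space
        int end = line.indexOf(' ', start);
        // The substring-then-parseInt strategy described above.
        return Integer.parseInt(line.substring(start, end));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [11/May/2005:07:44:21 -0700] "
                + "\"GET /index.html HTTP/1.0\" 200 2326 "
                + "\"http://www.example.com/\" \"Mozilla/4.08\"";
        System.out.println(parseIp(line) + " " + parseStatus(line));
    }
}
```

Every field pulled out this way goes through at least one intermediate String created by substring.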

As I said, the memory usage of this kept growing (roughly) linearly with
input after the circular buffer wrapped. I re-implemented a readLine
function to return a character array, and re-implemented the line parsing
purely in terms of that character array (creating the String for referer by
using the String(char data[], int start, int len) constructor). Not only did
this substantially reduce memory usage initially, but memory usage also grew
much more slowly after the circular buffer wrapped. I'm talking about the
difference between failing after using about 300M of memory (oom killer)
versus succeeding after about 180M of memory, in both cases on a 330M log
file.
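
The char[] approach described above could be sketched roughly as follows. Again, this is an illustration under assumptions rather than the attached code: the helper names, the limit parameter (the number of valid characters in a reused line buffer), and the combined log format layout are all hypothetical.

```java
// Sketch only: hypothetical helpers that work directly on a char[] line
// buffer; "limit" is the number of valid characters in that buffer.
class CharLineParser {
    // Index of c in data[from .. limit), or -1 if it does not occur.
    static int indexOf(char[] data, char c, int from, int limit) {
        for (int i = from; i < limit; i++)
            if (data[i] == c) return i;
        return -1;
    }

    // Parse a non-negative decimal integer from data[start .. start+len)
    // without creating any intermediate String.
    static int parseInt(char[] data, int start, int len) {
        int value = 0;
        for (int i = start; i < start + len; i++) {
            char c = data[i];
            if (c < '0' || c > '9')
                throw new NumberFormatException("not a digit: " + c);
            value = value * 10 + (c - '0');
        }
        return value;
    }

    // The client IP is everything before the first space.
    static String parseIp(char[] data, int limit) {
        int end = indexOf(data, ' ', 0, limit);
        if (end < 0) end = limit;
        return new String(data, 0, end);
    }

    // Assumes combined log format: the referer sits between the third and
    // fourth double quotes. Returns null if the line has no such field.
    static String parseReferer(char[] data, int limit) {
        int q = -1;
        for (int n = 0; n < 3; n++) {          // advance to the third quote
            q = indexOf(data, '"', q + 1, limit);
            if (q < 0) return null;
        }
        int close = indexOf(data, '"', q + 1, limit);
        if (close < 0) return null;
        return new String(data, q + 1, close - q - 1);
    }
}
```

The String(char[], int, int) constructor is documented to copy the subarray, so a String built this way does not hold on to the line buffer, and the buffer itself can be reused for the next line.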

So, at mjw & robilad's request, I've implemented an example case which just
implements a rudimentary circular buffer for storing hashes which count the
number of times each IP made requests on that day. (Note: I fake days by
assuming that they're 1E5 requests long, so as not to clutter the example up
with apache date parsing code.)
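
As a rough idea of the structure, here is a small sketch of a circular buffer of per-day hash tables. It is not the attached MemoryExample.java; the class name DayRing, its methods, and the use of generics are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a ring of per-"day" tables mapping IP -> request count.
// A "day" is faked as a fixed number of requests, as in the description.
class DayRing {
    private final List<Map<String, Integer>> slots = new ArrayList<>();
    private final int requestsPerDay;
    private int current = 0;        // slot for the day being filled
    private int inCurrentDay = 0;   // requests recorded in that day so far

    DayRing(int numDays, int requestsPerDay) {
        this.requestsPerDay = requestsPerDay;
        for (int i = 0; i < numDays; i++) slots.add(new HashMap<>());
    }

    // Count one request from the given IP, rolling over to the next slot
    // (and discarding that slot's old table) when the fake day is full.
    void record(String ip) {
        if (inCurrentDay == requestsPerDay) {
            current = (current + 1) % slots.size();
            slots.set(current, new HashMap<>());  // old day becomes garbage
            inCurrentDay = 0;
        }
        slots.get(current).merge(ip, 1, Integer::sum);
        inCurrentDay++;
    }

    int count(int slot, String ip) {
        return slots.get(slot).getOrDefault(ip, 0);
    }

    int currentSlot() {
        return current;
    }
}
```

With something like new DayRing(7, 100000), each slot would hold one fake day of 1E5 requests. Once the ring wraps, the oldest table is replaced, so in principle the live data should stop growing at that point.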

The example provides both core implementations, string and char[]. You can
tune the parameters in the source, and choose the implementation at runtime.
While running, it will indicate its progress in the circular buffer. This
is meant to be used in conjunction with another terminal monitoring memory
usage, so you can see the relationship between usage and when wrapping
happens. By default it uses /var/log/apache/access.log, but you can change
this by editing logFileName in the source.

I recommend running this with a log file of at least a few hundred meg or
tuning down kaffe's initial heap size as appropriate.

(Please note: I'm not making any claims here about how JVMs or class
libraries should be designed, or trying to disparage kaffe. First, I
actually discovered this problem with gcj, which was slightly worse about
memory usage (though much faster at execution), and second I think that both
gcj and kaffe are Really Cool. I'm just providing this for reference.)

Thanks,
-Chris Lansdown

-- 
chris at powerblogs.com -> http://www.powerblogs.com/
"Let us endeavor so to live that when we come to die even the undertaker 
will be sorry."  -- Mark Twain, "Pudd'nhead Wilson's Calendar"
========== Evil Overlord Quote of the Day (www.eviloverlord.com) =========
141. As an alternative to not having children, I will have lots of children.
My sons will be too busy jockeying for position to ever be a real
threat, and the daughters will all sabotage each other's attempts to
win the hero. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MemoryExample.java
Type: text/x-java
Size: 4039 bytes
Desc: not available
Url : http://kaffe.org/pipermail/kaffe/attachments/20050511/f1fe11fa/attachment-0002.java