Rewriting the GC

Mon Apr 1 12:23:06 PST 2002

On Mon, Apr 01, 2002 at 11:10:18AM -0700, Patrick Tullmann wrote:
> Erik wrote:
> > > The big problem with walking the stack isn't the Java stack as much as
> > > the native stack.  You could walk the Java parts precisely, and the
> > > native bits conservatively, but I don't know what you'd win anything
> > > by doing this.
> > 
> > OK, I'm not so familiar with the way Java interacts with
> > native code, but why do we need to walk the native bits at
> > all?  Surely C code doesn't need GC?
> 
> All the native methods in Kaffe, the threading system, the core class
> loading --- that's all written in C.  That code uses the same stack as
> the Java code.  That code uses Java object refereneces left and right,
> and stores them on the stack.

Ah, now I understand.

Well, I'm not going to do the work here, so perhaps I should just
shut up, but one way to do this is to have a (per-thread) global
called gc_chain that is available in every function.  Then when
you want to have gc-able structures on the stack you do something
like

struct thingie {
    gcChain *next;
    int signature;
    here go the real fields
}

Then you use it with

{
    struct thingie t;
    t->next = gc_chain;
    t->signature = THINGIESIGNATURE;
    memzero(the rest of t);
    gc_chain = (gcChain *)&t;
    Now do stuff with t
}

the GC has a chain of structures on the stack, each with a
signature that tells it what types are in it.  You can have
signature called 1reference, 2references, 3references etc.
to allow you to save single pointer (not structs) on the
stack too.

For this to work you can't allow GC at random points in the
code (since things could be in registers), but every so often
(ie back edges etc.) the C code has to call checkgc(gc_chain)
(the parameter will ensure that all registers are flushed to
memory) which will check whether the other threads are waiting
to do a GC.  The gc_chain should also be a parameter to the
allocation routine so we can always do a GC in there.

I guess we have to have the ability to walk forward to the
next safe point, and since that's the same as what we need
in Java code I guess that's the way to do it.

I haven't thought through all details of this idea, but it
seems to me it should work.

> > > As I understand GC trade-offs, the big win for precise GC is the
> > > ability to update pointers and thus implement a compacting collector.
> > > Is there something else you're hoping to get out of precise stack
> > > walking?
> > 
> > Predictability and speed of GC.
> 
> You're talking about bogus pointer references in a conservative scan
> being viewed as pointers, when in fact they're just integers that look
> like pointers?

Actually, I was just talking about compacting/generational/semispace
GCs, which require the ability to move objects, and which are all
faster than nonmoving GCs because they only treat a small part
of the heap on any given GC.

> nice ones) a system that supports safe points is (and I'm somewhat
> guessing here) much easier to write safe native code in --- you don't
> have to worry about your C code getting interrupted by the GC at
> arbitrary points, only at safe points you explicitly insert in your
> code.  There are downsides, of course --- like if you forget to put a
> safe point in somewhere, the GC can be blocked for a long time.

I agree now.

-- 
Erik Corry erik at arbat.com