GC problem (finalisers) & comments

Wed Apr 9 05:47:20 PDT 1997

I should warn people this is a bit long. It covers several GC issues I
seem to have come up against.

I have spent some time tracking down a problem which caused crashes
(fortunately reasonably repeatable, in that biss.jde.LibBrowser died). I
have produced a patch which solves the immediate problem, but I wonder if
there is something more fundamentally wrong here. I'll try to explain the
problem for others to contemplate.

What actually happens in this case is that the original first thread in
the system gets killed. The garbage collector eventually puts it and its
name ("main") on the white list. They are then transferred to the
finaliser list. The Thread class does have a finaliser, so the
finalizeThread routine returns false and the finaliser doesn't free the
memory for the thread object itself. However, the name "main" is also on
the finaliser list and has no finaliser, so it just gets freed. We now
have a thread object in the "normal" state with a pointer to freed
memory.
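
To make the sequence concrete, here is a toy model of it. This is not Kaffe code: struct toy_obj, the state names and process_finaliser_list are hypothetical, and "freeing" is modelled by setting a state flag rather than calling free(), so the dangling pointer can be inspected safely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy heap: FREE stands in for memory handed back to malloc. */
enum obj_state { LIVE, ON_FINALISER_LIST, FREE };

struct toy_obj {
    enum obj_state state;
    bool has_finaliser;      /* a Thread has one; the String "main" does not */
    struct toy_obj *ref;     /* e.g. a thread's pointer to its name */
};

/* One pass over the finaliser list, mirroring the behaviour described:
 * objects with a finaliser have it run and stay allocated (back in the
 * "normal" state); objects without one are freed straight away. */
static void process_finaliser_list(struct toy_obj **list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_obj *o = list[i];
        if (o->has_finaliser) {
            /* the finaliser would run here; the memory is kept */
            o->state = LIVE;
        } else {
            o->state = FREE;
        }
    }
}

/* True if a live object still points at freed memory: the unsound state. */
static bool has_dangling_ref(const struct toy_obj *o)
{
    return o->state == LIVE && o->ref != NULL && o->ref->state == FREE;
}
```

Running it on a thread object (with a finaliser) that points at its name (without one) leaves has_dangling_ref true for the thread, which is the state walkThread later trips over.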

The garbage collector runs again and happens to find a pointer to this
thread on some part of a stack it is walking (I'm not sure whether it is
usefully there or just left over). Eventually walkThread is called, which
calls MARK_REFERENCE on the freed name; markObject does no checks and
promptly accvios (access violation).

My "quick hack" for this is simply to NULL out the things referenced by
a thread after calling its finaliser, so those pointers are never touched
again, but that doesn't feel terribly safe.

*** /users/bernard/orig-kaffe-0.8.3/kaffe/kaffevm/thread.c      Mon Apr  7 18:05:33 1997
--- thread.c    Wed Apr  9 13:27:05 1997
***************
*** 1049,1060 ****
--- 1049,1065 ----
        /* Since this thread my be extended by another class, walk any
         * remaining data.
         */
+       gcStats.markedmem -= (base->size - sizeof(thread));
        scanConservative(tid+1, base->size - sizeof(thread));

        if (tid->PrivateInfo != 0) {
                ct = TCTX(tid);
                /* Nothing in context worth looking at except the stack */
+ #if STACK_GROWS_UP
+               scanConservative(ct->stackBase, ct->restorePoint - ct->stackBase);
+ #else
                scanConservative(ct->restorePoint, ct->stackEnd - ct->restorePoint);
+ #endif
        }
  }

***************
*** 1084,1089 ****
--- 1089,1098 ----
        final = findMethod(OBJECT_CLASS(&tid->obj), final_name, void_signature);
        if (final != NULL) {
                CALL_KAFFE_FUNCTION(final, &tid->obj);
+                 tid->name = 0; 
+                 tid->next = 0; 
+                 tid->target = 0; 
+                 tid->group = 0; 
                return (false);
        }
        return (true);
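
The STACK_GROWS_UP part of the patch above chooses which end of the thread's stack region is in use. As a small sketch (the helper and struct names are hypothetical, and addresses are treated as plain integers), assuming the region runs from stackBase at the low end to stackEnd at the high end, with restorePoint holding the saved stack pointer:

```c
#include <stddef.h>
#include <stdint.h>

/* Start and length of the part of a thread's stack to scan conservatively. */
struct scan_range { uintptr_t start; size_t len; };

/* Hypothetical helper: given the stack region [stack_base, stack_end) and
 * the saved stack pointer restore_point, return the in-use part. */
static struct scan_range live_stack(uintptr_t stack_base, uintptr_t stack_end,
                                    uintptr_t restore_point, int stack_grows_up)
{
    struct scan_range r;
    if (stack_grows_up) {
        /* used part runs from the base up to the saved stack pointer */
        r.start = stack_base;
        r.len = restore_point - stack_base;
    } else {
        /* used part runs from the saved stack pointer up to the top */
        r.start = restore_point;
        r.len = stack_end - restore_point;
    }
    return r;
}
```

Scanning the wrong end would mean walking the unused (and possibly stale) part of the stack while missing the live frames.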

Note that I believe this may be what led someone to worry about
obj->ref being zero, since that is what freed memory looked like (though,
of course, this depends on the malloc package's behaviour).

The patch also does nothing about other explicit walk routines that use
MARK_REFERENCE, which may end up in the same state: the object itself has
a finaliser and so isn't freed the first time, but objects it references
have no finaliser and are freed immediately. One way simply to stop the
bad pointer accesses is to make MARK_REFERENCE do all the checking that
scanConservative does, so that it never touches a bad pointer (more on
this later).
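
As a rough sketch of what a checked MARK_REFERENCE might look like, here is a toy fixed-size heap. Everything in it is hypothetical (Kaffe's real allocator and the exact checks scanConservative performs are not reproduced): a pointer is marked only if it falls inside the heap, lands exactly on the start of an object, and that object is still allocated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_BLOCKS 8   /* hypothetical heap size */
#define BLOCK_SIZE  32  /* hypothetical fixed object size */

struct block {
    bool allocated;     /* cleared when the block is freed */
    bool marked;
    unsigned char data[BLOCK_SIZE];   /* the object itself */
};

static struct block heap[HEAP_BLOCKS];

/* Return the block whose object starts exactly at p, or NULL. */
static struct block *block_for(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_BLOCKS];
    if (a < lo || a >= hi)
        return NULL;                      /* outside the GC heap */
    uintptr_t off = a - lo;
    if (off % sizeof(struct block) != offsetof(struct block, data))
        return NULL;                      /* not the start of an object */
    return &heap[off / sizeof(struct block)];
}

/* Mark p only if it is a currently allocated object; a stale pointer to
 * freed memory is skipped instead of dereferenced. */
static bool checked_mark(const void *p)
{
    struct block *b = block_for(p);
    if (b == NULL || !b->allocated)
        return false;
    b->marked = true;
    return true;
}
```

This only avoids the bad access; the live object still points at freed memory, and if that block is reused the pointer silently refers to some other object.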

But this still leaves live objects referencing freed memory, which is
unsound. If having a finaliser means we shouldn't free the memory the
first time the object is found to be garbage (and for the moment I don't
quite understand why that is so), then surely the only safe thing to do
is to put all referenced objects into the same state and not free them
this time either. Also, is there any guarantee of order here? Can't we
free a referenced object before the object that refers to it? I am sure
there is a standard way of dealing with this (I can't remember the
details from the various papers I've read).
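
One common approach, roughly what Java's finalisation rules require, adds a step between marking and sweeping: every unreachable object that still needs finalising is queued, and it and everything it can reach are marked, so nothing is freed before the finaliser has run. The usual reason the object itself cannot be freed on the first pass is that its finaliser runs Java code that may store a reference to it somewhere reachable again; only a later collection can tell whether it is really dead. The sketch below is illustrative only: struct gobj and the function names are hypothetical, and recursion stands in for a proper mark stack.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical object model for illustration only. */
struct gobj {
    bool marked;
    bool has_finaliser;   /* its class defines a finaliser */
    bool finalised;       /* already queued once; never finalise twice */
    bool queued;          /* queued for finalisation in this cycle */
    size_t nrefs;
    struct gobj *refs[MAX_REFS];
};

static void mark(struct gobj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    for (size_t i = 0; i < o->nrefs; i++)
        mark(o->refs[i]);
}

/* Call after marking from the roots. Returns how many objects were queued.
 * Anything still unmarked afterwards may safely be freed this cycle. */
static size_t mark_finaliser_reachable(struct gobj **all, size_t n)
{
    size_t queued = 0;
    /* Pass 1: find garbage needing finalisation, before marking anything,
     * so the result does not depend on the order of the object list. */
    for (size_t i = 0; i < n; i++) {
        struct gobj *o = all[i];
        if (!o->marked && o->has_finaliser && !o->finalised) {
            o->finalised = true;
            o->queued = true;
            queued++;
        }
    }
    /* Pass 2: keep queued objects and everything they reference alive. */
    for (size_t i = 0; i < n; i++)
        if (all[i]->queued)
            mark(all[i]);
    return queued;
}
```

In the "main" thread case, the thread would be queued and its name marked along with it; on the next collection (the thread already finalised and still unreachable) both would be freed together.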

Actually, I do wonder whether it wouldn't be simpler just to plug in the
Boehm garbage collector instead (as Sather does, for example). Has anyone
done this? (I can't recall whether it can run incrementally, though.)

On the obj->ref mechanism: this doesn't seem completely safe to me
either. I would have thought MAKEREF could produce a number that looks
like a valid object address, in which case it could be mistaken for a
correct obj->ref. (I know this is extremely unlikely, but it's
theoretically possible, even if on many platforms the process address
layout probably makes it impossible.) If this mechanism is still being
used, I would suggest defining MAKEREF(a,b) as
((a * REF_MAXW + b) * 2 + 1). That way a ref can never be a valid object
address, because it is always odd and objects are allocated with at
least 2-byte alignment. Of course, the rest of the free ref handling
would need to understand the new format.
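
A minimal sketch of the proposed encoding with matching decode macros. The value given to REF_MAXW here is a placeholder (the real constant lives in the Kaffe sources), the IS_REF, REF_A and REF_B helpers are hypothetical, and decoding only works if b < REF_MAXW.

```c
#include <stdint.h>

/* Placeholder value; the real REF_MAXW comes from the Kaffe sources. */
#define REF_MAXW 1024u

/* Proposed encoding: always odd, so it can never equal the address of an
 * object that is at least 2-byte aligned. */
#define MAKEREF(a, b) ((((uintptr_t)(a) * REF_MAXW + (uintptr_t)(b)) * 2u) + 1u)

/* Hypothetical helpers the free ref handling would need. */
#define IS_REF(x) (((uintptr_t)(x) & 1u) != 0)          /* odd => a ref */
#define REF_A(x)  ((((uintptr_t)(x)) >> 1) / REF_MAXW)  /* recover a */
#define REF_B(x)  ((((uintptr_t)(x)) >> 1) % REF_MAXW)  /* recover b */
```

For example, MAKEREF(3, 7) gives 6159, which decodes back to 3 and 7, while any pointer to an aligned object tests false with IS_REF.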

Finally, I'm not convinced the GC is properly thread-safe. gc_malloc
doesn't disable interrupts around the calloc (unlike checked_calloc), and
I would have thought the free-list ref handling should be done with
interrupts disabled, so that we don't allocate the same free ref id to
two objects (imagine a thread switch just after the line

     obj->ref = refTable.free

in gc_malloc). Running the whole routine with interrupts disabled would
seem to be basically necessary.
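
A sketch of the free ref point, using a POSIX mutex as a stand-in for disabling interrupts (Kaffe's own green-threads primitives are not shown; the table layout, NREFS and the function names are hypothetical). What matters is that reading the free-list head and unlinking it happen as one indivisible step.

```c
#include <pthread.h>

#define NREFS 256   /* hypothetical table size */

/* Hypothetical free list of ref ids: next_free[i] is the id after i,
 * and -1 ends the list. */
static int next_free[NREFS];
static int free_head = -1;
/* Stands in for "interrupts disabled" in a green-threads VM. */
static pthread_mutex_t ref_lock = PTHREAD_MUTEX_INITIALIZER;

static void init_ref_table(void)
{
    for (int i = 0; i < NREFS - 1; i++)
        next_free[i] = i + 1;
    next_free[NREFS - 1] = -1;
    free_head = 0;
}

/* If a thread switch could happen between "id = free_head" and
 * "free_head = next_free[id]", two objects could get the same id,
 * so both steps are done while holding the lock. */
static int alloc_ref(void)
{
    pthread_mutex_lock(&ref_lock);
    int id = free_head;
    if (id != -1)
        free_head = next_free[id];
    pthread_mutex_unlock(&ref_lock);
    return id;   /* -1 if the table is full */
}

static void free_ref(int id)
{
    pthread_mutex_lock(&ref_lock);
    next_free[id] = free_head;
    free_head = id;
    pthread_mutex_unlock(&ref_lock);
}
```

A freed id goes back on the head of the list, so it is the next one handed out.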

As I said before, apologies for the length, but this seemed worth
saying.
-- 
Bernie Solomon (bernard at edsug.com or Bernard.Solomon at acm.org)
Unigraphics Architecture, EDS-Unigraphics, Cambridge, UK