cleaning up the locking layers

Patrick Tullmann tullmann at cs.utah.edu
Sun Feb 27 16:33:38 PST 2000


I'm porting Kaffe to another threading implementation (for an active
network NodeOS, if you care --- mostly a POSIX-like thread interface).
In doing this, I've had some problems with the locking.  In an attempt 
to understand the locking code, I've updated the FAQ.locks with a lot
more information, and I've made several changes.

However, before I go about cleaning this up, and testing and
submitting it back to Kaffe, I'd like to get some input on the
changes.  So, please let me know what you think.

The major change is to make the 'sem2posixLock' less of a hack.  I
turned it into a first-class Kaffe type 'Ksem' and removed a lot of
#define indirections.  Ksem is always defined, and has a default
implementation built on jmutex/jcondvar.  Additionally, I've set
things up so that threading systems can supply the 'Ksem' abstraction
directly if they can (and thus won't need to supply the
jmutex/jcondvar).

The attached revision of FAQ.locks contains more details about the
change.

-Pat

----- ----- ---- ---  ---  --   -    -      -         -               -
Pat Tullmann                                       tullmann at cs.utah.edu
      That which does not kill you just didn't try hard enough.


Here is a brief summary of locking in Kaffe.  

There are several layers to the locking abstractions in Kaffe.
Ideally these layers are collapsed by the compiler.

Java Object Locks
=================

	files: kaffe/kaffevm/locks.[hc]

	lockObject(H_java_lang_Object*)
	unlockObject(H_java_lang_Object*)

These should only be used for Java object monitors.  They take just a
pointer to the Java object.  The intent is that if we ever create a
facility to log the execution of all monitorenter/exit bytecodes,
entries and exits of synchronized methods, and JNI MonitorEnter/Exit
primitives, we'll be able to do it by tweaking these functions.  These
functions are implemented as a thin layer over the VM-internal locks
(see the next section).

NOTE: these are functions, and must remain so, since their addresses
are taken in some engines.

The wait and signal functions on Java objects map directly to the wait
and signal functions in the VM-internal locks (see the next section).
XXX There should probably be waitObject(), signalObject(), and
broadcastObject() interfaces.  Currently the waitCond(), signalCond()
and broadcastCond() interfaces are just invoked directly.  
XXX There should also be a slowUnlockObjectIfHeld() (for stack
unwinding due to an exception).
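For example, a Java-level Object.wait() currently bottoms out in a
direct call along these lines (a sketch; the exact timeout type is an
assumption):

	waitCond(obj, timeout);		/* obj: H_java_lang_Object* */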

NOTE: Two other functions are part of this interface:
	slowLockObject(H_java_lang_Object*, void*);
	slowUnlockObject(H_java_lang_Object*, void*);
These are used by some engines when the inlined "fast lock" fails.
See the "Kaffe Fast Locking" section, below.  There need not be
"slow" versions of the wait/signal/broadcast functions.


VM-internal object locks
========================

	files: kaffe/kaffevm/locks.[hc]

	types: iLock

	lockMutex	lockStaticMutex
	unlockMutex	unlockStaticMutex
	waitCond	waitStaticCond		
	signalCond	signalStaticCond	
	broadcastCond	broadcastStaticCond	

There are two flavors of VM-internal locking interfaces: one for
dynamic locks (left column) and one for static locks (right column).
Dynamic locks are used to protect VM-internal dynamic structures
(e.g., classes).  Static locks are generally global locks (e.g., the
gc lock or finalizer lock).

The VM must use its own internal locks for most internal state, to
avoid race conditions with arbitrary Java code that uses object locks.
(For example, threads and classes have both logical Java-level locks
and VM-internal locks.)

There is one instance of using the VM-internal interfaces to lock a
Java-level object: when JIT-compiling a method, a lock on its Class is
held.  Logging such a lock via the lockObject() interface would be
misleading: it would only be logged when the JIT was in use.
Moreover, it is neither mandated nor directed by any of the Java
monitor facilities.  Therefore, this case uses the VM variant, and so
should any similar cases that come up.

These locks are recursive.  Despite the disjoint names, the same iLock
object supports both the lock/unlock and wait/signal/broadcast
interfaces.

The non-static interfaces are a bit odd in that the argument passed
in must be a pointer to a struct with a "lock" field, and the lock
field must be an iLock pointer.  The static interfaces take a pointer
to an iLock pointer.

Lastly, both static and non-static mutex functions assume an implicit
"iLockRoot" automatic(*) variable exists on the current stack frame.
(*) i.e., it must be neither static nor global, otherwise its whole
purpose, which is to speed up locking by using stack information,
would be defeated.  There is an implicit restriction in the use of
lock/unlock: they must occur in the same stack frame.  There is a way
around this restriction, but if you don't know what it is, you
probably shouldn't try it.
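
To make the calling conventions concrete, here is a minimal usage
sketch (the struct, its contents, and the lock names are invented for
illustration; iLockRoot is declared as a plain int since only its
stack address matters):

	struct hypotheticalThing {
		iLock*	lock;		/* must be an iLock* named "lock" */
		int	state;
	};

	static iLock*	hypotheticalGlobalLock;

	void
	example(struct hypotheticalThing* t)
	{
		int iLockRoot;		/* the automatic variable
					   described above */

		lockMutex(t);		/* non-static: pointer to struct */
		t->state++;
		unlockMutex(t);		/* must be in the same frame */

		lockStaticMutex(&hypotheticalGlobalLock);	/* iLock** */
		/* ... touch the global state ... */
		unlockStaticMutex(&hypotheticalGlobalLock);
	}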

XXX How many of the internal locks really need to be recursive?


Threading System Synchronization Primitives
===========================================

The VM-internal locking scheme is implemented on top of the primitives
provided by the Kaffe threading system.  The underlying threading
system has two choices for implementing the lowest layer of locking in
Kaffe.  The first is the Ksem interface, which provides a semaphore
primitive on which the Kaffe run-time bases its "heavy locks".  (See
the "Kaffe Fast Locking" section below.)  The second option is to
implement the "jmutex/jcondvar" interface (which is used only to fill
out a default Ksem implementation).

A. The Ksem interface:
---------------------

	files: kaffe/kaffevm/ksem.h, threading system lock files

	types: Ksem

	ksemInit(sem)
	ksemGet(sem,timeout)
	ksemPut(sem)

The Ksem is a binary semaphore, and need not be recursive.  A Ksem is
initialized to 0 (in semaphore-speak, this means the resource is not
available).  ksemGet() can time out, so it returns a boolean
indicating whether or not it acquired the semaphore.

As a side-effect of Kaffe's fast locking scheme, there is a one-to-one
mapping between Ksems and threads.  Also, all Ksems are dynamically
allocated, and all are initialized before being used.
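
For instance, because each thread owns exactly one Ksem, blocking on
a contended heavy lock reduces to waiting on your own semaphore until
the holder hands it over.  A sketch, with invented field names:

	/* the blocked thread (field names invented): */
	if (ksemGet(&currentThread->sem, timeout) == false) {
		/* timed out without acquiring the semaphore */
	}

	/* the releasing thread wakes one recorded waiter: */
	ksemPut(&waiter->sem);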


B. The jmutex/jcondvar interface:
--------------------------------

	files: kaffe/kaffevm/ksem.h, threading system lock files

	types:	jmutex, jcondvar

	jmutex_lock
	jmutex_unlock
	
	jcondvar_signal
	jcondvar_wait

At this layer, mutexes are distinct from condition variables, though
each condition variable operation is passed its associated mutex.  A
jmutex should not be recursive.  Kaffe guarantees that a jmutex and
jcondvar are used in a strict pair (see the implementation of Ksem).
Broadcast is never used on the condvar.
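
As a rough sketch (not the actual ksem.h code) of how the default
Ksem can be filled out on top of this interface, assuming
jcondvar_wait(cv, mux, timeout) as the wait signature:

	typedef struct Ksem {
		jmutex		mux;
		jcondvar	cv;
		int		count;		/* binary: 0 or 1 */
	} Ksem;

	void
	ksemInit(Ksem* sem)
	{
		/* assume the threading system's mutex/condvar
		   initializers are invoked here */
		sem->count = 0;		/* resource unavailable */
	}

	/* returns true if acquired, false on timeout */
	jboolean
	ksemGet(Ksem* sem, jlong timeout)
	{
		jboolean acquired;

		jmutex_lock(&sem->mux);
		if (sem->count == 0) {
			/* the real code must handle spurious wakeups
			   and track the remaining timeout */
			jcondvar_wait(&sem->cv, &sem->mux, timeout);
		}
		acquired = (sem->count == 1);
		sem->count = 0;
		jmutex_unlock(&sem->mux);
		return (acquired);
	}

	void
	ksemPut(Ksem* sem)
	{
		jmutex_lock(&sem->mux);
		sem->count = 1;
		jcondvar_signal(&sem->cv, &sem->mux);
		jmutex_unlock(&sem->mux);
	}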


Kaffe Fast Locking
==================

Kaffe has a fast locking scheme that minimizes the cost of uncontended
locks.  The fast locking scheme is implemented between the VM-internal
locking layer and the threading system's locking primitives.  The
basic idea is that a "heavy" lock is not allocated unless there is
some contention for it.  Additionally, another trick is used to avoid
maintaining extraneous state for recursive locks.  The Kaffe Fast
Locking scheme takes advantage of several observations.  First, all
lockable objects have a pointer to a heavy lock, in case one is needed
for the object.  Second, all pointers to heap-allocated objects in
Kaffe are at least 4-byte aligned, thus making the bottom two bits of
every pointer available.  Third, if a recursive lock is re-acquired by
the same thread, it is necessarily acquired in a more recent stack
frame (i.e., the frame in which the lock was originally acquired is
still on the stack).  The recursion optimization can be considered
separately from the fast lock acquisition.

Here is some rough pseudo-code for the lock-acquisition path:

if lock field is NULL
	set lock field to current thread's "id".
else
	/* lock field is a pointer */
	if lock field bottom bit is set
		/* lock field points to a heavy lock */
		acquire the heavy lock
	else
		/* lock field points to some thread's "id" */
		allocate a heavy lock and
		set the lock field to the address of the heavy lock
		(with its bottom bit set)

This must all be done atomically.  Kaffe gets around this requirement
by chaining atomic compare-and-swap operations and by swapping the
magic value LOCKINPROGRESS into the field during transitions.  The
common case requires a single atomic compare-and-swap.
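
In C, the common case looks roughly like this sketch, where
compare_and_swap(addr, old, new), returning non-zero on success, is an
assumed primitive (the real code's atomic operations and slow path are
more involved):

	static void
	fastLock(void** lockp, void* id)
	{
		/* "id" is the address of the caller's iLockRoot slot */
		if (compare_and_swap(lockp, NULL, id)) {
			return;		/* uncontended: one atomic op */
		}
		/* the field already holds a thread id or a heavy lock;
		   fall through to the slow path outlined above */
		slowLock(lockp, id);	/* hypothetical name */
	}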

The "id" is where the recursive lock optimization comes in.  The id is
actually the address of a local stack slot in the function that
invoked "lockMutex".  This is used to decide if a thread owns a lock
(if the address doesn't fall on its stack, then the thread cannot own
the lock) and is used to optimize recursive locks (if the address is
on the current thread's stack, then it must be from a previous stack
frame).  This is why the iLock variable must be allocated as an
automatic variable on the stack frame.
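
The ownership test might look like this sketch (the thread-lookup
function and the stack-bounds fields are invented for illustration):

	static int
	onCurrentStack(void* id)
	{
		jthread_t me = currentJThread();	/* invented */

		/* the id is a stack address; the current thread owns
		   the lock only if that address is within its stack */
		return ((char*)id >= (char*)me->stackBase
			&& (char*)id <  (char*)me->stackEnd);
	}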

Jason Baker <jbaker at cs.utah.edu>
Jun 29, 1999
updated by Alexandre Oliva <oliva at lsd.ic.unicamp.br>
Sept 18, 1999
updated by Patrick Tullmann <tullmann at cs.utah.edu>
Feb 25, 2000

