[kaffe] analysis of pthread spec violation (leads to deadlock on recent glibc)
noa at resare.com
Tue Sep 28 11:36:13 PDT 2004
jakub at redhat concludes that kaffe violates the pthread spec by calling
pthread_cond_wait() after pthread_cond_destroy() has been called on a
particular condition variable.
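To make the violation concrete, here is a minimal sketch of a wrapper that
reports a wait-after-destroy instead of passing it on to glibc. The
checked_cond type and its functions are hypothetical names invented for
this illustration, not kaffe's jcondvar code, and the alive flag is not
itself synchronized, so treat it as a debugging aid only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper (not kaffe code): remembers whether the condition
 * variable has been destroyed, so that later use is reported rather than
 * reaching pthread_cond_wait(), where POSIX leaves the behaviour undefined. */
typedef struct {
    pthread_cond_t cond;
    bool alive;
} checked_cond;

int checked_cond_init(checked_cond *cc)
{
    assert(cc != NULL);
    int rc = pthread_cond_init(&cc->cond, NULL);
    cc->alive = (rc == 0);
    return rc;
}

int checked_cond_destroy(checked_cond *cc)
{
    assert(cc != NULL);
    if (!cc->alive)
        return EINVAL;              /* destroyed twice */
    cc->alive = false;
    return pthread_cond_destroy(&cc->cond);
}

/* The caller must hold mutex, exactly as for pthread_cond_wait(). */
int checked_cond_wait(checked_cond *cc, pthread_mutex_t *mutex)
{
    assert(cc != NULL && mutex != NULL);
    if (!cc->alive)
        return EINVAL;              /* wait after destroy: the bug at hand */
    return pthread_cond_wait(&cc->cond, mutex);
}
```

With a wrapper along these lines, the sequence destroy-then-wait returns
EINVAL immediately instead of blocking forever.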
Since pthread_cond_destroy() was implemented as 'return 0;' up until
glibc-2.3.3-47 (from Fedora devel, based on CVS snapshots of recent
glibc development), it didn't matter for the linux/glibc platform. Now it
matters for Fedora Core 3 test2, and soon it will probably matter for the
new version of your favorite Linux distribution. So I went to kaffe to
find out what was going on.
It turns out that in the unix-pthread implementation of kaffe threading
the following occurs sometimes on thread exit:
- pthread_cond_destroy() gets called from exitThread() in thread.c (via
unlinkNativeAndJavaThread() -> ksemDestroy() -> jcondvar_destroy())
- thereafter jthread_exit() gets called in exitThread() and is
advertised as not returning. This is not true. As can be seen in
systems/unix-pthreads/thread-impl.c, it simply returns if certain
conditions are met. When exitThread() returns, control is moved to tRun in
- before tRun() goes on to remove the current thread from the
activeThreads list, it calls TLOCK(), which calls pthread_cond_wait() (via
lockStaticMutex() -> locks_internal_lockMutex() -> slowLockMutex() ->
ksemGet() -> jcondvar_wait())
The obvious way of fixing this, IMHO, is to delay the
unlinkNativeAndJavaThread() invocation until the threads implementation
has removed the current thread from the activeThreads list. However, this
has the obvious drawback that a required step in thread teardown is
moved from generic code to threading-implementation-specific code. (In
other words, it needs to be fixed, or at least verified as
non-problematic, in all threading implementations.)
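The ordering I have in mind can be sketched with a small model. This is
not a patch against kaffe: model_thread, list_lock, active_threads and the
functions below are made-up stand-ins for the per-thread ksem and the
activeThreads list, and the sketch assumes the only thing that matters is
that the list lock is taken while the thread's semaphore is still valid.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kaffe's per-thread ksem and activeThreads
 * list; none of these names come from the kaffe sources. */
typedef struct model_thread {
    pthread_cond_t sem_cond;    /* what a contended lock would wait on */
    bool sem_alive;
    struct model_thread *next;
} model_thread;

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static model_thread *active_threads = NULL;

int model_thread_start(model_thread *t)
{
    int rc = pthread_cond_init(&t->sem_cond, NULL);
    if (rc != 0)
        return rc;
    t->sem_alive = true;
    pthread_mutex_lock(&list_lock);
    t->next = active_threads;
    active_threads = t;
    pthread_mutex_unlock(&list_lock);
    return 0;
}

/* Step 1: leave the active list while the semaphore is still usable. */
static void remove_from_active_list(model_thread *t)
{
    assert(t->sem_alive);       /* taking the lock may need the semaphore */
    pthread_mutex_lock(&list_lock);
    for (model_thread **p = &active_threads; *p != NULL; p = &(*p)->next) {
        if (*p == t) {
            *p = t->next;
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    t->next = NULL;
}

/* Step 2: destroy the semaphore only after the last lock acquisition. */
int model_thread_exit(model_thread *t)
{
    remove_from_active_list(t);
    t->sem_alive = false;
    return pthread_cond_destroy(&t->sem_cond);
}

size_t model_active_count(void)
{
    size_t n = 0;
    pthread_mutex_lock(&list_lock);
    for (model_thread *t = active_threads; t != NULL; t = t->next)
        n++;
    pthread_mutex_unlock(&list_lock);
    return n;
}
```

Swapping the two steps in model_thread_exit() reproduces the current
problem in the model: the assert in remove_from_active_list() fires
because the semaphore is already gone when the lock is taken.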
Which threading implementations are a priority? Since unix-pthread is
seriously broken at the moment, and I suppose that it is by far the most
commonly used threading implementation, fixing it is worth some breakage
in lesser-used ones.
I can take responsibility for testing a proposed fix on unix-jthread, but
is there anyone out there taking responsibility for win32, beos-native
and oskit-pthreads? Are they used by anyone?
Need to sleep now, but I'll probably look at coding a fix tomorrow
(if no one beats me to it :)
ps. One easy testcase that exhibits the lockup, in case anyone has
forgotten, is ThreadState.java in test/regression. If you increase the
number of threads to, for example, 50, it locks up 95% of the time.