hash tables & multi-threading

Sun Jan 17 23:55:46 PST 1999

> 
> If these locks are *also* protecting against thread problems at the call
> sites then I may just agree with you :-). From my read of the code it
> seemed as though the only issue was the hash table access, and I didn't
> see the locks as needed for controlling the code where they occurred. I
> skimmed the code again looking for its own thread-safety problems and
> didn't spot them, since the code seems to spend all of its time
> accessing the hash table or allocating new objects on the stack.
> Nonetheless I'd better take your word for it that the locks are really
> required at the call sites for reasons other than the hash table access.

As an example, look at this call-site:

void
utf8ConstRelease(Utf8Const *utf8)
{
        assert(staticLockIsInitialized(&utf8Lock));
        lockStaticMutex(&utf8Lock);
        assert(utf8->nrefs >= 1);
        if (--utf8->nrefs == 0) {
                hashRemove(hashTable, utf8);
                jfree(utf8);
        }
        unlockStaticMutex(&utf8Lock);
}

Here, the lock protects both the "nrefs" element in the utf8
and the hashtable.

> Well, perhaps I could convince you on this point. If the lock was in the
> hashtable structures, then every entry point could begin with
> assert(holdMutex(tab->lock)), since this condition must be true or you
> risk a potential bug in the code. Since all the entry points can contain
> this assertion, as soon as someone makes a programming error they'll get
> the assertion failure. The most likely spot for this type of error is
> when a new programmer uses the existing hashtable functionality in a new
> place - since the code doesn't say "these hash tables are only for
> strings & utf8 constants" this is at least a plausible possibility. A
> programming error is less likely inside the current modules since they
> already aquire the mutex for big chunks of their code and the mutex is
> fairly visible. The way the mutex is currently named actually obscures
> the fact that the mutex is at least in part being used to satisfy the
> hash table's requirements.
>  

I see what you're saying.
Like I said, this hashtable implementation is not intended to be
fully general, but for the specific cases of handling utf8const or
string.  Should other uses arise, we can always find other solutions.

What if a programmer creates one hashtable per thread for a hypothetical
application?  Then the sanity assertion you mention wouldn't make sense.
Basically, I'm willing to make the code more reusable if there is an
actual need.

Also, even though the code doesn't say "this is only for strings & utf8consts",
it does say that this hashtable implementation is single-threaded and that
the responsibility for locking is upon the caller.

> 
> How expensive having many lock aquire/releases would be depends on the
> implementation of the underlying thread system; I did concede that there
> would be a performance penalty to doing it this way. Since the locks at
> the call site are required for its own code as well as the hash table
> code (at least, that's what I read into what you say) this efficiency
> penalty is probably not worth paying.
> 

Well, you first have to get to the underlying threading system.
In the optimal and uncontended case, a lock should be taken in 
two instructions.  We're far from that.

Also, the performance discussion here is pretty much moot anyway since 
not much time is spent in this code, except maybe for string internalization
benchmarks, which we don't win for other reasons.

	- Godmar