hash tables & multi-threading

Sun Jan 17 19:51:02 PST 1999

Godmar Back wrote:
> Also, storing the lock in the hashtable structure, as you suggest makes
> less sense to me than declaring (by name) what the particular locks are
> really supposed to protect, namely string or utf8const operations.

If these locks are *also* protecting against thread problems at the call
sites then I may just agree with you :-). From my read of the code it
seemed as though the only issue was the hash table access, and I didn't
see the locks as needed for controlling the code where they occurred. I
skimmed the code again looking for its own thread-safety problems and
didn't spot them, since the code seems to spend all of its time
accessing the hash table or allocating new objects on the stack.
Nonetheless I'd better take your word for it that the locks are really
required at the call sites for reasons other than the hash table access.

Even assuming the locks are being used for the dual purpose of
protecting string ops and making sure the hash table is safe, you could
do #define aquireStringAndHashTableLocks() hashLock(hashTable) and have
logical names, the same protection the code currently has, and the
sanity check that I talk about below.

> As for the fragility of the code, whether or not you store the lock
> in the table and have the caller of an hashtable operation take responsibility
> for invoking hash.lock before and hash.unlock after invoking, say hash.add
> OR whether you allow the caller to name the lock doesn't make any difference
> in my eyes.

Well, perhaps I could convince you on this point. If the lock was in the
hashtable structures, then every entry point could begin with
assert(holdMutex(tab->lock)), since this condition must be true or you
risk a potential bug in the code. Since all the entry points can contain
this assertion, as soon as someone makes a programming error they'll get
the assertion failure. The most likely spot for this type of error is
when a new programmer uses the existing hashtable functionality in a new
place - since the code doesn't say "these hash tables are only for
strings & utf8 constants" this is at least a plausible possibility. A
programming error is less likely inside the current modules since they
already aquire the mutex for big chunks of their code and the mutex is
fairly visible. The way the mutex is currently named actually obscures
the fact that the mutex is at least in part being used to satisfy the
hash table's requirements.

> I also don't fully understand your point about making the hashtable
> code thread-safe.
> If you mean to use non-blocking synchronization that doesn't require locks
> at all, then please, implement it and submit the patches --- but if you
> mean to introduce redundant internal locking, I'd reject it because
> a) it's unnecessary and a run-time overhead and b) it doesn't work for
> strings because of the gc reentrancy problem.

I meant synchronizing inside the hash table code itself, on a privately
held lock that isn't exposed. I'm not sure it's so cut-and-dried as you
present it, but it could be my inexperience with this code. It seems to
me that the only place GC could get invoked is in the opening lines of
hashResize. A mutex embedded inside the hash table api itself wouldn't
need to be held during these lines - after all, they are just allocating
a stack variable, and aren't modifying the tab structure at all.  How
expensive having many lock aquire/releases would be depends on the
implementation of the underlying thread system; I did concede that there
would be a performance penalty to doing it this way. Since the locks at
the call site are required for its own code as well as the hash table
code (at least, that's what I read into what you say) this efficiency
penalty is probably not worth paying.

alex