hash tables & multi-threading

Fri Jan 15 19:55:49 PST 1999

Archie Cobbs wrote:
> 
> Alex Nicolaou writes:
> > Secondly, it seems to me that the hash table entry points aren't
> > threadsafe, and
> > each place where a hash table is used a static mutex is allocated to
> > control
> > access to that hash table. Why not put the mutex in the hash table
> > structure itself, and provide entry points hashLock() and hashUnlock()
> > that need to be called before calling into the other entry points? Or
> > perhaps make the underlying code threadsafe?
> 
> Originally it was done as you suggest, with "bigger" locks,
> but Godmar made them smaller. You'll have to ask him for
> a proof why it's still thread safe, as I haven't picked
> through his changes yet.

Possibly the code I'm looking at is out of date - I checked it out of
CVS a few days or perhaps even a week ago. In it I'd describe the locks
as pretty "big"; they are done from outside the hash table's own code
and protect the calling code from the fact that the internals of the
hash table code aren't threadsafe.

In an odd way I'm suggesting either "bigger" or "smaller" locks. My
suggestion about hashLock() and hashUnlock() allows for bigger locks. My
suggestion for making the underlying code threadsafe would make for many
smaller locks inside the hash table code itself instead of in the
calling code; in this case the mutex wouldn't be exposed at all. In my
opinion there are reasonable arguments for either. The code I have
appears to me to be OK from a thread safety standpoint right now, but
I'm worried about people modifying it carelessly.

The way it is now the locks are being done outside the hashXXX api
itself. It'd be nice to see the mutex live inside the hashTable's own
structure instead of having static ones lying around, since one careless
mistake can easily break the thread-safety of any given hash table, by
forgetting to aquire the corresponding static mutex. If the mutex was in
the hashTable's structure then every entry point could assert() that it
has the mutex before proceeding, quickly trapping the careless error. In
addition the seat-of-the-pants initialization of the mutex's does
nothing for the clarity of the code, but if they were in the hash
table's structure they could be initialized once, in an out-of-the-way
spot where it wouldn't interrupt the flow of the rest of the code.

On the other hand, putting the locks right inside the hashXXX functions
would make it truly bullet-proof at some (probably small) cost in speed.
At the same time the code at the call sites gets simpler, because
neither the initialization or aquire/release of the mutex needs to
appear.

alex