hash tables & multi-threading

Sun Jan 17 11:15:49 PST 1999

Alex, you are right that we shouldn't test everytime whether the
static locks are initialized, but move their initialization in an init
function instead.

However, it turned out to be better to leave the table single-threaded
and put the responsibility for locking on the caller.  The reason
is that these hashtable aren't general purpose anyway --- they're used
for strings and utf8consts.  For the latter, we have to protect the
ref counts for the utf8const, so it's more practical to protect utf8
ops as a whole.  For strings, the problem is that resizing the table
may cause a gc, and the gc now synchronously uninterns strings before
freeing them.  This requires the hashtable not be locked.  Again,
it seemed better to not concern the hashtable with locking.

For these reasons, it was better to define the hashtable as being
single-threaded and move the responsibility for locking to the caller.

Also, storing the lock in the hashtable structure, as you suggest makes
less sense to me than declaring (by name) what the particular locks are
really supposed to protect, namely string or utf8const operations.

As for the fragility of the code, whether or not you store the lock
in the table and have the caller of an hashtable operation take responsibility
for invoking hash.lock before and hash.unlock after invoking, say hash.add
OR whether you allow the caller to name the lock doesn't make any difference
in my eyes.

I also don't fully understand your point about making the hashtable 
code thread-safe.
If you mean to use non-blocking synchronization that doesn't require locks
at all, then please, implement it and submit the patches --- but if you
mean to introduce redundant internal locking, I'd reject it because
a) it's unnecessary and a run-time overhead and b) it doesn't work for
strings because of the gc reentrancy problem.

	- Godmar

> 
> Archie Cobbs wrote:
> > 
> > Alex Nicolaou writes:
> > > Secondly, it seems to me that the hash table entry points aren't
> > > threadsafe, and
> > > each place where a hash table is used a static mutex is allocated to
> > > control
> > > access to that hash table. Why not put the mutex in the hash table
> > > structure itself, and provide entry points hashLock() and hashUnlock()
> > > that need to be called before calling into the other entry points? Or
> > > perhaps make the underlying code threadsafe?
> > 
> > Originally it was done as you suggest, with "bigger" locks,
> > but Godmar made them smaller. You'll have to ask him for
> > a proof why it's still thread safe, as I haven't picked
> > through his changes yet.
> 
> Possibly the code I'm looking at is out of date - I checked it out of
> CVS a few days or perhaps even a week ago. In it I'd describe the locks
> as pretty "big"; they are done from outside the hash table's own code
> and protect the calling code from the fact that the internals of the
> hash table code aren't threadsafe.
> 
> In an odd way I'm suggesting either "bigger" or "smaller" locks. My
> suggestion about hashLock() and hashUnlock() allows for bigger locks. My
> suggestion for making the underlying code threadsafe would make for many
> smaller locks inside the hash table code itself instead of in the
> calling code; in this case the mutex wouldn't be exposed at all. In my
> opinion there are reasonable arguments for either. The code I have
> appears to me to be OK from a thread safety standpoint right now, but
> I'm worried about people modifying it carelessly.
> 
> The way it is now the locks are being done outside the hashXXX api
> itself. It'd be nice to see the mutex live inside the hashTable's own
> structure instead of having static ones lying around, since one careless
> mistake can easily break the thread-safety of any given hash table, by
> forgetting to aquire the corresponding static mutex. If the mutex was in
> the hashTable's structure then every entry point could assert() that it
> has the mutex before proceeding, quickly trapping the careless error. In
> addition the seat-of-the-pants initialization of the mutex's does
> nothing for the clarity of the code, but if they were in the hash
> table's structure they could be initialized once, in an out-of-the-way
> spot where it wouldn't interrupt the flow of the rest of the code.
> 
> On the other hand, putting the locks right inside the hashXXX functions
> would make it truly bullet-proof at some (probably small) cost in speed.
> At the same time the code at the call sites gets simpler, because
> neither the initialization or aquire/release of the mutex needs to
> appear.
> 
> alex
>