Re: Looking for symbol table hashing algorithm

Paul Dietz <dietz@interaccess.com>11 Jul 1998 23:40:49 -0400

From comp.compilers

Related articles
[9 earlier articles]
Re: Looking for symbol table hashing algorithm preston@tera.com (1998-07-08)
Re: Looking for symbol table hashing algorithm drh@microsoft.com (Dave Hanson) (1998-07-08)
Re: Looking for symbol table hashing algorithm smith@aragorn.uni-trier.de (1998-07-10)
Re: Looking for symbol table hashing algorithm scott@basis.com (1998-07-10)
Re: Looking for symbol table hashing algorithm henry@spsystems.net (1998-07-10)
Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-10)
Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-11)
Re: Looking for symbol table hashing algorithm scottdaniels@earthlink.net (Scott David Daniels) (1998-07-13)
Re: Looking for symbol table hashing algorithm buzzard@world.std.com (1998-07-13)
Re: Looking for symbol table hashing algorithm peter.r.wilson@boeing.com (Peter Wilson) (1998-07-20)
| List of all articles for this month |

 From: Paul Dietz Newsgroups: comp.compilers Date: 11 Jul 1998 23:40:49 -0400 Organization: The World's Usenet -- http://www.Supernews.com References: 98-07-030 98-07-045 98-07-084 Keywords: symbols

David Chase wrote:

> What this means is that the bucket chains must get really long
> before chaining is costly, and with a grade A hash function, the
> odds of that happening are very low indeed. I lack the skill-time
> product to generally compute odds, but if you've got 16 bucket
> heads, the odds against adding 16 entries and landing them all in
> the same bucket are (I think) 2**60 to 1.

To make chaining have good expected time on any input set it suffices
to choose a random hash function such that the probability that two
different keys hash to the same location is <= 1/(table size). That's
because the time to insert n keys will be proportional to the
expectation of the sum of the n^2 random variables C(i,j), where
C(i,j) = 1 if keys i and j collide, and 0 if they do not. Expectation
of a sum of RVs is the sum of expectations, even if the RVs are not
independent.

Universal hash functions are covered in Cormen, Leiserson and Rivest.

> For scoped symbol tables, there's a stunt, I think due to Paul Dietz,
> that is useful in some applications.
[...]
> Rather than modify the symbol table to reflect the changing scopes
> as compilation moves through the program, you simply note which
> scope you are in; a reference to a symbol X takes the form "get
> scopes from symbol table for X, find current scope in set of
> scopes". Numbering the scopes with in and out numbers in a standard
> tree walk generates bracketing ranges that allow you to find the
> nearest enclosing scope in log N time (binary search).

What I did was solve this problem in the case of a tree that is
growing by addition of individual nodes, so one can't use a static
numbering. The original paper had a log* n extra factor, which Tarjan
immediately (I mean, seconds after the presentation) pointed out how

You can do the lookup in O(loglogn) time on a RAM with O(logn) word
size using the van Emde Boas integer set data structure. Probably not
terribly useful practically.

Paul
--

Post a followup to this message