Re: Looking for symbol table hashing algorithm

preston@tera.com (Preston Briggs)
8 Jul 1998 01:37:30 -0400

From comp.compilers

Related articles
[3 earlier articles]
Re: Looking for symbol table hashing algorithm jmccarty@sun1307.spd.dsccc.com (1998-07-03)
Re: Looking for symbol table hashing algorithm zs@cs.mu.OZ.AU (1998-07-05)
Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-05)
Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-05)
Re: Looking for symbol table hashing algorithm khays@sequent.com (1998-07-05)
Re: Looking for symbol table hashing algorithm eodell@pobox.com (1998-07-08)
*Re: Looking for symbol table hashing algorithm preston@tera.com* (1998-07-08)**
Re: Looking for symbol table hashing algorithm drh@microsoft.com (Dave Hanson) (1998-07-08)
Re: Looking for symbol table hashing algorithm smith@aragorn.uni-trier.de (1998-07-10)
Re: Looking for symbol table hashing algorithm scott@basis.com (1998-07-10)
Re: Looking for symbol table hashing algorithm henry@spsystems.net (1998-07-10)
Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-10)
Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-11)
[3 later articles]

| List of all articles for this month |

From:	preston@tera.com (Preston Briggs)
Newsgroups:	comp.compilers
Date:	8 Jul 1998 01:37:30 -0400
Organization:	Compilers Central
References:	98-07-030 98-07-042 98-07-051
Keywords:	symbols

>>>[My impression is that so long as you avoid obviously horrible hashes
>>>that do things like making all single letter symbols collide, it
>>>doesn't make much difference. -John]

>>That would be true, except that the set of "obviously horrible" hashes
>>is quite large. Real data sets exhibits several kinds of similarity
>>between strings to be hashed, and designing an algorithm that
>>successfully deals with *all* these similarities is a non-trivial
>>exercise.

>I'll put in a plug for using AVL trees (or other balanced trees) at
>this point - they don't show degenerate performance on any case, and
>performance is good on real-world (and machine generated) codes, in my
>experience.

>In particular, having had to write compilers that had to deal with a
>lot of machine-generated code, I will forevermore avoid hashing as a
>technique where the input data is unknown.

Another approach would be to try multiset discrimination

    title="``Look Ma, No Hashing, And No Arrays Neither''",
    author="Jiazhen Cai and Robert Paige",
    booktitle=popl18,
    address="Orlando, Florida",
    year=1991,
    month=jan,
    pages="143--154"

I find the paper somewhat hard to read, but the basic idea is to
recognize that while hash tables provide O(n) probabilistic time for n
accesses in an on-line fashion, but that the on-line behaviour is
unnecessary for many applications (e.g., symbol tables in multipass
compilers). So they substitute a simple alternative, multiset
discrimination, to achieve O(n) time for n access in an off-line
fashion.

Note that the word "probabilistic" has disappeared! This means you'll
get your nice time bound regardless of the distribution of the keys.

Of course, the off-line part has some engineering implications and
I've never heard of anyone actually using this approach in a compiler
(but I'd be interested in hearing about it). You'll need to collect
all the identifiers (and perhaps the keywords), then apply multiset
discrimination to distingish between them.

Preston Briggs
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Looking for symbol table hashing algorithm

preston@tera.com (Preston Briggs)8 Jul 1998 01:37:30 -0400

preston@tera.com (Preston Briggs)
8 Jul 1998 01:37:30 -0400