Re: perfect hashing

bob_jenkins@burtleburtle.net
13 Aug 2000 19:07:25 -0400

          From comp.compilers

Related articles
[6 earlier articles]
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10)
Re: perfect hashing parz@home.com (Parzival) (2000-08-10)
Re: perfect hashing rick@home.com (2000-08-13)
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13)
Re: perfect hashing pmk@cray.com (2000-08-13)
Re: perfect hashing tej@melbpc.org.au (Tim Josling) (2000-08-13)
Re: perfect hashing bob_jenkins@burtleburtle.net (2000-08-13)
Re: perfect hashing intmktg@Gloria.CAM.ORG (Marc Tardif) (2000-08-13)
| List of all articles for this month |

From: bob_jenkins@burtleburtle.net
Newsgroups: comp.compilers
Date: 13 Aug 2000 19:07:25 -0400
Organization: Deja.com - Before you buy.
References: 00-07-064 00-08-022 00-08-026 00-08-062
Keywords: symbols

> > Indeed, I wouldn't use it for keywords.
> > Consider the typical scanner.
> >
> > a) Recognize that we have some sort of keyword or identifier
> > b) See if it's a keyword
> > c) If not, look it up in the symbol table
>
> Step a) is done by the lexical analyser, and if the token class
> 'identifier' excludes reserved keywords, or 'identifier's which are
> also keywords are lexed as keywords, then the job of recognizing
> keywords distinct from identifiers is complete in step a. This
> eliminates step b, and avoids a perfect hasher too.


Assuming (c) requires a hash value, if you calculate the hash for all
identifiers in (a) you eliminate the loop overhead of calculating the
hash for (c). That's something like three instructions per character.
My current favorite ASCII-character-at-a-time hash is


    hash = (hash ^ current_character) + ((hash<<26)+(hash>>6));


- Bob Jenkins


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.