Re: perfect hashing

"Parzival" <parz@home.com>
10 Aug 2000 00:15:37 -0400

          From comp.compilers

Related articles
Hashtable alternatives gwynfa@paradise.net.nz (Gwynfa) (2000-07-27)
Re: Hashtable alternatives rkrayhawk@aol.com (2000-08-04)
perfect hashing preston@tera.com (Preston Briggs) (2000-08-04)
Re: perfect hashing ceco@jupiter.com (Tzvetan Mikov) (2000-08-05)
Re: perfect hashing jsgray@acm.org (Jan Gray) (2000-08-09)
Re: perfect hashing jmochel@foliage.com (2000-08-10)
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10)
Re: perfect hashing parz@home.com (Parzival) (2000-08-10)
Re: perfect hashing rick@home.com (2000-08-13)
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13)
Re: perfect hashing pmk@cray.com (2000-08-13)
Re: perfect hashing tej@melbpc.org.au (Tim Josling) (2000-08-13)
Re: perfect hashing bob_jenkins@burtleburtle.net (2000-08-13)
Re: perfect hashing intmktg@Gloria.CAM.ORG (Marc Tardif) (2000-08-13)
| List of all articles for this month |

From: "Parzival" <parz@home.com>
Newsgroups: comp.compilers
Date: 10 Aug 2000 00:15:37 -0400
Organization: Excite@Home - The Leader in Broadband
References: 00-07-064 00-08-022 00-08-026
Keywords: symbols

Preston Briggs <preston@tera.com> wrote in message
> The moderator wrote:
> [Perfect hashing is sometimes useful for a fixed table of keywords, but
> you can't use it for a normal symbol table. -John]
>
> Indeed, I wouldn't use it for keywords.
> Consider the typical scanner.
>
> a) Recognize that we have some sort of keyword or identifier
> b) See if it's a keyword
> c) If not, look it up in the symbol table


Step a) is done by the lexical analyser, and if the token class
'identifier' excludes reserved keywords, or 'identifier's which are
also keywords are lexed as keywords, then the job of recognizing
keywords distinct from identifiers is complete in step a. This
eliminates step b, and avoids a perfect hasher too.=20


If you use a good lexical analyser, there should be no extra cost to
include keywords in the lexer (other than a modest increase in table
size). I have found the Cocktail Toolkit lexer generator 'Rex' with
its 'tunnel automaton' to be particularly good for this, and also
're2c' which generates hard-coded lexers. 'Coco' is moderately good
(recognizing with hard-coded tries), and I believe, lex and flex may
be not quite as good at this sort of disambiguation, but I haven't
used them for a long time.


In my experience, keyword identification with perfect hash lookup
or any secondary lookup has not been worthwhile for parsers of
general programming languages, and the three-step procedure a-c is
the easy way out when hand coding lexers: hand-coding an optimal
keyword recognizer in conjunction with recognizing identifiers is just =
too much code.


- pzv


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.