Re: Hash specifics

preston@ariel.rice.edu (Preston Briggs)
Mon, 17 Dec 90 15:58:58 GMT

          From comp.compilers

Related articles
[9 earlier articles]
Re: Hash specifics bart@cs.uoregon.edu (1990-12-13)
Re: Hash specifics roth@resi.rice.edu (1990-12-14)
Re: Hash specifics oz@nexus.yorku.ca (1990-12-14)
Re: Hash specifics plx!plxsun!evan@Sun.COM (1990-12-15)
Re: Hash specifics vern@horse.ee.lbl.gov (1990-12-16)
Re: Hash specifics schmidt@oberkampf.ICS.UCI.EDU (Doug Schmidt) (1990-12-17)
Re: Hash specifics preston@ariel.rice.edu (1990-12-17)
Re: Hash specifics brain@eos.ncsu.edu (1990-12-18)
Re: Hash specifics rfpfeifl@watcgl.uwaterloo.ca (Ron Pfeifle) (1990-12-21)
| List of all articles for this month |

Newsgroups: comp.compilers
From: preston@ariel.rice.edu (Preston Briggs)
Keywords: design
Organization: Rice University, Houston
References: <1990Dec12.221425.569@zoo.toronto.edu> <2980@lupine.NCD.COM> <EVAN.90Dec15155336@plxsun.uucp>
Date: Mon, 17 Dec 90 15:58:58 GMT

plx!plxsun!evan@Sun.COM (Evan Bigall) writes:


>The time when I always find myself looking for a perfect hash function, is
>when I am parsing long keyword options (ie: -ansi etc...) or some other case
>when I have a small set of possibilities but want to avoid using:
>
> if (!strcmp("hello", arg))
> {}
> else if (!strcmp("there", arg))
> {}
> else if (!strcmp("world", arg))


Another poster (Ira Baxter?) mentioned how to handle these efficiently
in a typical compiler-like case. I'll expand a little.


Since we're building a symbol table anyway, we're going to need to hash every
identifier anyway. I use one hash table to hash keywords, identifiers, and
strings as part of my scanner. Each time a new id or string is encountered,
I allocate a new integer for it. If the integer is < some small number, an
id is actually a keyword.


The rest of the front-end simply looks at integers which serve as indices
into a string table. Makes it easy to switch on "strings" when all are
reduced to integers.


This works especially well with Knuth's WEB system. WEB has a notion of
preprocessed-strings. I.e.
"a string"


When Tangling, a string table is maintained and all the strings are replaced
in the output text with integers. Further, the string table is written to a
pool file, suitable for initializing your compiler's hash table (or TeX or
whatever).


Pre-processed strings are also a nice way to handle error messages and so
forth, potentially saving much storage.


LISPers will wonder what the fuss is about, as they use symbols as a matter
of course.


Preston Briggs
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.