Re: perfect hashing

pmk@cray.com (Peter Klausler)
13 Aug 2000 19:02:15 -0400

          From comp.compilers

Related articles
[4 earlier articles]
Re: perfect hashing jsgray@acm.org (Jan Gray) (2000-08-09)
Re: perfect hashing jmochel@foliage.com (2000-08-10)
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10)
Re: perfect hashing parz@home.com (Parzival) (2000-08-10)
Re: perfect hashing rick@home.com (2000-08-13)
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13)
Re: perfect hashing pmk@cray.com (2000-08-13)
Re: perfect hashing tej@melbpc.org.au (Tim Josling) (2000-08-13)
Re: perfect hashing bob_jenkins@burtleburtle.net (2000-08-13)
Re: perfect hashing intmktg@Gloria.CAM.ORG (Marc Tardif) (2000-08-13)
| List of all articles for this month |

From: pmk@cray.com (Peter Klausler)
Newsgroups: comp.compilers
Date: 13 Aug 2000 19:02:15 -0400
Organization: Cray Inc.
References: 00-07-064 00-08-022 00-08-026 00-08-062
Keywords: symbols

>Preston Briggs <preston@tera.com> wrote in message
>> The moderator wrote:
>> [Perfect hashing is sometimes useful for a fixed table of keywords, but
>> you can't use it for a normal symbol table. -John]
>>
>> Indeed, I wouldn't use it for keywords.
>> Consider the typical scanner.
>>
>> a) Recognize that we have some sort of keyword or identifier
>> b) See if it's a keyword
>> c) If not, look it up in the symbol table
>
>Step a) is done by the lexical analyser, and if the token class
>'identifier' excludes reserved keywords, or 'identifier's which are
>also keywords are lexed as keywords, then the job of recognizing
>keywords distinct from identifiers is complete in step a. This
>eliminates step b, and avoids a perfect hasher too.=20


As may already have been pointed out by now in this discussion, a C
compiler with an integrated preprocessor has to do a symbol table
lookup on any token that looks like an identifier, in case that token
has been #define'd.


FWIW, the last compiler I built (a system generation C compiler for a
new instruction set architecture) uses these techniques to minimize
compilation (and development) time while generating good quality code:


a) slurping entire source files into memory
b) tokenizing each source file all at once into a buffer with a custom DFA
c) keeping the token buffers around and reusing them when the same
      header file is #include'd again
d) organizing the symbol table in tries
e) integrated preprocessor
f) recursive-descent parsing, with operator precedence for expressions
g) use of a single indexable triple table to hold intermediate code
h) static single-assignment form constructed immediately after code
      generation and maintained throughout the rest of compilation
i) register assignment of SSA form by optimistic graph coloring
j) direct generation of ELF relocatable output, no assembler needed


Profiling the *whole* compiler is important.
--
Peter Klausler (pmk at cray dot com)
Principal Engineer, SV2 project, Cray Inc.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.