Hashing for Keywords? (was: Minimal Perfect Hashing and Cichelli's algorithm)

p.froehlich@amc.cube.net (Peter Froehlich)
29 Apr 1996 23:31:21 -0400

          From comp.compilers

Related articles
Minimal Perfect Hashing and Cichelli's algorithm CSPT@giraffe.ru.ac.za (Pat Terry) (1996-04-10)
Re: Minimal Perfect Hashing and Cichelli's algorithm taj@vanbc.wimsey.com (1996-04-11)
Hashing for Keywords? (was: Minimal Perfect Hashing and Cichelli's alg p.froehlich@amc.cube.net (1996-04-29)
| List of all articles for this month |

From: p.froehlich@amc.cube.net (Peter Froehlich)
Newsgroups: comp.compilers
Date: 29 Apr 1996 23:31:21 -0400
Organization: AMIGA CITY - Public Amiga-BBS, Munich, Germany
References: 96-04-055 96-04-069
Keywords: symbols


taj@vanbc.wimsey.com (Taj Khattra) wrote:

> For "large" keyword sets (certainly larger than the keyword sets of most
> programming languages) it may be worthwhile to explore the algorithm
> described in [CHM92] which is a probabilistic method using random graphs
> that works very fast in practice. It also has the benefit of being order
> preserving.

      Reading through the mails leading up to the one quoted above I
found myself wondering whether this really is an issue at all when
dealing with keywords.

      In fact a lot of Oberon-2 compilers use the following approach
which is *very* simple and apparently not causing a "bottleneck" for
the compiler (Oberon-2 compilers are usually very fast compared to
other compilers, see [1] for an objective comparison):

      VAR buffer: ARRAY 256 OF CHAR; token: INTEGER;
      (* ... *)
      CASE buffer[0] OF
          (* ... *)
          "A": IF buffer = "ARRAY" THEN
                        token := array;
                        token := identifier;
        |"B": IF buffer = "BEGIN" THEN
                        token := begin;
                    ELSIF buffer = "BY" THEN
                        token := by;
                        token := identifier;
          (* ... *)
          token := identifier;

I can't guarantee for correct syntax here. :-/

      I understand the problem for syntax tables where a large number of
varying objects (identifiers) must be expected, but for keywords? Is
it worth the pain to deal with them in a more complex way than the one
outlined above?

[1] Henderson, Zorn: A Comparison of Four Object-Oriented Programming
Languages. Technical Report CU-CS-641-93, University of Colorado at
Boulder, 1993. (availiable through ftp)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.