Re: Keyword implementation.

"Matthew J. Lockner" <lockner@chaos.cns.uni.edu>
1 Jul 2001 23:44:12 -0400

          From comp.compilers

Related articles
Keyword implemetation. tfjellstrom@home.com (Thomas Fjellstrom) (2001-06-28)
Re: Keyword implementation. lockner@chaos.cns.uni.edu (Matthew J. Lockner) (2001-07-01)
Re: Keyword implementation. acolvin@bbn.com (Alex Colvin) (2001-07-02)
| List of all articles for this month |

From: "Matthew J. Lockner" <lockner@chaos.cns.uni.edu>
Newsgroups: comp.compilers
Date: 1 Jul 2001 23:44:12 -0400
Organization: University of Northern Iowa
References: 01-06-079
Keywords: lex, comment
Posted-Date: 01 Jul 2001 23:44:12 EDT

If you're building keyword recognition into the lexer, then the
approach I prefer is to use GNU gperf
(http://www.gnu.org/software/gperf/gperf.html). Its advantages are
manyfold, but I enjoy the laziness of giving the tool a list of
keywords and having it spit out a testing function (identifier
is/isn't in the set).


gperf builds a "minimal perfect" hash function for you. The
documentation has considerable digression on exactly what this means,
but suffice it to say it is a very fast hash function. Of course, the
usual phrase "mileage may vary" most likely applies, and I think code
profiling would be the only good way to reveal whether bsearch can do
better for a given keyword set, etc. I would guess that gperf will
still be less work, though.


My two cents, anyway.


M. Lockner


Thomas Fjellstrom wrote:


> In a compiler/translator I've been writing I implemented
> the Reserved/Key words with a sorted table and searched
> using bsearch. I was wondering what every ones preference
> was for implementing keywords? using the symbol table,
> a method just like mine or something else all togeter?
[We discussed minimal perfect hashing at great length some years ago, and
I don't recall that perfect hash functions were all that fast. It seems
to me that on average a separate hash table for keywords will lose compared
to a single combined hash-indexed symbol table, since every non-keyword is
looked up twice with two hashes. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.