Re: Generating hash tables from known input (I think)

rv@erix.ericsson.se (Robert Virding)
Mon, 24 May 1993 15:55:24 GMT

          From comp.compilers

Related articles
Generating hash tables from known input (I think) rv@erix.ericsson.se (1993-05-10)
Re: Generating hash tables from known input (I think) leichter@zodiac.rutgers.edu (1993-05-11)
Re: Generating hash tables from known input (I think) rv@erix.ericsson.se (1993-05-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: rv@erix.ericsson.se (Robert Virding)
Keywords: design
Nntp-Posting-User: rv
Organization: Ellemtel Telecom Systems Labs, Stockholm, Sweden
References: 93-05-049
Date: Mon, 24 May 1993 15:55:24 GMT

rv@erix.ericsson.se (Robert Virding) writes:
[Long request about generating static hash tables]


Thanks for all the replies to my original request. After reading these,
talking to people and doing some experiments I have the following
observations:


1) Perfect hash functions (PHF) are very temptimg but I do have doubts.


2) PHF working on integer keys. The references I have found on these all
generate minimal hash functions. While this definitely saves space the
operations needed at run time to generate a table index are typically 1
multiply + 1 divide + 1 modulo. For systems where it is fast to compare
objects (eg. there is an atom table) then this probably takes as much time
as the comparison.


3) PHF on character strings. Here again the operation of building a hash
key from some/all of the characters is probably comparable or greater than
the time needed to compare objects if this can be done quickly. Also, it
seems as if some of these algorithms may need to be tuned for each input
set of strings. This could be difficult to do automatically by the
compiler (I think, correct me if I am wrong).


4) Just taking the hash key modulo <table size> and handling collisions,
either by buckets or rehashing, seems perfectly reasonable. Statistically
the number of collisions is quite small even for tables only marginally
larger that the number of elements. I found, however, that *actual* number
varied considerably if you changed the size. Making it bigger will
sometimes make it much worse!


5) If the table size is a power of two then calculating the index is very
fast. But does this waste unnnecessary space?


6) If you are handling collisions by rehashing then the order in which the
objects are inserted into the table is significant. Is it possible to
optimise this?


Any comments? What have I missed? How do I chose an optimal table size if
I want to take the hash key modulo <table size>? Is a prime number better?
Is there a deterministic algorithm, or do I just test all values within my
selected range? Etc ...


Thanks again for all help, past and future,


Robert
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.