Re: Hash specifics

rfg@ncd.com (Ron Guilmette)
13 Dec 90 05:55:44 GMT

          From comp.compilers

Related articles
[2 earlier articles]
Re: Hash specifics chased@Eng.Sun.COM (1990-12-11)
Re: Hash specifics pardo@cs.washington.edu (1990-12-11)
Re: Hash specifics henry@zoo.toronto.edu (1990-12-12)
Re: Hash specifics baxter@zola.ICS.UCI.EDU (Ira Baxter) (1990-12-13)
Re: Hash specifics lupine!rfg@uunet.UU.NET (1990-12-13)
Re: Hash specifics rfg@ncd.com (1990-12-13)
Re: Hash specifics rfg@ncd.com (1990-12-13)
Re: Hash specifics bart@cs.uoregon.edu (1990-12-13)
Re: Hash specifics roth@resi.rice.edu (1990-12-14)
Re: Hash specifics oz@nexus.yorku.ca (1990-12-14)
Re: Hash specifics plx!plxsun!evan@Sun.COM (1990-12-15)
Re: Hash specifics vern@horse.ee.lbl.gov (1990-12-16)
Re: Hash specifics schmidt@oberkampf.ICS.UCI.EDU (Doug Schmidt) (1990-12-17)
[3 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: rfg@ncd.com (Ron Guilmette)
Keywords: design
Organization: Network Computing Devices, Inc., Mt. View, CA
References: <9012111913.AA07310@cs-sun-fsa.cpsc.ucalgary.ca>
Date: 13 Dec 90 05:55:44 GMT

In article <9012111913.AA07310@cs-sun-fsa.cpsc.ucalgary.ca> degroot@cpsc.ucalgary.ca (Adriaan Degroot) writes:
>How does one build a hash function with minimal table size and minimal
>collisions for a known set of hash strings?


As others have already noted, Doug Schmidt wrote a cute program called
gperf (in C++) which is useful for finding perfect hash functions.


Among the possible command line arguments for Doug's program are a set of
options which allow you to specify (for instance) the particular character
positions from the input (search key) strings which you want to
participate in the hashing. So (for instance) you can tell the program
only to hash characters 1 thru 3 and the last 2 characters of each string.


The unfortunate part (and this is something that I ragged on Doug about at
length) is that the program is not setup to automatically try various
possibilities for the `hash bytes subset' option(s). In other words, you
may tell the program to go and look for a perfect hash function for a
certain input (key) set and a certain table size and a certain `hash bytes
subset' and it may simply come back and tell you that it ain't possible.


Also, the program only uses one particular (simple & fast) hash function.
I can't reacll exactly what it does, but that doesn't matter for the point
I'm making. The point is this. Let's say that the hash function tries
simply XOR'ing the bytes together, and then masking off the low bits to
get the hash key when done. Well, gperf will never tell you that you
might have been able to get a faster hash function (or even one that
actually works given your stated table size constraints) if (for example)
the hash function had simply *ADDED* the byte values together rather that
XOR'ing. In other words, gperf explores only a small area within the
(multidimensional?) space of all possible (or all useful) hashing
functions.


That's not to say that it isn't a nice an/or useful tool. It is both.


--


// Ron Guilmette - C++ Entomologist
// Internet: rfg@ncd.com uucp: ...uunet!lupine!rfg
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.