13 Dec 90 06:25:10 GMT

Related articles |
---|

Hash specifics degroot@cpsc.ucalgary.ca (1990-12-11) |

Re: Hash specifics johnl@iecc.cambridge.ma.us (1990-12-11) |

Re: Hash specifics chased@Eng.Sun.COM (1990-12-11) |

Re: Hash specifics pardo@cs.washington.edu (1990-12-11) |

Re: Hash specifics henry@zoo.toronto.edu (1990-12-12) |

Re: Hash specifics baxter@zola.ICS.UCI.EDU (Ira Baxter) (1990-12-13) |

Re: Hash specifics lupine!rfg@uunet.UU.NET (1990-12-13) |

Re: Hash specifics rfg@ncd.com (1990-12-13) |

Re: Hash specifics rfg@ncd.com (1990-12-13) |

Re: Hash specifics bart@cs.uoregon.edu (1990-12-13) |

Re: Hash specifics roth@resi.rice.edu (1990-12-14) |

Re: Hash specifics oz@nexus.yorku.ca (1990-12-14) |

Re: Hash specifics plx!plxsun!evan@Sun.COM (1990-12-15) |

[5 later articles] |

Newsgroups: | comp.compilers |

From: | lupine!rfg@uunet.UU.NET (Ron Guilmette) |

Keywords: | design |

Organization: | Network Computing Devices, Inc., Mt. View, CA |

References: | <9012112133.AA00452@rbbb.Eng.Sun.COM> |

Date: | 13 Dec 90 06:25:10 GMT |

In article <9012112133.AA00452@rbbb.Eng.Sun.COM> chased@Eng.Sun.COM (David Chase) writes:

+CACM 33:6 (June 1990) "Fast Hashing of Variable Length Strings"

+by Peter K. Pearson.

+

+He discusses use of hash functions of the form:

+

+ h = 0;

+ n = length(string);

+

+ for i = 1 through n do

+ h = Table [ h XOR string [i]];

+ enddo;

+

+ return h;

Some time back, I wrote a program which used the following hash function

to try (for a given input set of input strings) to find an "efficient"

hash function for the given set (and some particular table size).

I defined "efficiency" somewhat arbitrarily as the number members of the

input (key) set divided by the number of "probes" (or "key comparisons")

which would be needed to find each and every member of the input set in

the hash table exactly once. (Note that I was assuming that the hash

table in question would be the kind used in compilers & assemblers; i.e.

one which would be fully "precomputed" ahead of time and used only for

lookups, and NEVER for ADDS or DELETES.)

static unsigned hash (const char *s, unsigned prime)

{

unsigned ret_val = 0;

for (; *s; s++)

ret_val = (*s ^ ret_val) * prime;

return ret_val & hash_mask;

*}*

This hash function takes a second parameter `prime' which is used to help

produce `well mangled' output keys from the input keys (i.e. strings).

Knuth documents that fact that using prime numbers in hash functions is

known to tend to make these functions more "efficient" (i.e. it tends to

make the output keys more evenly distributed over the range).

(Note also that this function assumes that `hash_mask' is some constant

defined earlier on. Typically, it has a value like 2**N-1, where N is

some integer. The `primary' hash table size is 2**N.)

The program I wrote tried various values for `prime' up to some fixed

limit (i.e. 8k). I ran the program for various sets of input strings

(both large and small) and for various table sizes.

The program reported the "efficiency" rating of the primes tested.

Curiously, the number 47 always rated in the top 10 regardless of the set

of input keys used or the primary table size used. I have no explanation

for this.

Moral of the story: If you need a reasonably efficient hashing function

for some arbitrary set of strings (e.g. identifiers or keywords), and if

you need it fast, use the function shown above and substitute `47' for

`prime'.

P.S. The above code is not copyrighted, but its `look & feel' may be. :-)

--

// Ron Guilmette - C++ Entomologist

// Internet: rfg@ncd.com uucp: ...uunet!lupine!rfg

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.