Re: Size of hash tables was Re: Masters course ...

bonzini@gnu.org (Paolo Bonzini)
4 Jan 2003 22:47:23 -0500

          From comp.compilers

Related articles
Masters course with compiler specialization jeremy.wright@microfocus.com (Jeremy Wright) (2002-11-12)
Re: Masters course with compiler specialization Trevor.Jenkins@suneidesis.com (2002-12-11)
Re: Masters course with compiler specialization etechweb@yahoo.com (2002-12-19)
Size of hash tables was Re: Masters course with compiler specializat Trevor.Jenkins@suneidesis.com (2002-12-22)
Re: Size of hash tables was Re: Masters course ... joachim_d@gmx.de (Joachim Durchholz) (2002-12-30)
Re: Size of hash tables was Re: Masters course ... neeri@iis.ee.ethz.ch (Matthias Neeracher) (2003-01-04)
Re: Size of hash tables was Re: Masters course ... bonzini@gnu.org (2003-01-04)
Re: Size of hash tables was Re: Masters course ... stephan@stack.nl (Stephan Eggermont) (2003-01-07)
| List of all articles for this month |
From: bonzini@gnu.org (Paolo Bonzini)
Newsgroups: comp.compilers
Date: 4 Jan 2003 22:47:23 -0500
Organization: http://groups.google.com/
References: 02-11-060 02-12-056 02-12-092 02-12-107 02-12-127
Keywords: symbols, performance
Posted-Date: 04 Jan 2003 22:47:23 EST

> > Since the publication of Maurer's paper "An improved hash code for
> > scatter storage" in the Comm of the ACM (vol 11, Jan 1968, pp 35--38)
> > it is taken as gospel that hash tables only work if the size is a
> > prime number.
>
> Not "only work". Just "distribute their keys in a more random fashion,
> assuming you don't have a priori knowledge about key distribution".


You can always design a good-enough hash function so that the
distribution of the hash-function does not require the additional
bit-shuffling that a modulo-prime operation does. Considering strings
as keys, you can say that prime sizes are only required when a
character only affects a few specific and clustered bits, instead of
having some chance of affecting every bit.


> Particularly on modern hardware, where division and bit masking have
> roughly the same execution cost. Could anybody clarify?


Only roughly (and only if you can parallelize things well -- a
division has some cycles of latency, while you can do several
bitmaskings in the same clock cycle). Why do things roughly if it's
simple to do them better? :-)


You can also do double-hashing with power-of-two hash tables, only
make sure that you set the low bit of the secondary hash. e.g. size is
8, primary hash is 3, secondary hash is 4 --> does not work, you only
examine two items; but if secondary hash is 5, you examine eight
buckets (in this order: 3, 0, 5, 2, 7, 4, 1, 6).


Paolo


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.