Re: Hash tables

nmm1@cus.cam.ac.uk (Nick Maclaren)
2 Oct 2004 16:21:19 -0400

          From comp.compilers

Related articles
Hash tables victor@eijkhout.net (2004-10-02)
Re: Hash tables nmm1@cus.cam.ac.uk (2004-10-02)
Re: Hash tables cc-news@hermes.mirlex.com (C) (2004-10-02)
Re: Hash tables vbdis@aol.com (2004-10-04)
Re: Hash tables slimick@venango.upb.pitt.edu (John Slimick) (2004-10-04)
| List of all articles for this month |

From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Newsgroups: comp.compilers
Date: 2 Oct 2004 16:21:19 -0400
Organization: University of Cambridge, England
References: 04-10-011
Keywords: storage
Posted-Date: 02 Oct 2004 16:21:19 EDT

Victor Eijkhout <victor@eijkhout.net> wrote:
>Knuth (AoCP 6.4) more or less waves away the `open hashing' strategy,
>where you resolve conflicts by allocating a linked list outside the hash
>table. He devotes almost all of that section to `closed hashing',
>investigating linear probing and chaining as ways of storing the
>conflicting items elsewhere in the table.


Boggle. If so, that's bonkers. I assume that 'open' hashing is
traditional hashing, where each entry starts a chain of elements with
the same hash value, and 'closed' hashing is as you describe.


[ I went back and read section 6.4 of Knuth, and he covers both
open (chained) and closed (rehashed) algorithms in considerable
detail. As someone notes in another message in this thread. the
algorithms for closed hashing are more complex so the analysis
is longer. -John]


There is another variant of 'open' hashing, where each entry is a
single pointer, and there is a single, separate overflow chain. That
is the method that gives the fastest O(N) sort (see below).


>Personally, I don't see the problem with open hashing.


There isn't one, and there are certain uses where it is FAR better.
Three points here:


        1) The statistical analysis of closed hashing is FAR more complex,
and requires a knowledge of Brownian bridges rather than just the
simple Poisson distribution. And doing it for imperfect hashing
(often necessary) is beyond most statisticians.


        2) It is common to have data where there are multiple entries for
each hash value, which are chained together. That mandates a form of
open hashing.


        3) When investigating true O(N) sorts for data from known
probability distributions, the fastest was an open hash method. The
second fastest was a Brownian bridge model - as far as I know, I
invented that and it has never been published.


>So, was closed hashing born of necessity in the days when memory was
>at a premium and you needed to keep track of every last byte
>yourself?


No. Open hashing predated close hashing. When I first saw closed
hashing, it was being proposed by theoreticians, and I thought that it
was a nice trick if they could manage it. When I asked about the
statistical issues, it was clear that they didn't understand them. I
don't think that much has changed.


Regards,
Nick Maclaren.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.