2 Oct 2004 16:21:19 -0400

Related articles |
---|

Hash tables victor@eijkhout.net (2004-10-02) |

Re: Hash tables nmm1@cus.cam.ac.uk (2004-10-02) |

Re: Hash tables cc-news@hermes.mirlex.com (C) (2004-10-02) |

Re: Hash tables vbdis@aol.com (2004-10-04) |

Re: Hash tables slimick@venango.upb.pitt.edu (John Slimick) (2004-10-04) |

From: | nmm1@cus.cam.ac.uk (Nick Maclaren) |

Newsgroups: | comp.compilers |

Date: | 2 Oct 2004 16:21:19 -0400 |

Organization: | University of Cambridge, England |

References: | 04-10-011 |

Keywords: | storage |

Posted-Date: | 02 Oct 2004 16:21:19 EDT |

Victor Eijkhout <victor@eijkhout.net> wrote:

*>Knuth (AoCP 6.4) more or less waves away the `open hashing' strategy,*

*>where you resolve conflicts by allocating a linked list outside the hash*

*>table. He devotes almost all of that section to `closed hashing',*

*>investigating linear probing and chaining as ways of storing the*

*>conflicting items elsewhere in the table.*

Boggle. If so, that's bonkers. I assume that 'open' hashing is

traditional hashing, where each entry starts a chain of elements with

the same hash value, and 'closed' hashing is as you describe.

[ I went back and read section 6.4 of Knuth, and he covers both

open (chained) and closed (rehashed) algorithms in considerable

detail. As someone notes in another message in this thread. the

algorithms for closed hashing are more complex so the analysis

is longer. -John]

There is another variant of 'open' hashing, where each entry is a

single pointer, and there is a single, separate overflow chain. That

is the method that gives the fastest O(N) sort (see below).

*>Personally, I don't see the problem with open hashing.*

There isn't one, and there are certain uses where it is FAR better.

Three points here:

1) The statistical analysis of closed hashing is FAR more complex,

and requires a knowledge of Brownian bridges rather than just the

simple Poisson distribution. And doing it for imperfect hashing

(often necessary) is beyond most statisticians.

2) It is common to have data where there are multiple entries for

each hash value, which are chained together. That mandates a form of

open hashing.

3) When investigating true O(N) sorts for data from known

probability distributions, the fastest was an open hash method. The

second fastest was a Brownian bridge model - as far as I know, I

invented that and it has never been published.

*>So, was closed hashing born of necessity in the days when memory was*

*>at a premium and you needed to keep track of every last byte*

*>yourself?*

No. Open hashing predated close hashing. When I first saw closed

hashing, it was being proposed by theoreticians, and I thought that it

was a nice trick if they could manage it. When I asked about the

statistical issues, it was clear that they didn't understand them. I

don't think that much has changed.

Regards,

Nick Maclaren.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.