22 Dec 2002 10:40:18 -0500

Related articles |
---|

Masters course with compiler specialization jeremy.wright@microfocus.com (Jeremy Wright) (2002-11-12) |

Re: Masters course with compiler specialization Trevor.Jenkins@suneidesis.com (2002-12-11) |

Re: Masters course with compiler specialization etechweb@yahoo.com (2002-12-19) |

Size of hash tables was Re: Masters course with compiler specializat Trevor.Jenkins@suneidesis.com (2002-12-22) |

Re: Size of hash tables was Re: Masters course ... joachim_d@gmx.de (Joachim Durchholz) (2002-12-30) |

Re: Size of hash tables was Re: Masters course ... neeri@iis.ee.ethz.ch (Matthias Neeracher) (2003-01-04) |

Re: Size of hash tables was Re: Masters course ... bonzini@gnu.org (2003-01-04) |

Re: Size of hash tables was Re: Masters course ... stephan@stack.nl (Stephan Eggermont) (2003-01-07) |

From: | Trevor.Jenkins@suneidesis.com (Trevor Jenkins) |

Newsgroups: | comp.compilers |

Date: | 22 Dec 2002 10:40:18 -0500 |

Organization: | suneidesis |

References: | 02-11-060 02-12-056 02-12-092 |

Keywords: | symbols, practice |

On 19 Dec 2002 12:41:25 -0500, Sebastiano Pilla <etechweb@yahoo.com> wrote:

*> Trevor Jenkins <Trevor.Jenkins@suneidesis.com> wrote:*

*>*

*> > If Hopgood continues to teach this module then you'll at least come*

*> > out with the correct view that hash-tables based on the powers of 2*

*> > are better than Maurer's gospel that such tables can only ever be*

*> > absed on prime numbers.*

*>*

*> Could you please expand on that? I don't think I'll have the opportunity*

*> to attend Hopgood's course.*

Since the publication of Maurer's paper "An improved hash code for

scatter storage" in the Comm of the ACM (vol 11, Jan 1968, pp 35--38)

it is taken as gospel that hash tables only work if the size is a

prime number. Ignoring for the moment any attempt to change the size

of a hash table the MODULO will require a divide operation. Divides

are (at least were) expensive operations to perform in hardware. For

the not so subtle influenceof this aper check out the Dragon books

where Aho/Ullman and Aho/Sethi/Ullman only mention the use of prime

numbers for the size of a hash table.

An alternative to prime numbers is to use powers of 2, which if I

remember Hopgood's lectures correctly were used in the Fortran II

compiler. Already this has become effecient. Why? The divide operation

can be replaced by the faster boolean AND operation. Plus there is no

serious penalty for increaing the size of the hash table. To increase

the size of a hash table using powers of 2 requires setting the next

bit along whereas with the Maurer doctrine you'll have to compute the

next prime number in sequence.

There are other benefits of using powers of two. Carefully planning

can make sure that the hash table starts on a page bondary and there

after an integral number of pages can be processed. Rehashing the

tabel will be faster for the same reason that inserting an individual

entry is faster.

For more information on Hopgood's approach see:

"The quadratic hash method when the table size is a power of 2"

Hopgood and Davenport, Computer Journal, vol 15, nov 1972, pp314--315

I do remember Hopgood saying that he wrtote the editor of Comm of the

ACM about Maurer's paper challenging the validity of the content. A

letter that so far remains unprinted --- even when Maurer's was

reprinted in a annivarsary issue of Comm of the ACM.

Regards, Trevor

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.