Re: Looking for symbol table hashing algorithm

zs@cs.mu.OZ.AU (Zoltan Somogyi)
5 Jul 1998 01:13:24 -0400

          From comp.compilers

Related articles
Looking for symbol table hashing algorithm wal@cosc.canterbury.ac.nz (Warwick Irwin) (1998-07-03)
Re: Looking for symbol table hashing algorithm dwight@pentasoft.com (1998-07-03)
Re: Looking for symbol table hashing algorithm drunelson@my-dejanews.com (1998-07-03)
Re: Looking for symbol table hashing algorithm jmccarty@sun1307.spd.dsccc.com (1998-07-03)
Re: Looking for symbol table hashing algorithm zs@cs.mu.OZ.AU (1998-07-05)
Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-05)
Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-05)
Re: Looking for symbol table hashing algorithm khays@sequent.com (1998-07-05)
Re: Looking for symbol table hashing algorithm eodell@pobox.com (1998-07-08)
Re: Looking for symbol table hashing algorithm preston@tera.com (1998-07-08)
Re: Looking for symbol table hashing algorithm drh@microsoft.com (Dave Hanson) (1998-07-08)
[8 later articles]
| List of all articles for this month |

From: zs@cs.mu.OZ.AU (Zoltan Somogyi)
Newsgroups: comp.compilers
Date: 5 Jul 1998 01:13:24 -0400
Organization: Computer Science, The University of Melbourne
References: 98-07-030
Keywords: symbols

>[My impression is that so long as you avoid obviously horrible hashes
>that do things like making all single letter symbols collide, it
>doesn't make much difference. -John]


That would be true, except that the set of "obviously horrible" hashes
is quite large. Real data sets exhibits several kinds of similarity
between strings to be hashed, and designing an algorithm that
successfully deals with *all* these similarities is a non-trivial
exercise.


A paper that discusses these issues is:


Performance in practice of string hashing functions'',
M.V. Ramakrishna and J. Zobel, Proceedings of the Fifth International
Conference on Database Systems for Advanced Applications, Melbourne,
Australia, April 1997, pp. 215--223.


Its abstract reads:


String hashing is a fundamental operation, used in countless
applications where fast access to distinct strings is required. In
this paper we describe a class of string hashing functions and explore
its performance. In particular, using experiments with both small sets
of keys and a large key set from a text database, we show that it is
possible to achieve performance close to that theoretically predicted
for hashing functions. We also consider criteria for choosing a
hashing function and use them to compare our class of functions to
other methods for string hashing. These results show that our class
of hashing functions is reliable and efficient, and is therefore an
appropriate choice for general-purpose hashing.


Zoltan Somogyi <zs@cs.mu.OZ.AU> http://www.cs.mu.oz.au/~zs/
Department of Computer Science, University of Melbourne, AUSTRALIA
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.