From: | khays@sequent.com (Kirk Hays) |
Newsgroups: | comp.compilers |
Date: | 5 Jul 1998 21:35:21 -0400 |
Organization: | Sequent Computer Systems, Inc. |
References: | 98-07-030 98-07-042 |
Keywords: | symbols |
Zoltan Somogyi <zs@cs.mu.OZ.AU> wrote:
>>[My impression is that so long as you avoid obviously horrible hashes
>>that do things like making all single letter symbols collide, it
>>doesn't make much difference. -John]
>
>That would be true, except that the set of "obviously horrible" hashes
>is quite large. Real data sets exhibits several kinds of similarity
>between strings to be hashed, and designing an algorithm that
>successfully deals with *all* these similarities is a non-trivial
>exercise.
I'll put in a plug for using AVL trees (or other balanced trees) at
this point - they don't show degenerate performance on any case, and
performance is good on real-world (and machine generated) codes, in my
experience.
In particular, having had to write compilers that had to deal with a
lot of machine-generated code, I will forevermore avoid hashing as a
technique where the input data is unknown.
For any given hash, it is possible to design a worst-case input.
Murphy states you will encounter that input.
--
Kirk Hays
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.