Re: Looking for symbol table hashing algorithm

buzzard@world.std.com (Sean T Barrett)
13 Jul 1998 23:47:19 -0400

          From comp.compilers

Related articles
[11 earlier articles]
Re: Looking for symbol table hashing algorithm smith@aragorn.uni-trier.de (1998-07-10)
Re: Looking for symbol table hashing algorithm scott@basis.com (1998-07-10)
Re: Looking for symbol table hashing algorithm henry@spsystems.net (1998-07-10)
Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-10)
Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-11)
Re: Looking for symbol table hashing algorithm scottdaniels@earthlink.net (Scott David Daniels) (1998-07-13)
Re: Looking for symbol table hashing algorithm buzzard@world.std.com (1998-07-13)
Re: Looking for symbol table hashing algorithm peter.r.wilson@boeing.com (Peter Wilson) (1998-07-20)
| List of all articles for this month |

From: buzzard@world.std.com (Sean T Barrett)
Newsgroups: comp.compilers
Date: 13 Jul 1998 23:47:19 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 98-07-030 98-07-045 98-07-084
Keywords: symbols

David Chase <chase@naturalbridge.com> wrote:
>Contrast this with balanced trees. If you add, say, 1000 elements to
>a balanced binary tree, you'll perform about 10 comparisons per probe.
>Those comparisons aren't cheap, especially with machine- generated
>identifiers (naturalbridge_aaaaa_01, naturalbridge_aaaaa_02, etc).


Or, compare it with traditional hashing, wherein however many probes
you'll make, you'll make that many comparisons. My pet peeve these
days is the speed of certain linkers (when they fail to be
incremental) that I work with. I wouldn't be surprised if the 200+
character name-mangled names we get from C++ are not quite the input
the linker's symbol table code was intended to support.


The latest version of "Algorithms in C" offers a data structure called
the "Ternary Search Tree", which searches down the letters of a symbol
progressively, and looks relatively straightforward to implement, but
I've never tried it, and it's new enough that I've never heard anyone
else mention trying it. It cites this paper:


J. Bentley and R. Sedgewick, "Sorting and Searching Strings", Eighth
Symposium on Discrete Algorithms, New Orleans, January, 1997.


If I don't misunderstand, it's effectively a tree of trees (of trees
of...); the initial BST searches the first character of the input, and
on an equality comparison, a third branch of the tree leads to a new
tree for searching on the next character. Complexity for comparing a
given character occurs only where there are other keys which deviate
at exactly that point in the string.


I can't really tell yet, but it seems like it might be "the" data
structure for symbol tables which support very long symbols. (On the
other hand, its biggest advantage over hashing (if you record the full
hash value) seems to be for -misses-, since it doesn't require
examining every character of the input.)


Sean
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.