|[11 earlier articles]|
|Re: Looking for symbol table hashing algorithm email@example.com (1998-07-10)|
|Re: Looking for symbol table hashing algorithm firstname.lastname@example.org (1998-07-10)|
|Re: Looking for symbol table hashing algorithm email@example.com (1998-07-10)|
|Re: Looking for symbol table hashing algorithm firstname.lastname@example.org (David Chase) (1998-07-10)|
|Re: Looking for symbol table hashing algorithm email@example.com (Paul Dietz) (1998-07-11)|
|Re: Looking for symbol table hashing algorithm firstname.lastname@example.org (Scott David Daniels) (1998-07-13)|
|Re: Looking for symbol table hashing algorithm email@example.com (1998-07-13)|
|Re: Looking for symbol table hashing algorithm firstname.lastname@example.org (Peter Wilson) (1998-07-20)|
|From:||email@example.com (Sean T Barrett)|
|Date:||13 Jul 1998 23:47:19 -0400|
|Organization:||The World Public Access UNIX, Brookline, MA|
|References:||98-07-030 98-07-045 98-07-084|
David Chase <firstname.lastname@example.org> wrote:
>Contrast this with balanced trees. If you add, say, 1000 elements to
>a balanced binary tree, you'll perform about 10 comparisons per probe.
>Those comparisons aren't cheap, especially with machine- generated
>identifiers (naturalbridge_aaaaa_01, naturalbridge_aaaaa_02, etc).
Or, compare it with traditional hashing, wherein however many probes
you'll make, you'll make that many comparisons. My pet peeve these
days is the speed of certain linkers (when they fail to be
incremental) that I work with. I wouldn't be surprised if the 200+
character name-mangled names we get from C++ are not quite the input
the linker's symbol table code was intended to support.
The latest version of "Algorithms in C" offers a data structure called
the "Ternary Search Tree", which searches down the letters of a symbol
progressively, and looks relatively straightforward to implement, but
I've never tried it, and it's new enough that I've never heard anyone
else mention trying it. It cites this paper:
J. Bentley and R. Sedgewick, "Sorting and Searching Strings", Eighth
Symposium on Discrete Algorithms, New Orleans, January, 1997.
If I don't misunderstand, it's effectively a tree of trees (of trees
of...); the initial BST searches the first character of the input, and
on an equality comparison, a third branch of the tree leads to a new
tree for searching on the next character. Complexity for comparing a
given character occurs only where there are other keys which deviate
at exactly that point in the string.
I can't really tell yet, but it seems like it might be "the" data
structure for symbol tables which support very long symbols. (On the
other hand, its biggest advantage over hashing (if you record the full
hash value) seems to be for -misses-, since it doesn't require
examining every character of the input.)
Return to the
Search the comp.compilers archives again.