Re: Data structures for symbol tables

sun!rtech!llama!daveb@ucbvax.Berkeley.EDU (Crack? No thanks, I've got a new CD player)
31 Mar 88 05:11:25 GMT

          From comp.compilers

Related articles
Data structures for symbol tables webber@athos.rutgers.edu (1988-03-28)
Re: Data structures for symbol tables sun!rtech!llama!daveb@ucbvax.Berkeley.EDU (1988-03-31)
| List of all articles for this month |

From: sun!rtech!llama!daveb@ucbvax.Berkeley.EDU (Crack? No thanks, I've got a new CD player)
Newsgroups: comp.compilers
Date: 31 Mar 88 05:11:25 GMT
References: <911@ima.ISC.COM> <914@ima.ISC.COM> <917@ima.ISC.COM> <929@ima.ISC.COM> <930@ima.ISC.COM>
Organization: Relational Technology, Inc. Alameda, CA

In article <930@ima.ISC.COM> webber@athos.rutgers.edu writes:
>In article <929@ima.ISC.COM>, beres@cadnetix.COM (Tim Beres) writes
>some stuff to which the moderator appends:
>
>> [It's hard to believe that storing a symbol table in a B-tree is worth the
>> effort; B-trees make sense for disk files since looking at a page is
>> relatively expensive, but for an in-core symbol table the O(1) lookup time
>> of hashing should be better. It's also a lot easier to program. -John]
>
>Depends on how large your symbol table is. It is easy to imagine
>B-trees out performing hash tables if the b-tree nodes are near page
>size. Remember, virtual memory is on a disk too. Given the variety
>of computer architectures, hash functions, ways of handling hash
>collisions, identifier usage patterns among languages, and programming
>goals...


I'd also like to question the presumed superiority of hashing for symbol
table functions. Even when in-memory the constant factors can outweigh
the O(1) vs O(log N) theoretical advantage.


[ Anecdotal ]


An application I was working on was using a hash table to hold around
30k entries. With instrumentation and profiling, I found long hash
chains *on average*, caused by a not very randomizing hash algorithm.
The profile showed most of the time being spent in strcmps down the
chains, and in the hash function proper. Changing the hash function for
better distribution slowed things down, because the time spent doing a
better hash was greater than the cost of doing the strcmps.


I replaced the hash tables with splay trees, and found the performance
was better, mostly because I was not doing the expensive hash function,
and effectively went strcmping right down the chain, whose average
length turned out to be about the same as the collision chain hanging
off the hash buckets before. As a side effect, it gives you a nicely
ordered scan should that be needed.


The moral is that good performance in a hash table depends a lot on:


* The distribution of the hashing function.
* The cost of the hashing function.
* The cost of searching a collision chain


Few people bother to instrument and/or tune the hash for the best
distribution, and this can easily wipe out whatever theoretical
advantage may be present.


Those interested in the code for a splay tree library suitable for
symbol table use will find it in comp.sources.unix in a few weeks, wind
and weather permitting. I'll mail it to anyone who is in a rush.


-dB
{amdahl, cpsc6a, mtxinu, ptsfa, sun, hoptoad}!rtech!daveb daveb@rtech.uucp
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.