|[15 earlier articles]|
|Re: symbol tables and search tries firstname.lastname@example.org (Matthias Blume) (2006-01-31)|
|Re: symbol tables and search tries email@example.com (Mr.E) (2006-01-31)|
|Re: symbol tables and search tries firstname.lastname@example.org (Dmitry A. Kazakov) (2006-02-02)|
|Re: symbol tables and search tries email@example.com (glen herrmannsfeldt) (2006-02-02)|
|Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-02-02)|
|Re: symbol tables and search tries firstname.lastname@example.org (Matthias Blume) (2006-02-03)|
|Re: symbol tables and search tries email@example.com (Karsten Nyblad) (2006-02-03)|
|Re: symbol tables and search tries firstname.lastname@example.org (2006-02-03)|
|Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-02-03)|
|Re: symbol tables and search tries email@example.com (2006-02-03)|
|Re: symbol tables and search tries firstname.lastname@example.org (2006-02-06)|
|Re: symbol tables and search tries Suif.email@example.com (jjyynmb) (2006-02-24)|
|From:||Karsten Nyblad <firstname.lastname@example.org>|
|Date:||3 Feb 2006 18:38:30 -0500|
|References:||06-01-117 06-01-120 06-01-132 06-01-139 06-02-012|
Hans-Peter Diettrich wrote:
> Since every symbol name is extracted from a text file, could we get
> any improvement from doing the inevitable O(m) things already in the
> At least it would be possible to calculate an hash code once, for
> subsequent lookup in multiple symbol tables. For tree structures the
> penalty occurs with every table to search, so it might be a good idea
> to use a single global STB, possibly with a scope list added to every
> symbol entry?
> [Um, once you calculate the hash codes, what would you call the data
> structure you use to remember the hash codes and the related data for
> subsequent occurrences of the same string in the input text? -John]
It depends on how much control you have on symbol tables for
debugging, etc. whether or not Hans-Peter's idea can be used. Perhaps
you are already having a table of all spellings of identifiers. In
one compiler I was working on, we were using the offset into the table
of spellings of identifiers as to represent for the spelling.
> [Re multiple symbol tables, it seems to me there are fairly
> straightforward ways to avoid the cost of multiple lookups, starting
> by using the same string hash in each one. -John]
That would force you to have all tables of one fixed size, and that
size can not be that big considering that a compiler should be capable
of handling thousands of scopes.
A normal hash function consists of two parts: First a pseudo random
number generator that takes the string to a 32-bit number, and
secondly the reminder of that number divided by the size of the hash
table is calculated. When looking up in differently sized tables you
may as well use the same pseudo random generator for all the hash
functions. Then there is no need for recalculating the pseudo random
over and over again. All you need to do is to calculate the pseudo
random number once. Then you calculate the reminder for each table.
Many people make sure their table size is a prime number, but there is
only so much to gain if your pseudo random number is evenly
distributed over the integers. You can use table sizes that are
powers of 2, and then the reminder operation is an AND operation.
On insertion you may also put the pseudo random number into the table
and later on when looking up start by comparing the random number of
the string being looked up with the random number in the entry in the
table. Of course that also speed up resizing the table because you do
not need to recalculate all the pseudo random numbers.
[Yes, that's what I meant, same PRNG, then divide down to the table
Return to the
Search the comp.compilers archives again.