Re: symbol tables and search tries

Matthias Blume <blume@tti-c.org>
31 Jan 2006 21:22:30 -0500

          From comp.compilers

Related articles
[9 earlier articles]
Re: symbol tables and search tries henry@spsystems.net (2006-01-31)
Re: symbol tables and search tries find@my.address.elsewhere (Matthias Blume) (2006-01-31)
Re: symbol tables and search tries danwang74@gmail.com (Daniel C. Wang) (2006-01-31)
Re: symbol tables and search tries mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-01-31)
Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-01-31)
Re: symbol tables and search tries henry@spsystems.net (2006-01-31)
Re: symbol tables and search tries blume@tti-c.org (Matthias Blume) (2006-01-31)
Re: symbol tables and search tries mr.waverlye@verizon.net (Mr.E) (2006-01-31)
Re: symbol tables and search tries mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-02-02)
Re: symbol tables and search tries gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-02-02)
Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-02-02)
Re: symbol tables and search tries blume@tti-c.org (Matthias Blume) (2006-02-03)
Re: symbol tables and search tries d148f3wg02@sneakemail.com (Karsten Nyblad) (2006-02-03)
[5 later articles]
| List of all articles for this month |

From: Matthias Blume <blume@tti-c.org>
Newsgroups: comp.compilers
Date: 31 Jan 2006 21:22:30 -0500
Organization: private
References: 06-01-085 06-01-111 06-01-117 06-01-125
Keywords: symbols, performance
Posted-Date: 31 Jan 2006 21:22:30 EST

haberg@math.su.se (Hans Aberg) writes:


> John Levine wrote:
>
>> I make my hash tables big enough that the chains are short, i.e.,
>> design so that k>=N so it's O(1). Hash headers need only be one word
>> pointers, so pick a number larger than the number of symbols you
>> expect in a large program, say, 10,000, and use that. So it adds 40K
>> to the size of the program, that's down in the noise these days.
>
> Even though one in a practical application can choose k large enough,
> hoping that somebody will not choose a larger N, from the theoretical
> point, when computing the complexity order, N is not limited. So, the
> adjustment of k must be taken into account when computing the
> complexity order. Then, by doubling k, one gets down to logarithmic
> time.


Hans, please, do the calculation! It does /not/ give you logarithmic
time, it gives you constant time (in the amortized sense).


Suppose, for simplicity, you set the load threshold to 1 (i.e., you
double the size when there are as many elements as there are buckets).
Clearly, access is O(1) on average (assuming a reasonable hash function
without major collisions). So let's look at insertions:


In the worst case, the very last insert just pushes the load over the
edge and causes another doubling. Thus, every element in the table
has suffered through at least one rebuild. Half the elements have
suffered through at least one additional rebuild, one quarter has
suffered through at least a third rebuild, and so on. This gives rise
to a sum of a geometric series:


      1 + 1/2 + 1/4 + ...


which is 2 in the limit. Thus, in the limit, each element has (on
average) suffered through 2 rebuilds. The time each rebuild takes is
constant per element that participates in that rebuild. Since each
element that is in the table must have been inserted at some point, we
charge the amortized constant amount of work due to rebuilds to that
insert operation. As a result, the time for one insert is constant on
average.


(In the best case, the last insert just fills the table without
causing another rebuild. In this case the series to be summed is


    1/2 + 1/4 + ... = 1


i.e., the average cost of insertions due to rebuilds is only half as big.
In either case, it is constant.)


> I also think one cannot do any better, because the best sorting
> algorithm, over all choices is O(N log N), so if one has better
> inserting complexity in a container than O(log N), one can beat that
> by merely insert elements one by one.


A hash table does not sort, so this line of reasoning is not
applicable.


Matthias


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.