31 Jan 2006 21:22:30 -0500

Related articles |
---|

[9 earlier articles] |

Re: symbol tables and search tries henry@spsystems.net (2006-01-31) |

Re: symbol tables and search tries find@my.address.elsewhere (Matthias Blume) (2006-01-31) |

Re: symbol tables and search tries danwang74@gmail.com (Daniel C. Wang) (2006-01-31) |

Re: symbol tables and search tries mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-01-31) |

Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-01-31) |

Re: symbol tables and search tries henry@spsystems.net (2006-01-31) |

Re: symbol tables and search tries blume@tti-c.org (Matthias Blume) (2006-01-31) |

Re: symbol tables and search tries mr.waverlye@verizon.net (Mr.E) (2006-01-31) |

Re: symbol tables and search tries mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-02-02) |

Re: symbol tables and search tries gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-02-02) |

Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-02-02) |

Re: symbol tables and search tries blume@tti-c.org (Matthias Blume) (2006-02-03) |

Re: symbol tables and search tries d148f3wg02@sneakemail.com (Karsten Nyblad) (2006-02-03) |

[5 later articles] |

From: | Matthias Blume <blume@tti-c.org> |

Newsgroups: | comp.compilers |

Date: | 31 Jan 2006 21:22:30 -0500 |

Organization: | private |

References: | 06-01-085 06-01-111 06-01-117 06-01-125 |

Keywords: | symbols, performance |

haberg@math.su.se (Hans Aberg) writes:

*> John Levine wrote:*

*>*

*>> I make my hash tables big enough that the chains are short, i.e.,*

*>> design so that k>=N so it's O(1). Hash headers need only be one word*

*>> pointers, so pick a number larger than the number of symbols you*

*>> expect in a large program, say, 10,000, and use that. So it adds 40K*

*>> to the size of the program, that's down in the noise these days.*

*>*

*> Even though one in a practical application can choose k large enough,*

*> hoping that somebody will not choose a larger N, from the theoretical*

*> point, when computing the complexity order, N is not limited. So, the*

*> adjustment of k must be taken into account when computing the*

*> complexity order. Then, by doubling k, one gets down to logarithmic*

*> time.*

Hans, please, do the calculation! It does /not/ give you logarithmic

time, it gives you constant time (in the amortized sense).

Suppose, for simplicity, you set the load threshold to 1 (i.e., you

double the size when there are as many elements as there are buckets).

Clearly, access is O(1) on average (assuming a reasonable hash function

without major collisions). So let's look at insertions:

In the worst case, the very last insert just pushes the load over the

edge and causes another doubling. Thus, every element in the table

has suffered through at least one rebuild. Half the elements have

suffered through at least one additional rebuild, one quarter has

suffered through at least a third rebuild, and so on. This gives rise

to a sum of a geometric series:

1 + 1/2 + 1/4 + ...

which is 2 in the limit. Thus, in the limit, each element has (on

average) suffered through 2 rebuilds. The time each rebuild takes is

constant per element that participates in that rebuild. Since each

element that is in the table must have been inserted at some point, we

charge the amortized constant amount of work due to rebuilds to that

insert operation. As a result, the time for one insert is constant on

average.

(In the best case, the last insert just fills the table without

causing another rebuild. In this case the series to be summed is

1/2 + 1/4 + ... = 1

i.e., the average cost of insertions due to rebuilds is only half as big.

In either case, it is constant.)

*> I also think one cannot do any better, because the best sorting*

*> algorithm, over all choices is O(N log N), so if one has better*

*> inserting complexity in a container than O(log N), one can beat that*

*> by merely insert elements one by one.*

A hash table does not sort, so this line of reasoning is not

applicable.

Matthias

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.