11 Jul 1998 23:40:49 -0400

Related articles |
---|

[9 earlier articles] |

Re: Looking for symbol table hashing algorithm preston@tera.com (1998-07-08) |

Re: Looking for symbol table hashing algorithm drh@microsoft.com (Dave Hanson) (1998-07-08) |

Re: Looking for symbol table hashing algorithm smith@aragorn.uni-trier.de (1998-07-10) |

Re: Looking for symbol table hashing algorithm scott@basis.com (1998-07-10) |

Re: Looking for symbol table hashing algorithm henry@spsystems.net (1998-07-10) |

Re: Looking for symbol table hashing algorithm chase@naturalbridge.com (David Chase) (1998-07-10) |

Re: Looking for symbol table hashing algorithm dietz@interaccess.com (Paul Dietz) (1998-07-11) |

Re: Looking for symbol table hashing algorithm scottdaniels@earthlink.net (Scott David Daniels) (1998-07-13) |

Re: Looking for symbol table hashing algorithm buzzard@world.std.com (1998-07-13) |

Re: Looking for symbol table hashing algorithm peter.r.wilson@boeing.com (Peter Wilson) (1998-07-20) |

From: | Paul Dietz <dietz@interaccess.com> |

Newsgroups: | comp.compilers |

Date: | 11 Jul 1998 23:40:49 -0400 |

Organization: | The World's Usenet -- http://www.Supernews.com |

References: | 98-07-030 98-07-045 98-07-084 |

Keywords: | symbols |

David Chase wrote:

*> What this means is that the bucket chains must get really long*

*> before chaining is costly, and with a grade A hash function, the*

*> odds of that happening are very low indeed. I lack the skill-time*

*> product to generally compute odds, but if you've got 16 bucket*

*> heads, the odds against adding 16 entries and landing them all in*

*> the same bucket are (I think) 2**60 to 1.*

To make chaining have good expected time on any input set it suffices

to choose a random hash function such that the probability that two

different keys hash to the same location is <= 1/(table size). That's

because the time to insert n keys will be proportional to the

expectation of the sum of the n^2 random variables C(i,j), where

C(i,j) = 1 if keys i and j collide, and 0 if they do not. Expectation

of a sum of RVs is the sum of expectations, even if the RVs are not

independent.

Universal hash functions are covered in Cormen, Leiserson and Rivest.

*> For scoped symbol tables, there's a stunt, I think due to Paul Dietz,*

*> that is useful in some applications.*

[...]

*> Rather than modify the symbol table to reflect the changing scopes*

*> as compilation moves through the program, you simply note which*

*> scope you are in; a reference to a symbol X takes the form "get*

*> scopes from symbol table for X, find current scope in set of*

*> scopes". Numbering the scopes with in and out numbers in a standard*

*> tree walk generates bracketing ranges that allow you to find the*

*> nearest enclosing scope in log N time (binary search).*

What I did was solve this problem in the case of a tree that is

growing by addition of individual nodes, so one can't use a static

numbering. The original paper had a log* n extra factor, which Tarjan

immediately (I mean, seconds after the presentation) pointed out how

to avoid (see also the paper by Tsakalidis.)

You can do the lookup in O(loglogn) time on a RAM with O(logn) word

size using the van Emde Boas integer set data structure. Probably not

terribly useful practically.

Paul

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.