Re: C Hashmap implementation

Vladimir Makarov <vmakarov@redhat.com>
25 Apr 2007 04:16:22 -0400

          From comp.compilers

Related articles
C Hashmap implementation Sean.Gillespie@bisonofborg.com (bison) (2007-04-23)
Re: C Hashmap implementation vmakarov@redhat.com (Vladimir Makarov) (2007-04-25)
Re: C Hashmap implementation cr88192@hotmail.com (cr88192) (2007-04-26)
Re: C Hashmap implementation Sean.Gillespie@bisonofborg.com (bison) (2007-04-26)
Re: C Hashmap implementation cdiggins@gmail.com (Christopher Diggins) (2007-04-26)
Re: C Hashmap implementation gneuner2@comcast.net (George Neuner) (2007-04-26)
Re: C Hashmap implementation gene.ressler@gmail.com (Gene) (2007-04-26)
Re: C Hashmap implementation roessmann@gmx.net (Bernhard Roessmann) (2007-04-26)
[3 later articles]
| List of all articles for this month |
From: Vladimir Makarov <vmakarov@redhat.com>
Newsgroups: comp.compilers
Date: 25 Apr 2007 04:16:22 -0400
Organization: Red Hat, Inc.
References: 07-04-089
Keywords: symbols
Posted-Date: 25 Apr 2007 04:16:22 EDT

bison wrote:
> Hello. I'm looking into hashmap implementations for a VM based
> dynamically typed language. Since the implementation must allow me to
> resize on demand, I am a bit confused about the various approaches.
> Here is what I have found out so far:
>
> Almost everyone I've talked to has said that Chained Hashmaps are much
> easier to implement than Open Addressed maps.
>
> Wikipedia suggests that an approach to resizing hashmaps is to
> allocate space for a newer hashmap and copy elements from to the new
> table, and in some cases do it incrementally.
>
> Quick question about the last point: I'm curious about a starting
> point. How much space should a hashmap allocate initially, and when
> it's full...by how much would I typically increase it? I realize
> there are lots of different answers here, so a good starting point
> would really help out.
>


I don't think there is a theory for choosing better start size and
rules to change the table size because it is very dependend on usage
pattern of the resizable hash table. Usually all of this is taken
from experimental results on some benchmarks.


Gcc has resizable hash tables with double hashing and without chains
for several years because gcc is very memory hungry and such hashtable
without chains is more compact as a rule. You could look at that as a
start point (the file is libiberty/hashtab.c). Somebody told me (but
I did not check that), these hash tables are used in a Huskell
implementation.


Practically the same implementation but with simplier interface can be
found on http://cocom.sf.net (the hashtable description is on
http://cocom.sourceforge.net/ammunition-5.html). This package is used
in implementation of associative tables in Dino language (the element
of the table can be of any type including associative table itself).
The description of Dino language and implementation can be found on


http://cocom.sourceforge.net/dinopage.html


Hash table implementation for DINO is in file cocom/DINO/d_heap.c


Gcc and Dino for example have different rules for increasing table
size because of different usage patterns.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.