Re: C Hashmap implementation

"cr88192" <cr88192@hotmail.com>
26 Apr 2007 09:35:06 -0400

From comp.compilers

Related articles
C Hashmap implementation Sean.Gillespie@bisonofborg.com (bison) (2007-04-23)
Re: C Hashmap implementation vmakarov@redhat.com (Vladimir Makarov) (2007-04-25)
*Re: C Hashmap implementation cr88192@hotmail.com (cr88192)* (2007-04-26)**
Re: C Hashmap implementation Sean.Gillespie@bisonofborg.com (bison) (2007-04-26)
Re: C Hashmap implementation cdiggins@gmail.com (Christopher Diggins) (2007-04-26)
Re: C Hashmap implementation gneuner2@comcast.net (George Neuner) (2007-04-26)
Re: C Hashmap implementation gene.ressler@gmail.com (Gene) (2007-04-26)
Re: C Hashmap implementation roessmann@gmx.net (Bernhard Roessmann) (2007-04-26)
Re: C Hashmap implementation DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-04-27)
[2 later articles]

| List of all articles for this month |

From:	"cr88192" <cr88192@hotmail.com>
Newsgroups:	comp.compilers
Date:	26 Apr 2007 09:35:06 -0400
Organization:	Saipan Datacom
References:	07-04-089
Keywords:	symbols
Posted-Date:	26 Apr 2007 09:35:06 EDT

"bison" <Sean.Gillespie@bisonofborg.com> wrote in message
> Hello. I'm looking into hashmap implementations for a VM based
> dynamically typed language. Since the implementation must allow me to
> resize on demand, I am a bit confused about the various approaches.
> Here is what I have found out so far:
>
> Almost everyone I've talked to has said that Chained Hashmaps are much
> easier to implement than Open Addressed maps.

I would agree here.
there are some cases where they make a lot more sense as well.

> Wikipedia suggests that an approach to resizing hashmaps is to
> allocate space for a newer hashmap and copy elements from to the new
> table, and in some cases do it incrementally.

yes.
but then you have 2 hashes, one of which is around until the old one is
emptied.

sometimes one can resize the table though, and use a gradiated scheme:
first try the largest hash;
if failed, try the smaller hash;
....

when done copying the entry to the largest hash.
though not tried personally, this could have contention issues though (esp
if one uses a permutative lookup scheme, where potentially removing an entry
could eliminate existing entries, implying that one can't simply stop when
they encounter an empty slot, but must first check some n permutations).

> Quick question about the last point: I'm curious about a starting
> point. How much space should a hashmap allocate initially, and when
> it's full...by how much would I typically increase it? I realize
> there are lots of different answers here, so a good starting point
> would really help out.
>

for resizable things, usually I start out with a low-max guestimate, say 256
to 4096.
usually I expand in increments of 50%, so:
sz'=sz+(sz>>1);

in practice, this seems a fairly good factor (avoids extra expansions while
still usually keeping a fairly tight fit on the data, vs say, doubling the
size each time).

if being more conservative, a 4/3 factor might be better than a 3/2 factor,
but I don't know...

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: C Hashmap implementation

"cr88192" <cr88192@hotmail.com>26 Apr 2007 09:35:06 -0400

"cr88192" <cr88192@hotmail.com>
26 Apr 2007 09:35:06 -0400