Re: data allocation in interpreters

George Peter Staplin <georgeps@xmission.com>
3 Jun 2006 18:58:36 -0400

          From comp.compilers

Related articles
data allocation in interpreters weltraum@astrocat.de (2006-05-30)
Re: data allocation in interpreters pjb@informatimago.com (Pascal Bourguignon) (2006-05-30)
Re: data allocation in interpreters haberg@math.su.se (2006-06-03)
Re: data allocation in interpreters georgeps@xmission.com (George Peter Staplin) (2006-06-03)
Re: data allocation in interpreters gmt@CS.Arizona.EDU (2006-06-07)
| List of all articles for this month |

From: George Peter Staplin <georgeps@xmission.com>
Newsgroups: comp.compilers
Date: 3 Jun 2006 18:58:36 -0400
Organization: nil
References: 06-05-091
Keywords: interpreter, storage, comment
Posted-Date: 03 Jun 2006 18:58:36 EDT

weltraum@astrocat.de toggled some bits and produced:
> Hi,
>
> What are common techniques for data allocation (strings, high-level
> data structures) in interpreters? I could think about:
>
> - malloc / free with trying to free unused data as soon as possible.
> - same as above with some custom implementation of malloc / free (which
> one?)
> - full blown grabage collector.
> - somthing hybrid (how?)
>
> What do you think, how is it implemented in the famous interpreters?


I assume "full blown" GC doesn't include reference counting. A lot of
implementations in C use structures/objects with a reference count
member. Whenever the int ->refCount drops to 0 or below the object is
generally recycled. By recycled I mean the object is available for
reuse, and is generally added to the head of a free-list of an object
pool. The problem with that approach generally is that it requires
more manual management, but if you automatically generate the C it can
be easier. Also, it can't handle circular structures without very
clever methods. A paper reference for circular structures handled
with a clever reference counting algorithm is in Knuth's volume 1 from
what I recall (in the garbage collection section).


Tcl uses reference counting in the C API.


Python from what I've heard uses reference counting, and some automatic
GC. It's probably a "hybrid" architecture.


Java has a variety of "full blown" GC methods that can be tuned:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html


Garbage collection is actually a very old technique in the history of
computing. Dijkstra's concurrent collector is my favorite, because it
doesn't result in long pauses, and it's elegant.


Also, there are some people that claim that the Boehm collector for C is
good. I think it's very dangerous, because if you have a large
allocation, and some integer that happens to have the same value as a
pointer's address the object may not be freed. Thus it's called a
"conservative collector." IMO it's just a bad idea, but it works for
simple programs I suppose, and if you only retain a few small objects
now and then.


You asked about string management. AFAIK that's usually done with
another allocator than the object-pool allocator (something like
malloc), and a pointer is associated with the object. When the object
is released to the free-list, the string is usually freed.


Have fun :)


-George
[I gather that the Boehm collector has worked in some rather large programs,
and the number of false pointers is usually quite low. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.