Re: Garbage collection and quality of implementation issues

boehm@parc.xerox.com (Hans Boehm)
Thu, 26 May 1994 00:51:10 GMT

          From comp.compilers

Related articles
Garbage collection and quality of implementation issues mps@dent.uchicago.edu (1994-05-25)
Re: Garbage collection and quality of implementation issues boehm@parc.xerox.com (1994-05-26)
Re: Garbage collection and quality of implementation issues jgmorris+@cs.cmu.edu (1994-05-26)
Re: Garbage collection and quality of implementation issues mps@dent.uchicago.edu (1994-05-27)
Re: Garbage collection and quality of implementation issues mw@ipx2.rz.uni-mannheim.de (1994-05-27)
Re: Garbage collection and quality of implementation issues jgmorris+@cs.cmu.edu (1994-05-28)
Re: Garbage collection and quality of implementation issues adobe!mhamburg@uunet.UU.NET (1994-05-29)
| List of all articles for this month |

Newsgroups: comp.compilers
From: boehm@parc.xerox.com (Hans Boehm)
Keywords: GC, design
Organization: Xerox Palo Alto Research Center
References: 94-05-105
Date: Thu, 26 May 1994 00:51:10 GMT

mps@dent.uchicago.edu (Michael Spertus) writes:
>A conservative collector never deletes anything that even appears to have
>a pointer aimed at it. If a float looks like a pointer that is enough to
>protect what that psudo pointer aims at. This is poor at reclaiming space
>but has the advantage of working with languages like C which have
>ambiguous data structures such as unions and unstoppable memory moves and
>casts.


In practice, it actually tends to be quite good at reclaiming space. (My
paper in the 1993 SIGPLAN PLDI Proceedings has some details. Zhong Shao
also found that for most of Ben Zorn's test prgrams, the difference
between the maximum malloc/free allocated space and the maximum reachable
space found by the collector was very small.) Clearly you can still do
better with more type information.


Empirically, often the most significant source of leakage is not pointer
misidentification, but dead pointers on the stack.


>Better is a collector which understands data structures because it is
>built into the language.


"Built into the language" doesn't seem quite right. If my Scheme program
calls a C library, passing Scheme pointers, I would like the collector to
continue to operate correctly. I agree that it would be nice for the
compiler to pass on as much type information as it can, assuming this
doesn't involve a significant performance penalty.


>An OO way of looking at it is the collector asks
>objects what other objects they need to keep around. The method which
>answers that question (I call it a markRefs method) is built by the
>compiler. The markRefs method is often one method that uses tables but
>those tables can be projected into code for speed.


There's actually some debate around here about whether generating mark
procedures is really more efficient than generating tables. A lot no
doubt depends on the details. You can reasonably optimize the tables at
program startup time, but probably not the mark procedures. I don't fully
understand the tradeoffs.


>This way of looking at it allows a user defined markRefs method. This
>would allow ambiguous data structures, calculated pointers etc. It can
>also import human knowledge of the program and expand the class "things
>the collector is sure will never be needed".


>Consider a large array used as a stack. A generated collector will have to
>scan the entire array for pointers keeping many objects alive that will
>never be reached. A user defined markRefs method for the stack would treat
>it as a stack following only pointers below the stack top.


All of these are a good idea under the right circumstances. (In fact, our
so-called conservative collector supports all of them.) However, user
defined mark procedures are clearly error prone and should be used only
when really necessary. Compiler generated type information is nice when
it's available. But things shouldn't fall apart because my program
happens to use a third party library written in C, and the vendor happened
to use a compiler that didn't provide type information.


It seems to me that the right approach to getting garbage collection into
the mainstream is to start with a conservative collector (and ensuring
that compiler back ends generate code that does not conceal the last
reference to an object). Compiler provided type information can be added
later. Old code will continue to run, since conservative scanning (and
uncollectable, explicitly deallocated objects) are still available as a
backup.


Hans-J. Boehm
(boehm@parc.xerox.com)
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.