Re: 'conservative' GC == 'risky' GC

chase@Think.COM (David Chase)
Thu, 26 May 1994 22:42:48 GMT

          From comp.compilers

Related articles
[2 earlier articles]
Re: 'conservative' GC == 'risky' GC wright@asia.cs.rice.edu (1994-05-23)
Re: 'conservative' GC == 'risky' GC pardo@cs.washington.edu (1994-05-24)
Re: 'conservative' GC == 'risky' GC tmb@arolla.idiap.ch (1994-05-25)
Re: 'conservative' GC == 'risky' GC markt@harlequin.co.uk (1994-05-26)
Re: 'conservative' GC == 'risky' GC jgmorris+@cs.cmu.edu (1994-05-27)
Re: 'conservative' GC == 'risky' GC boehm@parc.xerox.com (1994-05-27)
Re: 'conservative' GC == 'risky' GC chase@Think.COM (1994-05-26)
Re: 'conservative' GC == 'risky' GC hbaker@netcom.com (1994-05-31)
Re: 'conservative' GC == 'risky' GC hbaker@netcom.com (1994-05-30)
Re: 'conservative' GC == 'risky' GC boehm@parc.xerox.com (1994-05-31)
Re: 'conservative' GC == 'risky' GC ok@cs.rmit.oz.au (1994-06-07)
Re: 'conservative' GC == 'risky' GC boehm@parc.xerox.com (1994-06-13)
| List of all articles for this month |

Newsgroups: comp.compilers
From: chase@Think.COM (David Chase)
Keywords: GC
Organization: Thinking Machines Corporation, Cambridge MA, USA
References: 94-05-084 94-05-109
Date: Thu, 26 May 1994 22:42:48 GMT

tmb@arolla.idiap.ch (Thomas M. Breuel) writes:
> The existence of conservative GC "add-ons" to a wide variety of
> compilers demonstrates that this does not impose a significant
> burden on language implementors.


I've been meaning to make an extended comment on this discussion, but I
still don't have the time to do so. There's a couple of things that need
saying:


1. Let's not get wrapped around the axle w.r.t. what is or is not
"conservative" (after all, as compiler writers, we speak of "optimizing"
programs, which is a pretty hopeful piece of jargon). The way in which
the Boehm-Weiser collector (all instances of it, in fact) treats the
contents of static data areas, registers, and stacks, suits me as a good
definition of "conservative", and it is (to my knowledge) the most widely
ported and used garbage collector for C. If you read the relevant
literature (e.g., the papers by Boehm) you'll see that the people working
on conservative collectors are well aware of different ways that things
can go wrong.


2. TMB's comment could be very wrong, depending upon how sure you want to
be of your conservative GC. It's one thing to observe that a compiler
rarely, if ever, thwarts your conservative GC. It's an extraordinarily
much harder thing for a compiler writer to verify, after the fact, that
the compiler will never, ever, hide a pointer from the conservative
garbage collector (and yes, I actually considered doing this while working
on an optimizer at Sun. The result of this informal spare-time study was
to try instead the approach described in the paper that Hans Boehm and I
wrote). This job is made especially hard because (as Chris Aoki observed)
"how are you going to test the damn thing?" Simply running your tests
isn't going to do much at all, because garbage collections are relatively
rare, even in a generational system. Furthermore, failures (especially in
a multithreaded environment) may be horribly non-repeatable. I'd like to
think that I'm good enough and careful enough a programmer to get it right
from first principles if I started writing a compiler from scratch (as if
someone would pay me to do this), but we tested plenty much back at Sun,
and damned if the tests didn't show me where I'd screwed up from time to
time (in one case, the bug lay hidden for 18 months).


And, in fact, a suitably patient person who has some idea of what an
optimizing compiler trying to do, can usually "coach" the compiler into
both generating an out-of-bounds derived pointer and reusing the register
that holds the only "bare" version of the pointer. It takes an
appropriate combination of register pressure and weird addressing
expressions in a loop, but I first did this in 1988 with a Sun-3 Fortran
compiler.


3. I thank Henry Baker for the thesis plug, but if anyone is crazy enough
to actually read it, it is worth noting that there has been progress in
the world of garbage collection since then, and this affects the
collector-compiler contracts that might be interesting. In particular, I
think the Bartlett conservative-compacting collector is a wonderful thing,
and now (unless I were trying to implement a persistent store, like Eliot
Moss et al at U Mass) I would only propose variations on conservative
stack scanning. There's at least two interesting ways that the compiler
can help. For unannotated stacks, it can ensure that if there is a
derived pointer to X which is live, then there is a "bare" pointer to X
also visible to the collector. This is all that is necessary to prevent
incorrect collection. The second approach is slightly more aggressive,
but not much. If the compiler is going to take the trouble to ensure that
there is a bare pointer to "X" visible if a derived pointer is in use,
then the compiler could also emit some sort of a frame descriptor that
identified all locations holding bare pointers (it knows where they are,
right?). The collector only needs to scan the identified pointers, though
the objects so identified still may not be relocated (this is an ever-so-
slightly-different sense of conservative -- the B-W collector must be
conservative both because it isn't sure if it sees all pointers, and
because it isn't really sure if it is seeing a pointer or an unlucky
non-pointer -- this collector would know that it sees pointers, but it is
almost certainly not seeing all pointers).


My suspicion is that the second approach has certain other advantages with
respect to testing, in that the compiler is actually making explicit
assertions about the state of pointers -- I can imagine instrumenting the
object code to verify this information, for example.


4. There are a number of other ways that things can "go wrong" besides the
hidden pointers. How wrong they can go depends upon the collector that
you use -- things go much further wrong with a compacting collector, for
instance. Here's one example I haven't seen mentioned yet (this is taken
straight from my dissertation, back in 1987): If you have a compacting
collector, and you allocate an aggregate, you had better be sure (as an
implementor) that its fields are all initialized to "sensible" values. Of
course, it is highly likely that the programmer is going to turn right
around and initialize it again with some "real" values. That means that
the first ("sensible") initialization is ALL DEAD CODE, right? Any
optimizer with its salt would eliminate it, right? OOPS.


Dead code problem number two comes with things like reference counting
collectors and conservative collectors. In both cases, it often helps to
write zeros into variables and aggregate fields when you "know" that you
are done with them -- if the object is involved in a cycle, this might
break the cycle, or if the object is spuriously retained, this might help
bound the size of the leak. Of course, since you're done with the
variable (why else would you write a zero into it?) then the assignment is
dead, and if your compiler is smart/lucky, it will find and delete that
dead assignment, undoing whatever assistance you thought you were
providing to the collector. OOPS.


I'm beginning to think that perhaps I should publish a little more on this
front. Sigh.


David Chase, speaking for myself
Thinking Machines Corp.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.