'conservative' GC == 'risky' GC

hbaker@netcom.com (Henry G. Baker)
Sat, 21 May 1994 22:01:39 GMT

          From comp.compilers

Related articles
'conservative' GC == 'risky' GC hbaker@netcom.com (1994-05-21)
Re: 'conservative' GC == 'risky' GC jgmorris+@cs.cmu.edu (1994-05-22)
Re: 'conservative' GC == 'risky' GC wright@asia.cs.rice.edu (1994-05-23)
Re: 'conservative' GC == 'risky' GC pardo@cs.washington.edu (1994-05-24)
Re: 'conservative' GC == 'risky' GC tmb@arolla.idiap.ch (1994-05-25)
Re: 'conservative' GC == 'risky' GC markt@harlequin.co.uk (1994-05-26)
Re: 'conservative' GC == 'risky' GC jgmorris+@cs.cmu.edu (1994-05-27)
[7 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: hbaker@netcom.com (Henry G. Baker)
Summary: long, but important
Keywords: GC
Organization: Compilers Central
Date: Sat, 21 May 1994 22:01:39 GMT

What's a 'Conservative' Garbage Collector?


It has recently become clear from various postings on the Internet that
the phrase 'conservative garbage collector' is not being used correctly.
This is especially distressing, because people are starting to use
'conservative' garbage collectors, and they have not been properly
informed of their characteristics and _risks_. In this post, I would like
to clarify the meaning of this phrase, and possibly some other
misconceptions about 'garbage collection' in general.


A _memory manager_ is a set of routines for managing storage in a language
and/or operating system. A memory manager typically handles the problem
of finding and allocating memory blocks of the appropriate size, and
accepting 'free' blocks back from the language and/or application for
later reallocation.


An _automatic memory manager_ or _garbage collector_ is a memory manager
which is automatically called by the language and/or operating system to
reclaim 'free' blocks when the language and/or operating system can prove
that they are no longer accessible to the program and/or application.


Automatic memory managers or garbage collectors fall into two major
categories: reference counters and tracers. Reference count memory
managers keep a counter for each managed object which indicates the number
of 'live' pointers to this object, whether from other objects, or from
non-object locations called 'roots', which usually consist of global
variables and local variables in 'stack frames' or 'activation records'.
When the reference count of an object becomes zero, the object can be
reclaimed, and its storage made available for a new usage. The reference
counts of objects pointed to by this newly freed object can be immediately
or promptly (strictly) decremented, leading to recursive freeing.
Alternatively, the reference count decrementing of the objects pointed to
by this newly freed object can be deferred or done _lazily_, at the time
that the object is reallocated. This lazy reference count decrementing
requires a _stack_ or a _queue_ for a 'freelist', but spreads the work
more evenly than the 'strict' reference count method usually taught.


A tracing memory manager does not require the continual updating of
reference counts, but occasionally traces out the entire 'accessibility'
relation by starting from the 'roots' and computing the transitive closure
of the 'points to' relation. After this has been done, the GC can now be
sure that the _complement_ of the set of accessible memory locations is
thus _inaccessible_ and can be reused for other purposes. There are
generally two methods of tracing -- 'marking' and 'copying'. Marking
simply paints each object as it sees it, and recursively paints the
objects it points to, and when everything that is accessible has been
painted, it knows that unpainted objects are inaccessible and therefore
garbage. Copying is even simpler; as each accessible object is found, it
is moved to a new location, leaving a forwarding address. After all
accessible objects have been moved, the old neighborhood is simply
condemned en masse.


Thus, _all_ automatic memory managers utilize the concept of a 'root',
which is a non-object pointer to an object. Traditionally, languages
which utilized automatic memory managers had their compilers provide
detailed information about the 'roots' that the memory manager needed. A
reference counting manager needs to know when a root is created, and when
one goes away. A tracing memory manager needs to be able to _enumerate_
(iterate through) the roots.


Recently, however, there has been great interest in adding garbage
collection to languages like C which were not designed for garbage
collection, and whose compilers do not provide the proper 'root'
information to the memory manager. We are thus led to the concepts of a
'conservative' garbage collector and an 'ambiguous' root.


An 'ambiguous root' is a location that _may_ be a root, but the memory
manager can't be sure. It therefore attempts to treat it like a root, and
considers any objects that it might point to as _accessible_. If the set
of _ambiguous roots_ is guaranteed to be a superset of the set of _actual
roots_, then a tracing memory manager will not erroneously reclaim an
object which is still in use (dangling pointer problem), but _may_
continue to hang onto objects that really are garbage, but appear to be
accessible due to the ambiguity of one or more roots.


A 'conservative' garbage collector is simply one in which the data
segment, the stack segment, and the current register set are treated
as the set of 'ambiguous' roots. _Assuming_ that this set includes
all 'roots', such a 'conservative' garbage collector will find and
mark all accessible objects, along with some number of inaccessible
objects, and then free the unmarked objects which are _inferred_ to be
garbage. In a large number of cases, conservative garbage collection
appears to work well, even in very uncooperative environments like C.


_However_, a 'conservative' GC _can fail_, and _can produce dangling
pointers_. This happens when some object is accessible from a root that
isn't _evident_ in the ambiguous root set. This can happen, for example,
if an object is directly addressed by a RISC architecture by loading a
half-word constant into a register, and then later addressing the object
via a half-word offset to the constant in the register. Note that a
proper pointer to the object _never_ appears in any register (except for
an inaccessible memory address register), and so the object may be
erroneously reclaimed by a 'conservative' garbage collector and later
cause the program to crash due to a doubly allocated object.


Thus, the word 'conservative' in the phrase 'conservative garbage
collection' is a _misnomer_ in the sense that it isn't conservative at all
-- a conservative garbage collector is actually taking _risks_. In this
sense, the phrase 'conservative garbage collection' should really be read
as 'risky garbage collection'.


As a result of the foregoing discussion, _I would advise against using
'conservative' garbage collection in any mission-critical system --
especially any systems involving possible risks to human life or health.


On the other hand, _as a temporary measure_ until C and C++ can get their
act together to start supplying proper root information to memory
managers, conservative garbage collection may be very useful in a
_development_ or _research_ environment. _Just don't get too comfortable,
and forget that this risk could come back to haunt you_.


Here are some of the basic references on 'conservative' garbage collection:


Bartlett, J.F. Compating garbage collection with ambiguous roots.
Tech. Rept. DEC WRL 88/2, DEC WRL, Palo Alto, CA Feb., 1988.


Boehm, H.-J., and Weiser, M. Garbage collection in an uncooperative
environment. Software Practice & Experience 18,9 (Sept. 1988),
807-820.


Paul Wilson (wilson@cs.utexas.edu) also has a number of ftp-able GC papers
which touch on all sorts of topics.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.