Large-space GC

pardo@cs.washington.edu (David Keppel)
Thu, 20 Aug 1992 02:23:17 GMT


Newsgroups: comp.compilers
From: pardo@cs.washington.edu (David Keppel)
Organization: Computer Science & Engineering, U. of Washington, Seattle
Date: Thu, 20 Aug 1992 02:23:17 GMT
Keywords: GC, storage, design

As long as we're discussing GC: has anybody studied garbage collection in
wide-address/persistent-store machines? Consider, for example, a machine
with a 64-bit address space and a distributed single-level store. In such
a system it is likely that pointers in the active process will point to
memory on disk, across the network, etc. Today's GCs can already take a
long time because the GC space may be hundreds of megabytes and because
parts of that space are out of main memory. In ``large-space'' machines
it is likely a space will be many gigabytes of fragmented memory,
distributed even across devices slower than disks.


Note that `conventional' distributed GC algorithms don't really solve the
problem because they are still designed for spaces on the order of
hundreds of megabytes. In short, they assume that the entire pointer
space can be traced to do one round of GC.


I am imagining situations where a process can map in large databases that
are used only sparsely, or where a process contains a pointer to a data
structure that contains pointers into other processes, each happily
generating (and reclaiming) garbage. Even a simple dictionary could cover
gigabytes of data structures describing text, tables, static pictures,
animations, and so on. The time to trace the dictionary is probably on
the order of days.


The general problem is that in a large address space there isn't enough
time to trace *all* pointers to live objects. The GC really needs to
restrict the search to reasonable-size subspaces.
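To make the idea concrete, here is a minimal sketch (in Python, with all
names invented for illustration) of a mark phase that traces only within
one segment and refuses to follow pointers that cross the segment
boundary, recording them instead for some larger-scale mechanism to deal
with:

```python
# Hypothetical sketch: segment-bounded marking. An object knows which
# segment (subspace) it lives in; the collector traces within one
# segment only and reports the cross-segment pointers it encountered.

class Obj:
    def __init__(self, segment, children=()):
        self.segment = segment          # which subspace the object lives in
        self.children = list(children)  # outgoing pointers
        self.marked = False

def mark_within(roots, segment):
    """Trace from `roots`, never crossing out of `segment`.
    Returns the cross-segment pointers encountered; a larger-scale
    collector would handle those separately."""
    escaped = []
    stack = [r for r in roots if r.segment == segment]
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        for child in obj.children:
            if child.segment == segment:
                stack.append(child)
            else:
                escaped.append(child)   # opaque: don't trace into other segments
    return escaped
```

The point is that the cost of one collection is bounded by the size of
one segment, not by the size of the whole address space.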


I can imagine several possible solutions, though I'm sure there must be
others:


  - Reference counting. Tends to have a large compute overhead and
      fails when there are cycles, but has excellent locality and can
      reallocate freed storage immediately.


  - Mark segments of the memory space as non-GC'd. Variations are
      possible: the collector merely avoids performing a collection in
      the segment, or it further assumes that the segment is never GC'd
      and thus never moves, contains no roots, ...


  - Mark segments as externally GC'd.


  - Keep summary info for a segment, where the summary info says what
      other segments have pointers into the segment and what other
      segments are pointed to by the segment. The summary info might be
      similar to the inter-generation info in generational collectors,
      or to the remote references tracked by distributed collectors.


  - Make all references into a segment equivalent. The segment is
      deallocated all at once, and only when the last pointer into the
      segment disappears.


  - Qualify objects and types as reference-contributing or not,
      exportable (to another segment) or not, ...
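The ``all references into a segment are equivalent'' scheme above can be
sketched very simply (Python again, all names invented): the segment
keeps a single count of external references and is reclaimed as a unit
when that count drops to zero, with no tracing inside the segment at
all.

```python
# Hypothetical sketch: segment-level reference counting. One count
# covers every pointer from outside the segment; internal pointers are
# never counted or traced.

class Segment:
    def __init__(self, name):
        self.name = name
        self.external_refs = 0   # one count for the whole segment
        self.live = True

    def add_external_ref(self):
        self.external_refs += 1

    def drop_external_ref(self):
        self.external_refs -= 1
        if self.external_refs == 0:
            # No pointer from any other segment survives: reclaim the
            # entire segment at once.
            self.live = False
```

Note that cycles *within* a segment are harmless here, since the
segment is freed as a unit; cycles *between* segments would still leak,
just as with ordinary per-object reference counting.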


There are additional problems to be solved. For example, for security and
protection reasons, my user-level process really shouldn't be allowed to
set mark bits in something owned by another ``administrative entity.''
Yet halting tracing at read-only boundaries doesn't work, because the
read-only object may have a reference back into the local (writable)
segment that the collector is trying to GC.


I will summarize e-mail.


;-D oN ( GC for the 90's: Reduce, Reuse, Recycle! ) Pardo
--

