Related articles |
---|
Large-space GC pardo@cs.washington.edu (1992-08-20) |
Re: Large-space GC hisgen@src.dec.com (1992-08-20) |
Re: Large-space GC kandt@ai-titania.Jpl.Nasa.Gov (1992-08-21) |
Re: Large-space GC moss@cs.umass.edu (1992-08-24) |
Re: Large-space GC pardo@cs.washington.edu (1992-08-27) |
Newsgroups: | comp.compilers |
From: | pardo@cs.washington.edu (David Keppel) |
Organization: | Computer Science & Engineering, U. of Washington, Seattle |
Date: | Thu, 20 Aug 1992 02:23:17 GMT |
Keywords: | GC, storage, design |
As long as we're discussing GC: has anybody studied garbage collection in
wide-address/persistant store machines? Consider, for example, that you
have a machine with a 64-bit address space and a distributed single-level
store. In such a system it is likely that pointers in the active process
will point to memory on disk, across the network, etc. Today's GC's
already can take a long time because the GC space may be hundreds of
megabytes and because parts of that space are out of main memory. In
``large-space'' machines it is likely a space will be many gigabytes of
fragmented memory, distributed across even devices slower than disks.
Note that `conventional' distributed GC algorithms don't really solve the
problem because they still are designed for spaces on the order of
hundreds of megabytes. In short, they assume the the entire pointer space
can be traced to do one round of GC.
I am imagining situations where a process can map in large databases that
are used only sparsely, or where a process contains a pointer to a data
structure that contains pointers in to other processes, each happily
generating (and reclaiming) garbage. Even a simple dictionary could cover
gigabytes of data structures describing text, tables, static pictures,
animations, and so on. The time to trace the dictionary is probably on
the order of days.
The general problem is that in a large address space there isn't enough
time to trace *all* pointers to live objects. The GC really needs to
restrict the search to reasonable-size subspaces.
I imagine several possible solutions, but I'm sure there must be
others besides:
- Reference counting. Tends to have a large compute overhead and
fails when there are cycles, but has excellent locality and can
reallocate freed storage immediately.
- Mark segments of the memory space as non-GC'd. Variations are
possible: the collector gets to avoid performing a collection in the
segment, it gets to assume that the segment is never GC'd, thus
never moves, contains no roots, ...
- Mark segments as externally GC'd.
- Keep summary info for a segment, where the summary info says what
other segments have pointers in to the segment and what other
segments are pointed to by the segment. The summary info might be
similar to inter-generation info in generational collectors, or to
distributed collector references.
- Make all references in to a segment equivalent. The segment is
deallocated all at once, and only when the last pointer in to the
segement disappears.
- Qualify objects and types as reference-contributing or not,
exportable (to another segment) or not, ...
There are additional problems to be solved. For example, for security and
protection reasons, my user-level process really shouldn't be allowed to
set mark bits in something owned by another ``administrative entity.''
Yet halting tracing at read-only boundaries doesn't work, because the
read-only object may have a reference back in to the local (writable)
segment that the collector is trying to GC.
I will summarize e-mail.
;-D oN ( GC for the 90's: Reduce, Reuse, Recycle! ) Pardo
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.