Related articles |
---|
Our experiences with garbage collection of C++ kelvin@kickapoo.cs.iastate.edu (1992-08-20) |
Newsgroups: | comp.compilers,comp.lang.c++ |
From: | kelvin@kickapoo.cs.iastate.edu (Kelvin Don Nilsen) |
Return-Path: | <news@news.iastate.edu> |
Organization: | Iowa State University, Ames IA |
Date: | Thu, 20 Aug 1992 20:43:24 GMT |
Keywords: | C++, GC |
Ph.D. candidate William Schmidt and I are winding down some recent
research on hardware-assisted real-time garbage collection. Together, we
would like to contribute the following additional observations to the
discussions regarding garbage collection for C++.
Background:
We have designed a new memory architecture that mimics traditional memory
while performing garbage collection as a background activity. We use a
variant of Baker's real-time copying garbage collection algorithm.
Numerous studies have demonstrated that this algorithm, implemented in
software, severely degrades overall system throughput. The goals of our
hardware design were to provide guaranteed real-time response to all
memory operations with minimal negative impact on system performance.
Additionally, it was our hope that the hardware costs could be kept to a
minimum.
To evaluate the utility of our proposed architecture, we have constructed
a RISC-based hardware simulator and retargeted a version of the g++
compiler to this architecture. Since we use a copying garbage collection
algorithm, it is necessary for the compiler to assist us in locating all
pointers in the system. In our environment, we tag each word at the time
of its allocation. These tags are transparent to the C++ application;
each 32-bit word of garbage-collected memory is accompanied by an extra
one-bit pointer tag.
With help from our g++ compiler, we have ported several applications to
the experimental architecture. These applications include a fast-fourier
transform program, the groff typesetting software, a simple line editor,
and a lisp interpreter written by Tim Budd. Empirical investigation of
these applications revealed shortcomings and bottlenecks in the original
designs of the architecture and code generator. In response to these
findings, we have made several refinements to the original design.
Unfortunately, due to limited time and resources, we have not yet been
able to run all of the sample applications through the most recent
incarnation of the system. Some of the performance figures reported below
represent careful extrapolations of measured results.
Results:
We feel that we have satisfied our major goals.
Guaranteed Real-Time Response:
The worst-case time required to fetch a word of garbage-collected
memory is six traditional memory cycles. The worst-case time
required to write a word of garbage-collected memory is three
traditional memory cycles. In practice, over 99% of fetches and
stores are completed in only one memory cycle. The worst-case time
required to initiate a flip is two traditional memory cycles times
the number of pointers in the application's root set.
High System Throughput:
Garbage-collected memory can be cached, and our simulations reveal
that data cache hit rates for garbage collected memory are usually
within a fraction of a percent of the hit rates for traditional
implementations of otherwise identical applications. The average
cost of servicing a cache miss is generally within 50% of the
cost of servicing cache misses from traditional memory. Garbage
collection consistently executes in approximately a tenth of the
time required by traditional C++ implementations to execute malloc
and free. However, the performance of our garbage collected C++
system is limited by the overhead of "tagging" pointers within
function activation frames. In sum, we have found that
garbage-collected C++ programs generally run within plus or minus
25% of the time required to run traditional implementations of
the same programs (That's right. Some programs run 25% slower.
Some run 25% faster!) These measurements are based on real
applications originally written for traditional architectures,
which have been tuned (to varying degrees) to those environments.
Is the hardware practical?
Remember, we are talking about real-time garbage collection. Thus,
we assume that the entire application fits within real memory.
Further, the amount of memory available to represent live objects
is only a fraction (typically, 35-45%) of the total amount of memory
dedicated to the garbage-collected memory module. We estimate the
costs of the additional hardware required to support this
architecture as 15-50% of the cost of the memory controlled by the
module. The bottom line:
Are you willing to spend 3-5 times the cost of traditional
memory to get "high-performance" real-time garbage
collection?
By the way, the special hardware required to support garbage
collection is located entirely within the expansion memory
module. No special hardware is required in the CPU. We
originally set out to design hardware that is completely
portable between CPU and bus-based system architectures.
Our garbage-collected memory module will work in any bus-based
architecture that provides support for some form of multi-processor
cache coherency. Alternative configurations are available for
cached and uncached single-processor environments.
If you are interested in more complete descriptions of our system design
and/or experimental methods, please don't hesitate to contact either of
us:
Kelvin Nilsen William Schmidt
Assistant Professor Ph.D. candidate
Computer Science Dept. Computer Science Dept.
Iowa State University Iowa State University
kelvin@cs.iastate.edu schmidt@cs.iastate.edu
--
Kelvin Nilsen/Dept. of Computer Science/Iowa State University/Ames, IA 50011
(515) 294-2259 kelvin@cs.iastate.edu uunet!kelvin@cs.iastate.edu
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.