Re: Run-time representation of classes

George Neuner <>
Sun, 08 Feb 2009 00:21:37 -0500

          From comp.compilers

Related articles
[6 earlier articles]
Re: Run-time representation of classes (cr88192) (2009-02-03)
Re: Run-time representation of classes (George Neuner) (2009-02-04)
Re: Run-time representation of classes (Michael Schuerig) (2009-02-05)
Re: Run-time representation of classes (cr88192) (2009-02-05)
Re: Run-time representation of classes (Larry Evans) (2009-02-07)
Re: Run-time representation of classes (cr88192) (2009-02-08)
Re: Run-time representation of classes (George Neuner) (2009-02-08)
| List of all articles for this month |

From: George Neuner <>
Newsgroups: comp.compilers
Date: Sun, 08 Feb 2009 00:21:37 -0500
Organization: A noiseless patient Spider
References: 09-01-055 09-02-001 09-02-007 09-02-010 09-02-011
Keywords: OOP, GC
Posted-Date: 08 Feb 2009 04:42:25 EST

On Thu, 05 Feb 2009 02:19:44 +0100, Michael Schuerig
<> wrote:

>George Neuner wrote:
>> On Mon, 02 Feb 2009 14:05:48 +0100, Michael Schuerig
>> <> wrote:
>>>George Neuner wrote:
>>>> Most texts cover basic GC and creating pointer/reference maps for
>>>> your
>>>> structured types. That works well provided the maps are simple and
>>>> fast to decode (e.g., byte or word maps rather than bits). A
>>>> slightly faster, but more complex, method is to generate a
>>>> customized scanning function for each type that knows where to find
>>>> embedded references ... not all texts mention this method.
>>>Could you be a bit more specific about the texts, please? AFAIR,
>>>neither the Dragon Book nor Jones/Lins, "Garbage Collection", or
>>>Scott, "Programming Language Pragmatics", cover this. Surprisingly, I
>>>might add.
>> You need to re-read chapters 9,10 in Jones/Lins. *Carefully*.
>Definitely, however, they are concerned with GC for C and C++, but
>that's not what my question was (meant to be) about. What I didn't see
>in the books I mentioned is a description of pointer/reference maps for
>precise GC. The terms aren't even in the indexes. Of course, the
>concept shines through in places, but there is no treatment of specific
>datastructures generated by, say, a Haskell or Scheme compiler, neither
>of how these structures are used by the GC.

Specifically, those chapters are about some ways to achieve precise
knowledge of pointer locations in the heap. I'm sorry that the
authors' use of C and C++ for their discussions was so distracting.

>Now, with some imagination I can fill-in the blanks, still, I'm
>surprised that I have to.

Seeing the nuts and bolts is useful, but if you depend on it for
learning, you'd better find another profession. Filling in the blanks
is a pre-requisite for being a software developer. Many times an
example is all you'll find - it might even be in a language you know.
For some hard problems all you'll ever find is a discussion of how to
approach a solution.

Similarly, as a developer, the probability of finding drop-in code
that solves your exact problem will be quite low. I've been writing
software professionally for 25 years and I can count, without running
out of fingers, the number of times I've been able to drop in 3rd
party routines or a library that didn't require a significant amount
of glue to adapt it to my program. [I'm discounting situations where
a program was deliberately written to use a library from the start.
I'm talking mostly about grabbing 3rd party code to quickly add
features to an existing application. Somehow, it never works out as
easily as you'd like.]

>> I don't have the latest [purple] dragon book, but given how recent it
>> is and its stated emphasis on compiling for more advanced languages, I
>> would be very surprised if there wasn't a treatment of pointer finding
>> and maps. Perhaps not an implementation, but certainly a discussion.
>There's about one page (sec. 7.8.3, p. 498) on conservative GC and
>scanning for pointers, but, no, there is nothing on how a precise GC
>knows which words in a chunk of memory are pointers.

That's disappointing. I expected more.

>Again, I presume, they have left this out because they think it is
>obvious. And I agree, conceptually it is obvious. Looking at the source
>code of a safe language, there is no question what is a reference and
>what isn't. Well, let's just assume the GC knows the same and be done
>with it, there's no theoretical problem there. But then, there's no
>real theoretical problem with GC: just discard those objects that
>aren't referenced anymore.

I would say the tri-color abstraction is a theoretical issue, but I
agree that GC is mostly a series of practical implementation problems.

>Still, not knowing any better, I suspect that there's a whole lot of
>interesting practical issues related to reference maps. Just three that
>I can think of: What are the trade-offs among various datastructures
>that might be used for the task? How do reference maps affect caching
>if the GC has to access a reference map for each object it touches?

Detlef (Jones/Lins ch10) does address performance tradeoffs between
using bitmaps and bytecode maps.

Locality and cache, of course, can be affected by using maps - that's
why Detlef's implementation put the maps in the objects themselves
(trading space for speed). A better solution is to associate a
scanning function with each object/class. The function is generated
with hard coded pointer offsets taken directly from the object/class
definition. Detlef mentions this approach as an alternative to maps
in his paper - I don't recall if Jones/Lins discussion also includes
it (I'm away from the book at the moment).

Obviously you can also use scan functions that embed pointer maps as
constant data. That also has cache consequences although the effects
may be different than when placing maps with class descriptors.

You can also use scan functions for stack frames, although it's more
involved to implement functions using hard coded offsets. What's more
typical is to have just one stack scan function that interprets a
pointer map embedded in each frame. This has little effect on
locality because the map is in the object (frame) being scanned.

>How are they organized when there are multiple translation units that
>are linked together?

It's only a problem if somebody doesn't play by the rules.

My preference is to use scan functions regardless of whether maps are
also used. The collector doesn't need to know anything about object
representations, but only how to call a function given an object's
address. It simplifies the interface to the GC and puts the
responsibility to understand object representations on the language
runtime (where I think it belongs).


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.