Re: Run-time representation of classes

"cr88192" <>
Sun, 8 Feb 2009 08:28:13 +1000

          From comp.compilers

Related articles
[5 earlier articles]
Re: Run-time representation of classes (Michael Schuerig) (2009-02-02)
Re: Run-time representation of classes (cr88192) (2009-02-03)
Re: Run-time representation of classes (George Neuner) (2009-02-04)
Re: Run-time representation of classes (Michael Schuerig) (2009-02-05)
Re: Run-time representation of classes (cr88192) (2009-02-05)
Re: Run-time representation of classes (Larry Evans) (2009-02-07)
Re: Run-time representation of classes (cr88192) (2009-02-08)
Re: Run-time representation of classes (George Neuner) (2009-02-08)
| List of all articles for this month |

From: "cr88192" <>
Newsgroups: comp.compilers
Date: Sun, 8 Feb 2009 08:28:13 +1000
References: 09-01-055 09-02-001 09-02-007 09-02-010 09-02-011
Keywords: OOP, GC, storage
Posted-Date: 08 Feb 2009 04:41:46 EST

"Michael Schuerig" <> wrote in message
> George Neuner wrote:
>> On Mon, 02 Feb 2009 14:05:48 +0100, Michael Schuerig
>> <> wrote:
>>>George Neuner wrote:
>>>> Most texts cover basic GC and creating pointer/reference maps for
>>>> your
>>>> structured types. That works well provided the maps are simple and
>>>> fast to decode (e.g., byte or word maps rather than bits). A
>>>> slightly faster, but more complex, method is to generate a
>>>> customized scanning function for each type that knows where to find
>>>> embedded references ... not all texts mention this method.
>>>Could you be a bit more specific about the texts, please? AFAIR,
>>>neither the Dragon Book nor Jones/Lins, "Garbage Collection", or
>>>Scott, "Programming Language Pragmatics", cover this. Surprisingly, I
>>>might add.
>> You need to re-read chapters 9,10 in Jones/Lins. *Carefully*.
> Definitely, however, they are concerned with GC for C and C++, but
> that's not what my question was (meant to be) about. What I didn't see
> in the books I mentioned is a description of pointer/reference maps for
> precise GC. The terms aren't even in the indexes. Of course, the
> concept shines through in places, but there is no treatment of specific
> datastructures generated by, say, a Haskell or Scheme compiler, neither
> of how these structures are used by the GC.
> Now, with some imagination I can fill-in the blanks, still, I'm
> surprised that I have to.

well, sadly, the GC for a VM is often closely tied to the workings of that
many VMs make use of special "tagged reference" values, and build the
machinery for managing much of the typesystem right into the GC...

however, in my case, I don't like this approach, so I made the GC be
"generically" typed, so whenever creating an object, one tells the GC the
type and size of the object. the GC then links the object back to a "type
context", which may hold any number of useful function pointers.

sadly, as a cost, it is no longer possible to have the same level of speed
for typechecking possible with tagged references (so, one may end up being
far more sparing with typechecks...). a rare exception here is CONS, which
does have specialized machinery in the GC (and, also, much faster

the typesystem code also provides its own typecheck code, which in some
cases may be faster, but sadly still not as fast (for example, a function
call and a range check for fixnums is not as fast as a little bit-twiddling
for the tagged reference would be).

But, as I see it, the great reduction in pain from being free of tagged
references and all of the types being fused with the GC pays for the
reduction in speed.

> [...]
>> Checking my shelf ... I find:
>> Appel's "Modern Compiler Implementation" [...]
> I don't know that one, but have his "Compiling with Continuations" on my
> list to read some day.
>> I don't have the latest [purple] dragon book, but given how recent it
>> is and its stated emphasis on compiling for more advanced languages, I
>> would be very surprised if there wasn't a treatment of pointer finding
>> and maps. Perhaps not an implementation, but certainly a discussion.
> There's about one page (sec. 7.8.3, p. 498) on conservative GC and
> scanning for pointers, but, no, there is nothing on how a precise GC
> knows which words in a chunk of memory are pointers.
> Again, I presume, they have left this out because they think it is
> obvious. And I agree, conceptually it is obvious. Looking at the source
> code of a safe language, there is no question what is a reference and
> what isn't. Well, let's just assume the GC knows the same and be done
> with it, there's no theoretical problem there. But then, there's no
> real theoretical problem with GC: just discard those objects that
> aren't referenced anymore.

there is a simple solution to this problem:
for simple object types, the GC calls a "mark handler", which is a special
function responsible for marking any pointers in the object.

for classe instances, the mark handler is located in the object system, and
may then scan over the class fields and mark any references (potentially
recursively, where it will mark any references in the current class, and
then recurse and mark the superclass).

this could allow precise GC, but may not be sufficient for moving GC (which
would either require the mark handlers passing the pointers to the
references to the GC, or having an additional "reference update handler",

I guess reference maps are also possible, since they would free the GC from
needing to make use of specialized mark functions (but, it would be needed
to have a reference map per-class, which could risk entangling the GC
machinery and the object-system machinery...).

> Still, not knowing any better, I suspect that there's a whole lot of
> interesting practical issues related to reference maps. Just three that
> I can think of: What are the trade-offs among various datastructures
> that might be used for the task? How do reference maps affect caching
> if the GC has to access a reference map for each object it touches? How
> are they organized when there are multiple translation units that are
> linked together?

well, a lot of this will depend somewhat on how the GC is organized.

here is another thing: don't bother thinking about caching in a GC pass.
reason: from the way a GC works, when a GC runs it typically thrashes the
cache anyways (since it accesses very likely a good portion of the app's
address space all in a short period of time).

but, likely, a reference map could exist per-type, that or a per-type
callback is used:
"give me a reference map for this object".

as for data structures, probably a good form would be an integer array,
potentially either with an explicit length or a terminator value (say, -1).
this could hold the offset relative to the start of the object, and would
allow the handler to reuse and return a general array.

filling and/or returning a pointer array could also be another possible
option (but would assume that the array be created or filled whenever it is

both approaches could be a little ugly when dealing with array types, and so
maybe this is not an ideal approach for arrays.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.