Re: Fat references

"BGB / cr88192" <>
Sun, 3 Jan 2010 21:09:34 -0700

          From comp.compilers

Related articles
[16 earlier articles]
Re: Fat references (Hans-Peter Diettrich) (2010-01-03)
Re: Fat references (Ray) (2010-01-03)
Re: Fat references (2010-01-03)
Re: Fat references (glen herrmannsfeldt) (2010-01-03)
Re: Fat references (Jon Harrop) (2010-01-04)
Re: Fat references (Kaz Kylheku) (2010-01-04)
Re: Fat references (BGB / cr88192) (2010-01-03)
Re: Fat references (Robert A Duff) (2010-01-04)
Re: Fat references (2010-01-04)
Re: Fat references (Kaz Kylheku) (2010-01-04)
Re: Fat references (Jon Harrop) (2010-01-05)
Re: Fat references (Dennis Ritchie) (2010-01-05)
Re: Fat references (Kaz Kylheku) (2010-01-05)
[6 later articles]
| List of all articles for this month |

From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Sun, 3 Jan 2010 21:09:34 -0700
References: 09-12-045 10-01-014
Keywords: storage, GC
Posted-Date: 04 Jan 2010 11:25:40 EST

"Ray" <> wrote in message news:10-01-014@comp.compilers...
> Jon Harrop wrote:


> A lot of modern VMs use fat references. I know that the JVM uses at
> least 2x(sizeof( void* )) to represent a pointer. And yes, it's to
> support the GC.


I have used oversize pointers in a few places, but I guess the point of note
is that very few of them are this way.

> When I implemented a lisp runtime, I went through two iterations; in
> the first, pointers were the size of pointers on the native
> architecture (in this case 64 bits) and there was a 5-word (ie, the
> size of 5 pointers) overhead for each heap-allocated object. But the
> code was complex and, as it turned out, slow.

a 5 pointers per object overhead is itself nasty...

actually, in my case I have 0 pointers for smaller objects.

I can actually do this because my GC design was originally influenced by
those used in Guile and PLT Scheme, which had used bitmap-based memory
allocation for small objects.

my current GC mutated away from its original form in many respects, so I am
not sure exactly how much is common now (many years have passed and I don't
remember their code).

from what I remember, both used tagged references, but my modern GC uses raw
pointers, ...

in any case, storage for smaller objects is generally managed by the use of
bitmaps (actually, they are 8-bit bytes per cell in my current GC, but I
don't have a better term for this, and each byte is essentially treated as a
collection of bits).

this leads to a constant-factor storage overhead for small objects (around
6.25%, since cells are 16 bytes).
there is then also an 8-byte per-object header (identifies object type and
some other info).

so, the small-object heap consists of "chunks", typically 1MB, which are
filled with cells (960KiB data, 64KiB bitmap, so about 61440 usable data

an object takes at least 1 cell (for <=8 bytes payload), or 2 cells (for <=
24 bytes).

cons cells have their own heap chunks, where they have a similar 6.25%
bitmap overhead, but don't have headers.

large objects use a more traditional allocator (at present, I use malloc for
objects >= 6KiB, since malloc's tend to be fairly efficient except for huge
numbers of very small objects, where malloc may turn evil).

resolving a pointer to an object usually involves a quick check to determine
where in the heap it points (into the cons heap, small object heap, or to a
large object), and after this point, it is mostly a matter of arithmetic to
get the object base (this much is mostly an artifact of conservative GC, a
precise GC would not need to "resolve" objects in this way since it would
know which references were valid in advance).

(my GC can optionally do faster resolution for some operations, but at the
cost of a loss of safety checking and having to supply the right type of

<snip> description of a (generational?) GC using fat pointers and linked
lists. </snip>

ran line counter, my GC is 7630 loc, so it is a bit larger.
granted, it is also a concurrent conservative mark/sweep + ref-counting GC.

1500 loc could be trimmed (there is a deflate compressor there which doesn't
need to be there...).
so, around 6130 loc or so.

actually, personally I have rarely been able to get good performance out of
linked-list GC's. I had tried in the past, but they were slow (taking easily
seconds to perform a GC pass on a 50-60MB heap...).

this is why I like cells, in addition to the relatively high memory density
for small objects...

actually, it is also why I like using arrays of pointers, and quicksort (to
get these arrays sorted before GC to allow faster lookup), so that it
becomes an O(log2 n) operation to find the heap chunk (or a large object)
for a given pointer (followed by a small amount of arithmetic for small
objects or conses).

> I expected performance with fat pointers to be horrible, but it
> actually wasn't. It turned out to be faster than the first version,
> (on an AMD64 dual-core processor with 128K/2M cache), probably due to
> more inlined values (I was using a lot of doubles) and better locality
> of reference.

granted, the critical issue WRT cache is actually the working set.

if the working set itself mostly fits in cache, then performance will not
change much (until one exceeds this).
OTOH, if the working set is huge, then performance will also not change
much, since it didn't fit in the cache before either (although, there could
be a linear slowdown due to the amount of memory access, which is less
likely to matter if it does fit).

then again, I guess the big downside would be that, if using the GC from C,
explicit fat pointers (AKA: not managed by the compiler) would be awkward
and likely prevent creation of an opaque API.

after all, if the client code knows anything, it is that the GC uses fat
pointers (and, very likely, a good deal more). this is unlike the case of
the client knowing little more of the GC's internals than that it uses raw
pointers (and some of the API calls, ...).

so, it "could" be a concern as well as to how the API looks, like, if it is
a clean and opaque API, or if the client code can see into the internals.

granted, there are many other cases where opaqueness is not as much of a

I guess though, all is a bit moot though absent objective comparrisons...

then again I am aware of some current internal flaws in my GC's
implementation which impede its reliability, and so trying to do a
stress-test on the thing is likely to just cause crashes...

before the thing was multi-threaded though (which is where problems were
introduced), I stress-tested the thing some, and optimized and fixed
whatever was misbehaving...

but, now I would need to fix it up some...

in general, it works well enough I guess though...

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.