Re: Fat references

Jon Harrop <jon@ffconsultancy.com>
Wed, 30 Dec 2009 21:57:57 +0000

          From comp.compilers

Related articles
Fat references jon@ffconsultancy.com (Jon Harrop) (2009-12-29)
Re: Fat references paul.biggar@gmail.com (Paul Biggar) (2009-12-30)
Re: Fat references bobduff@shell01.TheWorld.com (Robert A Duff) (2009-12-30)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2009-12-30)
Re: Fat references gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-12-30)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2009-12-30)
Re: Fat references kkylheku@gmail.com (Kaz Kylheku) (2009-12-30)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2009-12-30)
Re: Fat references gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-12-31)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-01)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2010-01-01)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2010-01-01)
Re: Fat references anton@mips.complang.tuwien.ac.at (2010-01-02)
Re: Fat references gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-01-02)
[21 later articles]
| List of all articles for this month |

From: Jon Harrop <jon@ffconsultancy.com>
Newsgroups: comp.compilers
Date: Wed, 30 Dec 2009 21:57:57 +0000
Organization: Flying Frog Consultancy Ltd.
References: 09-12-045 09-12-050
Keywords: storage, GC
Posted-Date: 30 Dec 2009 23:32:41 EST

Robert A Duff wrote:
> GNAT (the gcc Ada compiler) uses fat pointers by default to represent
> pointers to unconstrained arrays. (Unconstrained means different
> objects of a given array type can have different lengths.) The fat
> pointer consists of the address of the bounds and the address of the
> data.


I see.


> The purpose of these fat pointers has nothing to do with GC, though.
> GNAT does not currently support GC, except for the versions targeting
> .NET and the JVM.


Right. With my approach, the pointer to the run-time type can be used to
fetch a GC-specific function that traverses the references in a given type,
or a user-function to pretty print a value of that type.


> I don't fully understand the problem you're trying to solve, nor how
> fat pointers are the solution.


My original motivation was to evade typeless null pointers that afflict many
VMs including the JVM and CLR. Various operations hit a dead wall with
these nulls because they don't provide any type information. For example,
pretty printing null is a serious problem in F# because lists, sets, option
types and other types all want to use null as an efficient representation
of "empty" but the run-time system just sees a typeless null and can only
print "null". So F# wraps the empty list in a heap allocated object,
massively degrading the performance of most uses of lists.


However, with the benefit of hindsight, there is a simpler but still
efficient solution to this problem on the CLR: wrap the null in a value
type instead of an object. That way there is no heap allocation and
(assuming my performance results from HLVM carry) you don't see any
slowdown at all. Unfortunately, this would work perfectly in theory but, in
practice, the CLR has problems with structs inhibiting TCO. HLVM doesn't
have that problem though (thanks to LLVM!).


> Are you saying all pointers are fat (4
> words per pointer!?)? Or just pointers to arrays?


Yes, all pointers. Every single reference in the heap now occupies a
quadword. I even blindly load and store entire quadwords when reading and
writing references.


> Under what circumstances are you worried about an extra copy?


Extra copy of what?


> If the array is allocated on the C side, you have to concoct the meta data
> somehow...


That's no problem. If the array came from the C side then I must JIT compile
a stub function that accepts HLVM's fast calling convention and invokes the
external function using C's calling convention anyway, so I can put the
extra code to convert the C array into a fat reference in that stub,
injecting type information into the typed fat reference derived from the
FFI definition of the external function.


> Why not just say that arrays allocated on the C side are not
> subject to GC?


Simplicity is *very* important for me. I want to create a useful language
implementation with the minimum of effort (where useful means
high-performance parallel numerics to me) by drawing upon research without
doing anything new myself. My entire VM is still well under 2kLOC of OCaml
code. If I started adding special cases like different kinds of arrays then
it would complicate everything from the type system to the GC and shadow
stack.


> By the way, if you're interested in GC, you should join the
> GC list, which our esteemed moderator set up some years ago.


I'd love to. Where is it?


HLVM has another unusual characteristic in that it defines a high-level
language (e.g. with tuples and tail call elimination) that is not only the
target language but also the language the GC is written in. This has the
advantage that I can easily JIT compile type-specialized GC code to improve
performance. The approach seems to work very well because it makes
metaprogramming trivial (e.g. instantiating generic definitions per type is
trivial and very efficient) and was easy to implement.


--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.