Re: Fat references

Jon Harrop <jon@ffconsultancy.com>
Wed, 06 Jan 2010 02:01:45 +0000

          From comp.compilers

Related articles
[24 earlier articles]
Re: Fat references anton@a4.complang.tuwien.ac.at (2010-01-04)
Re: Fat references kkylheku@gmail.com (Kaz Kylheku) (2010-01-04)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-05)
Re: Fat references dmr@bell-labs.com (Dennis Ritchie) (2010-01-05)
Re: Fat references kkylheku@gmail.com (Kaz Kylheku) (2010-01-05)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2010-01-05)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-06)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-06)
Re: Fat references gneuner2@comcast.net (George Neuner) (2010-01-07)
Re: Fat references gneuner2@comcast.net (George Neuner) (2010-01-07)
Re: Fat references gneuner2@comcast.net (George Neuner) (2010-01-07)
| List of all articles for this month |

From: Jon Harrop <jon@ffconsultancy.com>
Newsgroups: comp.compilers
Date: Wed, 06 Jan 2010 02:01:45 +0000
Organization: Flying Frog Consultancy Ltd.
References: 09-12-045 09-12-050 09-12-056 10-01-010 10-01-022 10-01-029
Keywords: storage, VM
Posted-Date: 08 Jan 2010 10:17:44 EST

Robert A Duff wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
>> Memory consumption is only increased for duplicated references
>> (i.e. when two or more references in the heap point to the same value)
>> but I believe that is comparatively rare.
>
> That's a good point.
>
> But don't forget to count 'null' pointers. For example, consider
> a balanced binary tree of depth 3. You've got 7 nodes and
> 7 non-null pointers to them (one from outside the tree).
> But you've also got 8 null pointers stored in the leaf
> nodes. As I understand it, those 8 null pointers are
> going to be 4 words each, in HLVM.


If you put null references in the data structure, yes.


> You can think of 'null' as being a pointer to a single
> dummy object -- and there are typically lots of duplicates!


Potentially, yes. In fact, that is exactly how F# represents many null
values: the empty list [] is represented as a pointer to a global
empty list of that type. That representation was chosen over a simple
NULL because it conveys run-time type information and, consequently,
makes it possible to pretty print the empty list.


HLVM certainly would lose out on that sharing if you used that
representation. However, if you use a better representation then you
get better performance and memory consumption and fat references do
not appear to be a hindrance. Specifically, you take any variant type
such as the type of a list:


    type 'a list = Nil | Cons of 'a * 'a list


and you special case type constructors with Nils in them so Cons(x, Nil)
becomes One x:


    type 'a list = Nil | One of 'a | Cons of 'a * 'a list


HLVM gives you at least 2^32 type constructors so adding an extra one
costs nothing but now the only lists that use Nil are empty lists, so
the number of null references has been dramatically reduced but you
still have all of the advantages (e.g. typed representation that can
be pretty printed).


This is so successful that I use an equivalent trick to special case
branches of tree with empty children in OCaml's Set data structure and
it improves performance 30% (and memory consumption). In fact, that
removes all of the null references from your example tree.


> Also, why you do say "in the heap" above? I'd say "Memory consumption
> is only increased for duplicated references (i.e. when two or more
> references point to the same value)." Doesn't matter whether those
> pointers are in the heap, or on the stack (or in a register -- four
> registers! -- for that matter).


No, because the optimizer eliminates redundant loads and
stores. Although HLVM naively generates LLVM IR asking it to load and
store entire quadwords, LLVM's optimization passes detect when the IR
requires less (e.g. is just reading the length of an array) and loads
only a single word.


The main exceptions are pushing and popping quadwords from the shadow
stack and passing fat references in function arguments (where they
consume 4 registers instead of 1).


> Anyway, if you've got good measurements, they trump my speculation.


Yes. I think these results are really surprising. Everyone I spoke to
when I was designing HLVM said that it was known to be a bad idea but
it obviously isn't.


--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.