Re: Fat references

"BGB / cr88192" <cr88192@hotmail.com>
Fri, 1 Jan 2010 14:35:49 -0700

          From comp.compilers

Related articles
[5 earlier articles]
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2009-12-30)
Re: Fat references kkylheku@gmail.com (Kaz Kylheku) (2009-12-30)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2009-12-30)
Re: Fat references gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-12-31)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-01)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2010-01-01)
Re: Fat references cr88192@hotmail.com (BGB / cr88192) (2010-01-01)
Re: Fat references anton@mips.complang.tuwien.ac.at (2010-01-02)
Re: Fat references gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-01-02)
Re: Fat references bobduff@shell01.TheWorld.com (Robert A Duff) (2010-01-02)
Re: Fat references jon@ffconsultancy.com (Jon Harrop) (2010-01-03)
Re: Fat references DrDiettrich1@aol.com (Hans-Peter Diettrich) (2010-01-03)
Re: Fat references bear@sonic.net (Ray) (2010-01-03)
[17 later articles]
| List of all articles for this month |

From: "BGB / cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Fri, 1 Jan 2010 14:35:49 -0700
Organization: albasani.net
References: 09-12-045 09-12-055 10-01-005
Keywords: GC, architecture
Posted-Date: 02 Jan 2010 12:22:34 EST

"Jon Harrop" <jon@ffconsultancy.com> wrote in message
> Kaz Kylheku wrote:


<snip>


>>
>> So now you have references that can't fit into a machine register.
>
> If you don't have 128- or 256-bit registers on your 32- or 64-bit machine,
> respectively.
>


one needs AVX to have 256 bit registers at present (on x86 / x86-64 and
friends).


given AVX is not (yet) available in general "run of the mill" CPUs,
this seems to be a bit agressive. similarly, 256 bit pointers are
hardly cheap...




>>
>> How do you fit three pointers and a tag into a ``quad word''?
>
> I grew up with 32-bit words (ARM) and meant 4*sizeof(void *).
<merge>
> No, there is no consistent meaning.
>


granted, many also live with x86, where "quad word" has historically
been 64 bits. this may be because in x86 land, it is still fairly
common to use 16-bit words for many operations.




>> From your description, it seems that what you describe resembles a
>> handle-based approach whereby some objects are split into two parts: a
>> fixed size record, which contains a pointer to the remaining storage, in
>> addition to other fields, like a type tag and whatnot. The fixed size
>> records are allocated in an array fashion from heaps, where garbage
>> collection takes place. However, the records are not regarded as
>> references, and are never passed by value. The reference values that
>> are manipulated by the program are actually pointers to these records,
>> not the records themselves. So in other words, you keep your fundamental
>> value/reference type of a size that fits into a register. If you need
>> atomic operations, you can do them nicely on that type. Variables,
>> including function arguments, and the slots of structure and vector
>> objects, are all just one pointer wide, so you can do a compare-swap (or
>> whatever) on any memory location that holds a value, whether it's a
>> local variable, array element, etc.
>
> I see. That is not what I'm doing but, besides atomicity, what is the
> motivation for trying to squeeze a reference into a pointer?


the main thing I am thinking here is:
1, saving memory:
given that, typically, pointers are one of the most common value types
(especially in dynamic languages), large pointers translates to HUGE
increases in memory use.


2. performance:
increasing memory use greatly reduces cache efficiency, and as a result,
code will run much slower on average.




as I sometimes say, having lots of a resource is not a good reason to
waste it. just because a modern computer has > 1GB of RAM (my desktop
has 6GB and my laptop 4GB), does not mean it is a good idea to
increase memory use by a huge factor without good reason.


it is typically better to use more of a resource to have more stuff, than to
squander it and have the same, or less, overall.


much like how now, a person can easily have multiple TB of hard-drive space,
yet data compression has by no means become meaningless.


to make little savings at the bottom can often translate to big savings
overall...




hence my idea is maybe to consider a scheme where, most of the time,
pointers remain at their natural size, and can occasionally route through
another structure for added functionality (this structure need not be
necessarily heap-based). alternatively, most info can be placed in the
object (as is often done).


in the structure based strategy, also possible could be to have an "object
indirection table", which would basically be used for any item addressed via
a fat pointer. in this case, the narrow pointer encodes an index or pointer
into this table, and the table in turn refers to the object (with any added
info).


this would allow greatly reducing overall costs to multiple references to
the same object.
more extreme would be to make nearly all references index via this table
(possibly even allowing 32-bit "pointers" to be used on a 64 bit system, if
desired).


I actually before did something roughly similar before when using a
tagged-reference based memory manager, mostly noting that this indirection
actually made the whole thing, on average, faster than encoding a pointer
directly into the tagged reference.


the main weak side of tagged references is that they are a hassle to work
with from C, but at the same time, the same could be said of fat pointers
(without explicit compiler support). actually, it may well be worse, because
at least a tagged reference will usually fit nicely into an unsigned integer
or similar...




actually, I don't think it is a coincidence that there are many high level
VM's which use tagged references, but relatively few which use fat pointers.


a simple example of a tagged reference scheme:
low 3 bits:
0/4: fixnum (2-bit 0 tag)
1/5: flonum (2 bit 1 tag)
2: object reference (28 bit index into object heap/table)
3: cons reference (28 bit index into cons-cell heap)
6: large object reference (28 bit reference into large objects table)
7: expanded reference, 5 additional tag bits
        0: reserved
        1: special constants (NULL, UNDEF, EOL, TRUE, FALSE, ...)
        2: character
        3: remote object handle
        ...


even on a 64-bit system, a person may still use 32-bit tagged values,
meaning LESS space use than if they used pointers (although, there is a
finite limit on practical heap size, for example, 4GB worth of cons cells,
...).


the reason for giving so much space to fixnums and flonums is that these are
usually very common types, and it is desirable to minimize the need for
heap-allocated integers or floats.


similarly, 30 bits is a bit more accurate than 28 bits (what I currently use
on 32-bit x86 for flonums in raw pointers...), and so is more comprable to a
single-float (at 28 bits, the approximation as not as good, and at 24 bits,
the accuracy is just poor...).




sadly, it would require 64-bit tagged references to have double-approximate
tagged references.


either way is still much less than with fat pointers...




note that, in my compiler lower end, I have actually ended up packing a good
portion of the C typesystem (representation and semantics) into a 32-bit
integer (via "bit twiddling ninjutsu"), although some more complex types
(notably function-pointer types and multi-dimensional arrays) need to
"overflow" into a structure-based representation (certain bit patterns
indicate this case, so the type is still passed around as a 32-bit integer).


in most other places though, I use a string based represantation (the so
called "signature-based" type system), since this is much more generic (and
does not have to deal with terrible-complicated bit-twiddling logic...).


however, in the lower-end, the bit-packed representation does have the
advantage that it allows more easily doing predicate operations (basically,
returning true/false based on whether a type exhibits a given property,
which is the main way used for handling types, as oppesed to the more common
set-based view of types).


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.