Summary: Representations of Dynamic Type Information (David Gudeman)
Fri, 30 Apr 1993 02:31:35 GMT

          From comp.compilers

Related articles
Representations of Dynamic Type Information (1993-04-23)
Summary: Representations of Dynamic Type Information (1993-04-30)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (David Gudeman)
Keywords: types, summary
Organization: U of Arizona CS Dept, Tucson
References: 93-04-095
Date: Fri, 30 Apr 1993 02:31:35 GMT

Well, thanks for all of the responses. I got several new ideas, and got
reminded of several things I had forgotten to include. It would be a
little impractical for me to "summarize" the entire report here, since it
is 26 pages long. However, there is a fairly complete draft available (at
26 pages it ought to be fairly complete :-), and if anyone is interested
in reading it, it is available in compressed postscript by anonymous ftp
from in the file "/usr/ftp/janus/jc/". If anyone
has trouble getting this or needs an uncompressed file, or wants the latex
source, let me know. I haven't had time to discuss all of the ideas I got
from the net, so I've included draft notes in the text to give a hint
about these (and I've tried to credit the people who wrote me). A few of
the draft notes are essentially the remainders of my outline. Since I
have to go on to other work, the report is probably going to remain in
this unfinished state for a while.

For those who are curious, but not curious enough to read a 26-page
report, here is a rough outline of the methods (this does not include all
of the tricks and special cases)

tagged-words (type information is in the machine word)
    tag fields (word is broken up into tag and data fields)
        tag field in high end (most-significant bits) of word
              use tag of all zeros for one type to avoid tagging cost
              negative ints get a tag of all ones, non-negative ints
                                                                    get a tag of all zeros
              use sign bit for one type
                  use sign-bit = 0 for one type and optimize another type by
                                                                    giving it the tag of all ones in the
                                                                    high end and tagging by negation.
        tag field in low end of word
            use n-bit tags to represent pointers data aligned on n-byte boundaries
                                                                    this allows tagging without shifting
                use a tag of all zeros to avoid tagging and untagging
            use tags that contain only one non-zero bit to make testing faster
            use all zeros to optimize integer arithmetic
            optimize integer arithmetic by adding/subtracting tag
                                                                    after subtraction/addition
        tag field in both ends of word
              various combinations of tricks from the other two options
    partitioning by pattern (type is encoded in the representation of the value,
                                                      no boxing/unboxing is done, usually words are
                                                      partitioned into ranges of numbers they rep.)
        simple range tests to identify types
        segments with statically determined types (a segment is a range of
                                                      numbers that can be identified by an
                                                      initial field in the bit-patterns used to
                                                      represent them).
        segments with dynamically determined types
            use BIBOP (table indexed by segments) to identify types
            identify type in the segment itself
                first word of segment
                last word of segment
            on a segmented architecture like the 80x86, one location has
                                                      many addresses so the (machine) segment
                                                      part of the address can be set to the
                                                      (representation) segment of the data type
                                                      being represented, and the offset can be
                                                      set in such a way that any object can be in
                                                      any machine segment.
            on machines that ignore the high bits of pointer, these bits can
                                                      be used for free boxing/unboxing
    make everything a legal IEEE float, using the NaN bit patterns to
                                                      encode all other types of data.

object-pointers (untagged pointers refering to self-identifying blocks
                                  on the heap)
    combinations of this scheme with the tagged-word scheme

descriptors (two-word data elements divided into a type word and a
                          value word)
    encoding a qualifier (address+length representation of a sequence)
                                                      in a descriptor
    encoding a cons cell in a descriptor
    direct representation of floats

segregated type codes (type information is kept elsewhere)
    type information for globals kept in a global area
    type information for locals kept in stack frame
    type information kept in header of aggregate objects

Thanks to Paul Tarau, Richard Fateman, Mikael Pettersson, Nick Thompson,
Andrew Wright, Benjamin Goldberg, Aubrey Jaffer, Eric Benson, Marc
Brandis, Kelvin Nilsen, Stavros Macrakis, Alexandr Kopilovich, Pekka
Pirinen, William Griswold, David Keppel, Marc-Michael Brandis, Mark
Tillotson, Richard Brooksby,, Hintermeier Claus,
Hendrik Boom, and Tim Lindholm for your help.
David Gudeman

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.