Re: How to implement dynamic typing?

George Neuner <>
Wed, 07 Apr 2010 18:57:18 -0400

          From comp.compilers

Related articles
[4 earlier articles]
Re: How to implement dynamic typing? (bartc) (2010-04-06)
Re: How to implement dynamic typing? (Paul Biggar) (2010-04-06)
Re: How to implement dynamic typing? (Ian Lance Taylor) (2010-04-06)
Re: How to implement dynamic typing? (Barry Kelly) (2010-04-07)
Re: How to implement dynamic typing? (2010-04-07)
Re: How to implement dynamic typing? (Ctalk Project) (2010-04-07)
Re: How to implement dynamic typing? (George Neuner) (2010-04-07)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-10)
Re: How to implement dynamic typing? (bartc) (2010-04-11)
Re: How to implement dynamic typing? (Harold Aptroot) (2010-04-11)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-11)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-12)
Re: How to implement dynamic typing? (George Neuner) (2010-04-13)
[14 later articles]
| List of all articles for this month |

From: George Neuner <>
Newsgroups: comp.compilers
Date: Wed, 07 Apr 2010 18:57:18 -0400
Organization: A noiseless patient Spider
References: 10-04-009
Keywords: types
Posted-Date: 10 Apr 2010 00:10:11 EDT

On Fri, 2 Apr 2010 16:38:05 +0100, David Belier
<> wrote:

>Recently I finished chapter 7 of the dragon book (2nd edition) but I
>still don't understand how to implement dynamic typing.

Let's clarify some terminology. There are at least 2 definitions of
"dynamic typing" that you may run into.

The first and most common definition is based in mathematics.
Mathematical variables are not storage locations, but rather are name
bindings that stand-in for or represent values. Bindings have no type
of their own - where typing is required, bindings inherit their type
from the value(s) they refer to. The Lisp/Scheme family and FP
languages like ML, Haskel, etc. are examples which use this notion of
[Note that Haskell and languages in the ML family are not dynamically
typed - they are statically typed. They are mentioned here because
they (mostly) use the mathematical notion of binding.]

The less commonly used definition is computer centric and says that
only operations are typed ... values are just collections of bits
which can be interpreted in different ways for different purposes.
This idea is quite familiar to assembler programmers but is much less
common in high(er) level languages (although the 'B' language - a
precursor to 'C' - worked in this way, and there are strange anomalies
such as Ocaml's use of different operators for integer and floating
point arithmetic while otherwise being a fairly modern FP language).

>I don't get how you are supposed to store the type information with
>each variable and check it every time using an acceptable amount of
>resources and time.

You store the type with the value - variables (conceptually at least)
become pointers to values. Many interpreted languages use a tagged
union representation for values and all variables are really pointers
to these value unions.

Compiled dynamic languages like Lisp, Scheme and others often separate
atomic (register sized) values from structured values. Structured
values exist on the heap and are very similar to C++ objects - they
typically have a hidden type field which points to a compiler
generated descriptor.

Atomic register sized values - pointers, small integers, characters -
are often tagged by stealing a few bits from the binary
representation. This usually means reducing the range of the tagged
value ... for example, many 32-bit Lisps have 29 or 30 bit integers.
Storing the tag in the value in this way is called "immediate"
tagging. Note that floating point values, although register sized,
cannot be tagged in this way.

Immediate tagging schemes are designed to avoid the need for untagging
values whenever possible and thus are generally CPU specific to take
advantage of any help the hardware may offer.

If you take high bits, you can use 2's complement integers directly by
using separate tags for positive and negative values. For example if
you take 2 high bits and define your tags as:

    00 - positive integer
    01 - ...
    10 - ...
    11 - negative integer

the integer tags are identical to what the corresponding bits would be
in an untagged integer. With high tags, you *must* check for overflow
because overflow might change the tag - so after arithmetic you must
check that the result is still an integer. On most CPUs, high tags on
a pointer must generally be removed by masking before the pointer can
be used, however, masking can sometimes be avoiding if addresses don't
require full width pointers or by clever use of the virtual memory

If you use low bits, an integer tag does not affect arithmetic, but
the value may have to be scaled (shifted) before use. However, for
indexed addressing, the index generally has to be scaled for element
size anyway (unless you have a VAX) so much of the need for scaling is
by necessity. For pointers, some CPUs are word aligned and will
ignore the low 2 or 3 bits of an address. Additionally, most CPUs
have displacement addressing modes which can compensate for a small
offset in a pointer, so using low bits for tagging may not impact
pointer range or performance much at all.

Because most current CPUs can compensate for offset pointers, many
immediate tagging schemes use low bits and define the integer tag to
be 00(0) so that integer arithmetic is simple.

There is also a clever way of using virtual memory for typing. Called
BiBOP (for "Big Bag Of Pages"), it relies on reserving a large range
of addresses for each distinct data type - sort of like using
segmentation on x86 with a different segment prefix for each type.
BiBOP typing allows data like floating point values and fixed size
structured types such as complex numbers, ratios, conses (pointer
pairs), etc. to be stored on their natural alignment boundaries
without wasting space.

WRT performance: a lot of type checking can be elided by control flow
analysis - if the type of a value already has been checked by a prior
operation, it does not need to be checked again unless it has been
changed. Primitive functions in Lisp (akin to operators in C)
generally have 2 entry points: one which checks arguments and one
which doesn't. Good Lisp compilers work hard to avoid primitive type
checking wherever possible and can sometimes provide this kind of
optimization for user written functions as well.

> Can someone please name papers or websites to read?

I see Rob already mentioned:

      David Gudeman, "Representing Type Information in Dynamically Typed
      Languages", University of Arizona at Tucson, Technical Report
      TR93-27 (1993).

      Highly recommended!

See also:

      Peter A Steenkiste, "The Implementation of Tags and Run-Time
      Type Checking", in Topics in Advanced Language Implementation,
      MIT Press, 1991, pages 3-24.

      Robert A. MacLachlan (ed), "Design of CMU Common Lisp", 2003.

      Cyrille Comar and Brett Porter, "Ada 9x Tagged Types and their
      Implementation in GNAT", 1994, ACM 0-89791-666-2/94/0011-0071.

      Gary J. Dismukes and Mike A. Rome, "Implementing Tagged Types
      and Type Extensions for Ada 9X", 1992,
      ACM 0-89791-529-1/92/0011--0068.

      Olin Shivers, "Data-Flow Analysis and Type Recovery in Scheme",
      CMU School of Computer Science, Technical Report CMU-CS-90-115,

      Alexander Aiken and Brian R. Murphy, "Static Type Inference
      in a Dynamically Typed Language", 1991.

      Robert L. Akers, "Strong Static Type Checking for Functional
      Common Lisp", Computational Logic Inc., Technical Report 96,

      Sheetal V. Kakkad, Mark S. Johnstone, and Paul R. Wilson,
      "Portable Run-Time Type Description for Conventional Compilers",


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.