Re: How to implement dynamic typing?

"BGB / cr88192" <>
Sat, 10 Apr 2010 05:12:01 -0700

          From comp.compilers

Related articles
[5 earlier articles]
Re: How to implement dynamic typing? (Paul Biggar) (2010-04-06)
Re: How to implement dynamic typing? (Ian Lance Taylor) (2010-04-06)
Re: How to implement dynamic typing? (Barry Kelly) (2010-04-07)
Re: How to implement dynamic typing? (2010-04-07)
Re: How to implement dynamic typing? (Ctalk Project) (2010-04-07)
Re: How to implement dynamic typing? (George Neuner) (2010-04-07)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-10)
Re: How to implement dynamic typing? (bartc) (2010-04-11)
Re: How to implement dynamic typing? (Harold Aptroot) (2010-04-11)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-11)
Re: How to implement dynamic typing? (BGB / cr88192) (2010-04-12)
Re: How to implement dynamic typing? (George Neuner) (2010-04-13)
Re: How to implement dynamic typing? (bartc) (2010-04-14)
[13 later articles]
| List of all articles for this month |

From: "BGB / cr88192" <>
Newsgroups: comp.compilers
Date: Sat, 10 Apr 2010 05:12:01 -0700
References: 10-04-009
Keywords: types
Posted-Date: 11 Apr 2010 00:21:31 EDT

"David Belier" <> wrote in message
> Recently I finished chapter 7 of the dragon book (2nd edition) but I
> still don't understand how to implement dynamic typing. I don't get
> how you are supposed to store the type information with each variable
> and check it every time using an acceptable amount of resources and
> time. Can someone please name papers or websites to read?

as others have said, it is not stored with the variable.

> Note: I have a strong C++ background and know x86 assembler. I can
> imagine a way to implement it doubling memory requirements and making
> the program dead slow in the process so there must be better ways. In
> the past I have also used Java and Python for small programs (CS
> course and writing file converters). They seemed to show reasonable
> performance.

well, I am not sure what way you are imagining.

but, in my recent VM / compiler, I have been using a mix of static and
dynamic typing, where in the case of fully dynamic typing, a variable may
fall back to using a special type designated to be dynamic. working with
this type may be handled differently than working with static types, as
instead of compiling the operations themselves, the compiler may compile the
operation as a runtime call (this is typically explicit when using dynamic
typing from plain old C). the runtime call is then responsible for
implementing the operation.

another case possible in my case is the use of "hybrid" dynamic typing,
where basically objects are allocated on the heap and used typically in a
statically typed manner, but then may be put into a dynamically-typed
variable and used this way instead (performing type-checks, ...).

something fairly similar to the above can be done readily in both Java and
C#, or in C++ if using RTTI and "dynamic_cast<>" (although of these, C#
probably provides the most cumbersome syntax IMO).

C++: if(dynamic_cast<Foo>(obj)) ...
Java: if(obj instanceof Foo) ...
C#: if(typeof(Foo).IsInstanceOf(obj)) ...

as a side note:
support for type tag/annotation is built directly into my garbage collector,
where they are stored in the object-header for the type (along with the
allocation size and a few other bits of info). this tag is itself
represented as a small integer value (currently 12 bits in my case) which
identifies a TypeInfo structure, which contains much more information (the
name of the type, some hashes and magic numbers, a pointer to a "vtable"
which contains a number of fairly common operations, ...).

typically, whenever allocating memory for an object, a type-name is
supplied, so that the appropriate internal structure is used. however,
client code doesn't generally need to see much of any of this (from the POV
of client code, it looks and behaves much like "malloc()" or similar).

so, dynamically-typed references in my case are generally represented as raw
note that the type-handling code may accept arbitrary addresses, but will
generally return a NULL type if it doesn't know what the type is (this is of
great advantage when dynamically-typed data may be mixed with raw memory
pointers, since raw memory need not be entirely excluded).

note, they are generally not represented as actual class/instance objects,
since in my case C/I OO is actually a different set of facilities, and so
provides its own internal facilities (one can determine if an arbitrary
reference is an instance of a class by first checking that the type-tag
indicates that it is a class/instance object, then followed by checking if
the class for the object is a subclass of the class is question).

admitted, type-checks are not free, so it is a good idea to use them usually
only when there is a good reason to do so (code may reasonably accept
multiple types), and instead rely on static typing for the vast majority of
code. granted, this is not possible for a dynanamically-typed language,
meaning that there is some (typically minor) performance overhead involved.

granted, storing everything as on-heap objects would be wasteful of memory
and hurt performance.

a trick I have found is to exploit that common architectures (Win32, x86-64
in general, ...) have a chunk of address space which is otherwise unusable.
this space can then be carved up into virtual regions representing various
types (such as smaller integers or float values). in Win32, this is the
high 1 or 2 GB of the process, and in x86-64 this is currently a very large
region (2^64 - 2^56 bytes). in Linux on x86 though, in some distros this
area of unusable address-range may not exist, and I don't currently know of
a good means to check for this (apart from trying to probe via "mmap()" or
similar, and possibly use mmap to allocate "unusable" regions as a

in any case, a pointer into one of these regions can then be interpreted as
if it were a various type, and the pointer value is used to indicate the
specific value. (type-checking is mostly a matter of seeing if a pointer
falls within the given address region).

as with memory objects, much of this is managed by the garbage collector
(although the logic for the specific types is implemented elsewhere).

sadly, the available regions (for x86) are not as large as is possible with
tagged values, so I have to make due with 28-bit integers and raw
pointer-based float values (custom / non-IEEE encoding). on x86-64, both are
given 48 bits.

however, given that this is possible with a space which is equivalent to the
raw C-based pointer-space, it is far more convinient and usable from C (in
my case my most-used language), and far cheaper overall than boxing values.

so, in all, it is working fairly well, although it took many years to find
good ways to do all this, and to find good tradeoffs (between convinience,
performance, ...). in general, I am fairly satisfied with the result
(although, with my GC it is a good idea to actually free objects when
possible, since this greatly helps with overall performance vs trying to
imagine that the GC is free...).

actually, this would be analogous to public locations:
there is a custodian, and they will clean the floors;
but, this doesn't mean it is best practice to simply throw all their trash
on the ground either.

so, instead, people generally use the trash can even though the custodian
exists, more so that everyone is happy and the custodian doesn't have to
work as hard. but, at the same time, one doesn't have to worry about every
little detail either (dropped bits of food, ...), since the floors will be
cleaned eventually.

granted, the idea of still freeing memory is apparently antithema in most
garbage-collected languages, but oh well. in my case, it makes everything
faster and the GC run less often, which is a good tradeoff IMO.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.