compiling dynamically typed languages -- basic to sexy

Paul Wilson <wilson@carcoar.Stanford.EDU>
Sat, 23 Sep 89 01:53:26 -0700

First, let me say that *yes*, it is worth compiling dynamically
typed languages. Even if you still do all your type checking
at run time, things will usually go several times as fast.
The big win is that you avoid the overhead of interpreting
the statements of the program one at a time. For a great
introduction to interpretation and compilation, see Abelson
and Sussman (with Sussman), _Structure_and_Interpretation_of_
Computer_Programs_. It's not light reading, but it is jammed
with elegant, enlightening stuff. Scheme is used to implement
a Scheme interpreter, as well as a Scheme compiler. (Scheme is
a very elegant, modern, lexically-scoped dialect of Lisp.)
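
To make the win concrete, here's a toy sketch of the difference (in
Python, with a representation and names of my own invention, nothing
from the book): an interpreter re-examines the tag of every expression
node each time it runs, while even a naive compiler can do that
dispatch once, ahead of time, and hand back a closure that just runs.

    # Toy expression language: ("const", n), ("var", name), ("add", e1, e2).

    def interpret(expr, env):
        # Dispatches on the node tag every single time it's called.
        tag = expr[0]
        if tag == "const":
            return expr[1]
        if tag == "var":
            return env[expr[1]]
        if tag == "add":
            return interpret(expr[1], env) + interpret(expr[2], env)
        raise ValueError("unknown expression: %r" % (expr,))

    def compile_expr(expr):
        # Does the dispatch once, up front, and returns a closure;
        # running the closure performs no tag tests at all.
        tag = expr[0]
        if tag == "const":
            n = expr[1]
            return lambda env: n
        if tag == "var":
            name = expr[1]
            return lambda env: env[name]
        if tag == "add":
            f1 = compile_expr(expr[1])
            f2 = compile_expr(expr[2])
            return lambda env: f1(env) + f2(env)
        raise ValueError("unknown expression: %r" % (expr,))

    expr = ("add", ("var", "x"), ("const", 1))
    code = compile_expr(expr)    # all the dispatching happens here, once
    assert interpret(expr, {"x": 41}) == code({"x": 41}) == 42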


Anybody who's interested in advanced compilers for dynamically-typed
languages should check out Ungar & Chambers' compiler for
the language Self, a prototype-based object-oriented language.
Self is probably as dynamically typed as languages get, even
more so than Smalltalk. See their paper in the SIGPLAN '89
PLDI Conference proceedings (which was a special issue of
SIGPLAN Notices a few months ago).


The Self compiler does very sophisticated dynamic compilation.
That is, it compiles the procedures that are actually executed
on an as-needed basis. It tentatively inlines and inter-procedurally
optimizes methods (object-oriented procedures), invalidating
and recompiling code when the compiler's assumptions are violated.
It does several kinds of type inference. (For example, it notices
that the receiver of a message an object sends to itself is
necessarily of the same class as the sender; it propagates type
information differentially along the branches of type tests; and it
splits the control flow into two versions where that's the only way
the type information can be preserved.)
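
In rough outline, the guard-and-recompile trick looks like this (a
Python sketch of my own; real Self emits machine code and does far
more, so take all the names here as hypothetical):

    class CallSite:
        # One message-send site, specialized for the single receiver
        # class seen so far -- a toy version of guarded inlining.

        def __init__(self, selector):
            self.selector = selector
            self.compiled = None    # (expected class, fast-path function)

        def send(self, receiver, *args):
            if self.compiled is not None:
                expected, fast = self.compiled
                if type(receiver) is expected:  # the guard: one cheap class test
                    return fast(receiver, *args)
                self.compiled = None            # assumption violated: invalidate
            # "Recompile": look the method up once and cache it for this
            # class, as if it had been inlined for that receiver type.
            method = getattr(type(receiver), self.selector)
            self.compiled = (type(receiver), method)
            return method(receiver, *args)

    class Point:
        def __init__(self, x):
            self.x = x
        def double(self):
            return Point(self.x * 2)

    site = CallSite("double")
    p = site.send(Point(3))    # first send compiles the Point version
    q = site.send(p)           # guard passes: fast path, no method lookup
    assert q.x == 12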


Some similar ideas were used in some APL compilers in the late
70's, and apparently forgotten by almost everybody since. (Greg
Jaxon pointed this out to me when I started to reinvent these
wheels and posted to the net about it.) See the APL '79 proceedings,
and an article in the Hewlett-Packard Technical Journal from
1978 or thereabouts, called something like "The Dynamic Incremental
Compiler of APL/3000." (Sorry for the approximate citations;
I'll try to dig up the details.) These
techniques may still be in use in APL, but just not talked about much.


Ungar and Chambers came up with the same basic idea, but have taken
it *much* farther. I think theirs is the sexiest compiler in
existence, bar none. (Subjective judgement -- please, no flames.
Compilers that are really transparent just appeal to me.)


The APL work does have its own interesting features, though. In APL
you can tell statically if an array ever has more than one type
of data stored into it. When you first encounter, for example,
a 30-element one-dimensional array of integers at a certain
point in the program, you can compile a special version of the
code for it. In this case, it could be an unrolled loop with
the type-testing moved out of the loop. Next time, you check
whether you're operating on another 30-element one-dimensional
array of integers. If so, you just execute the same code. If not,
you recompile, relaxing the assumptions and creating more
generic (but slower) code.
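
The check-and-recompile logic for one such point in the program might
look like this (again my own Python sketch; the real thing generates
actual code, e.g. an unrolled loop, rather than closures):

    def compile_specialized(length, elem_type):
        # Stand-in code generator, specialized to one length and element
        # type; a real compiler would emit an unrolled loop with the
        # type test hoisted out entirely.
        def fast(xs):
            assert len(xs) == length and type(xs[0]) is elem_type
            return [x + 1 for x in xs]
        return fast

    def compile_generic():
        # Relaxed assumptions: any length, any element types. Slower.
        def slow(xs):
            return [x + 1 for x in xs]
        return slow

    class IncrementSite:
        def __init__(self):
            self.signature = None   # the (length, element type) compiled for
            self.code = None
            self.generic = False

        def run(self, xs):
            sig = (len(xs), type(xs[0]))
            if self.code is None:
                # First encounter: compile the special-case version.
                self.signature, self.code = sig, compile_specialized(*sig)
            elif not self.generic and sig != self.signature:
                # Assumptions violated: recompile more generically.
                self.code, self.generic = compile_generic(), True
            return self.code(xs)

    site = IncrementSite()
    site.run([1, 2, 3])      # compiles the 3-element-integer version
    site.run([4, 5, 6])      # same shape and type: reuses it
    site.run([1.5, 2.5])     # different: relaxes to the generic version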


I think this could be a big win for polymorphic languages with
homogeneous collection types. For example, I could write a program that
applies a generic operation to every element of a collection,
without knowing what kind of collection it is or what kind of
elements it holds. (And without the compiler being able
to statically determine it, as Ada generics require.)


If the running program encountered an array of floats, it
would compile an unrolled loop to apply the appropriate
version of the generic operation (say, the floating point
"add") with no type checks. And if next time it encountered
a linked list of integers, it would compile an optimized
routine for that as well. In a language with an extensible
type system, this amounts to lazily compiling a potentially
infinite number of special-case optimized code segments.
(Which is why you can't do it statically.)
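
The bookkeeping for that is just a table of compiled versions, filled
in on demand (once more a Python sketch with hypothetical names):

    specializations = {}  # (collection type, element type) -> compiled code

    def map_generic(ops, coll):
        # Apply a generic operation to every element, specializing
        # lazily on the (collection type, element type) actually seen.
        key = (type(coll), type(coll[0]))
        code = specializations.get(key)
        if code is None:
            op = ops[type(coll[0])]   # e.g. pick the floating-point "add"
            def code(xs, op=op):
                # Specialized body: the operation was chosen once, above,
                # so this loop does no per-element dispatch.
                return [op(x) for x in xs]
            specializations[key] = code   # compiled at most once per pair
        return code(coll)

    add1 = {int: lambda x: x + 1, float: lambda x: x + 1.0}

    map_generic(add1, [1.5, 2.5])    # compiles the (list, float) version
    map_generic(add1, (3, 4))        # compiles the (tuple, int) version
    map_generic(add1, [5.0])         # reuses the (list, float) version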


Similar techniques could be applied to structures with typed
fields. (E.g., variable x may be a foo or a bar, but if it's
a bar, its first field is always an int.)
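
A sketch of that, with hypothetical foo and bar classes: once a single
class test on x succeeds, everything downstream of it knows the
field's type for free and needn't re-check it.

    class Foo:
        def __init__(self, first):
            self.first = first         # first may hold anything

    class Bar:
        def __init__(self, first):
            self.first = int(first)    # first is always an int

    def succ_of_first(x):
        if type(x) is Bar:
            # One class test pays for the whole branch: x.first is known
            # to be an int, so a compiler could use it untagged here.
            return x.first + 1
        # Generic path: the field's type is unknown and must be checked.
        f = x.first
        return f + 1 if isinstance(f, int) else f

    assert succ_of_first(Bar(41)) == 42
    assert succ_of_first(Foo("hi")) == "hi"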


It's important to note that none of this sacrifices dynamic
type checking -- it just optimizes most of it away. This
is important in interactive programming environments and
fault-tolerant systems, because you don't want a simple
bug to corrupt the integrity of data structures.


                      -- Paul
Paul R. Wilson
Software Systems Laboratory              lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)  wilson@carcoar.stanford.edu
Box 4348, Chicago, IL 60680
