From: wilson@cs.utexas.edu (Paul Wilson)
Newsgroups: comp.compilers
Date: 14 Nov 1996 21:59:01 -0500
Organization: CS Dept, University of Texas at Austin
Keywords: report, Scheme, available
My student Stephen Carl has recently written a masters thesis that I
think should be of interest to everyone who works in Scheme, macros,
or extensible compilers. It describes the basic mechanisms we use in
the RScheme compiler for handling user-defined language extensions.
This thesis is available from http://www.cs.utexas.edu/users/oops/
I believe that Stephen's thesis is the first really clear exposition
of the issues in lexically-scoped macros, and that it presents the
first direct solution to those problems. The clarity and directness of this
solution makes our macro mechanism a suitable framework for an
extensible compiler or "compile-time reflection." We are working
towards an adaptable systems-programming language based on an
"exocompiler", where most compiler analyses and optimizations are
implemented as portable libraries, rather than in the compiler proper.
Recently, we've figured out how to do this for "traditional-looking"
languages like Java---it is not limited to languages like Scheme and
Lisp where the surface syntax of the language makes delimiters
obvious.
The mechanism presented in Stephen's thesis is an extension of the
Bawden-Rees "syntactic closures" idea, where expressions are closed in
"syntactic environments", which are essentially the "compile-time
environments" of traditional Scheme implementations a la Abelson &
Sussman. The meaning of an expression is fixed (mostly) by capturing
the context in which it's defined before transformations are applied.
Our version of syntactic closures, called syntactic exposures, can
handle the advertent capture issues addressed by hygienic
macroexpansion. (I.e., you can define constructs like "for" that
introduce new lexical scopes, rather than just being pass-by-name
inlined constructs.)
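For concreteness, here is a minimal sketch of such a construct in
ordinary Scheme with syntax-rules (this is standard R4RS-style code,
not the syntactic-exposures mechanism itself): because the loop
variable's name is supplied at the call site, the binding the macro
introduces is deliberately visible to the body forms.

    (define-syntax for
      (syntax-rules ()
        ((for var from to body ...)
         (let loop ((var from))
           (if (<= var to)
               (begin body ... (loop (+ var 1))))))))

    ;; The binding of i is created by the macro's expansion, yet the
    ;; body forms written at the call site can refer to it:
    (for i 1 3 (display i) (newline))   ; prints 1, 2 and 3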
Our macro processing is integrated directly into the compiler, and
macro processing steps are performed in a natural top-down way,
interleaved with the normal top-down compilation of nested
expressions. (Of course, this could be done in a precompiler, but
there are advantages to integrating it directly into whatever you
consider the "main" compiler.)
This compiler does no rewriting of expressions and no renaming of
variables. All scope information is encoded in the compile-time
environment, so that there is a natural representation of expressions
and scopes that can be used to directly represent scoping issues and
context sensitivity. When a macro call is encountered, the compiler
simply adjusts the compile-time environment to reflect the fact that
the scope changes at that point, and compiles the body of the macro in
the usual way, as though it had actually occurred at the call site.
To do this, we separate the two functions of the compile-time
environment, which are to represent the runtime context in which the
macro will execute, and to resolve identifiers correctly (scoping per
se).
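To see what the compile-time environment has to get right without any
renaming, consider the standard hygiene test case (again in plain
syntax-rules, just to illustrate the problem any such mechanism must
solve):

    (define-syntax my-or
      (syntax-rules ()
        ((my-or) #f)
        ((my-or e) e)
        ((my-or e1 e2 ...)
         (let ((t e1)) (if t t (my-or e2 ...))))))

    ;; The t bound inside the macro must not capture the user's t; a
    ;; hygienic expander guarantees this returns 5, not #f:
    (let ((t 5)) (my-or #f t))

A renaming expander gets this right by alpha-converting the macro's t;
the approach sketched above instead resolves each occurrence through
the compile-time environment in which it was written.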
(This is more powerful than the approach used in Dylan, which combines
traditional-looking syntax with macros. Our scheme can fully support
macros that introduce local macros, and their context-sensitive
grammar changes.)
Advertent capture (as with the for macro) and other scope
manipulations are performed with a mechanism called an "environment
virtualizer", which represents the fact that a macro may introduce
variables that are conceptually visible at the macro call site, even
though the actual binding occurs inside the macro. The environment
virtualizer is simply a new kind of compile-time environment frame
which acts as a placeholder for variables bound inside the macro, but
"pushed back" into the environment of call. This exposes advertently
bound variables (like the for loop index variable) to other macro
arguments that are evaluated in their scope.
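As a rough illustration (the representation and names here are
invented for exposition, not RScheme's actual data structures), one
can picture the compile-time environment as a chain of tagged frames,
with the virtualizer an initially empty frame at the call site into
which the macro's advertent bindings are later pushed back:

    ;; An environment is a list of frames; each frame is a tag consed
    ;; onto an alist mapping names to their compile-time meanings.
    (define (extend-env env names meanings)
      (cons (cons 'frame (map cons names meanings)) env))

    ;; A virtualizer frame starts out empty at the macro call site...
    (define (push-virtualizer env)
      (cons (list 'virtualizer) env))

    ;; ...and variables bound inside the macro that should be visible
    ;; at the call site are "pushed back" into it.
    (define (virtualize! env name meaning)
      (let ((frame (assq 'virtualizer env)))
        (set-cdr! frame (cons (cons name meaning) (cdr frame)))))

    (define (lookup env name)
      (cond ((null? env) #f)
            ((assq name (cdar env)) => cdr)
            (else (lookup (cdr env) name))))

    ;; E.g., the for macro's index variable becomes visible at the
    ;; call site without rewriting any expression:
    (define env (push-virtualizer (extend-env '() '(x) '(x-meaning))))
    (virtualize! env 'i 'loop-index)
    (lookup env 'i)   ; => loop-index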
The compile-time environment is a natural place to hang "context"
information, similar to attributes in an attribute grammar system.
When a macro call is processed, it can decorate the compile-time
environment, and it can query the compile-time environment for
information left by enclosing macro calls. This directly supports
facilities similar to inherited attributes. When a macro regains
control after its subexpressions (or some of them) have been compiled,
it can also query the compile-time environment for information
recorded by macroexpansions of its subexpressions. This can be used
to achieve the effect of synthesized attributes.
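In terms of the toy environment above (same caveat: the names are
invented), decorating and querying come down to a pair of operations,
which is enough to mimic inherited attributes flowing down and, via
the post-compilation query, synthesized attributes flowing back up:

    ;; An enclosing macro records a fact for its subexpressions...
    (define (annotate env key value)
      (extend-env env (list key) (list value)))

    ;; ...and a nested macro call consults it during its own expansion.
    (define (attribute env key)
      (lookup env key))

    ;; E.g., an enclosing loop macro might record its nesting depth,
    ;; which a nested macro reads like an inherited attribute:
    (define env2 (annotate '() 'loop-depth 1))
    (attribute env2 'loop-depth)   ; => 1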
Unlike a traditional attribute grammar system, this system is
extensible in user code without modifying the compiler, and the
grammar itself can be context-sensitive. That is, a macro may
introduce local macros that affect the parsing of enclosed
expressions. (In other words, the transformation specified by a macro can
introduce new productions into the grammar, used locally within that
part of the program.) The parse tree is constructed in a natural
top-down way, and transformations at a given level can be used to
affect how enclosed expressions are parsed.
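Standard Scheme's let-syntax gives a feel for what a locally scoped
grammar extension looks like; the mechanism described here generalizes
this to the case where one macro's expansion introduces the local
macros that govern the parsing of the forms it encloses:

    ;; A local macro is, in effect, a new production that is valid
    ;; only within this lexical region:
    (let-syntax ((double (syntax-rules ()
                           ((double x) (* 2 x)))))
      (double 21))   ; => 42; outside, (double 21) is not parseable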
While the RScheme compiler currently uses a traditional Lisp/Scheme
surface syntax with fully delimited expression nesting---i.e., lots of
parentheses---this is not necessary for our parsing scheme to work.
It is only necessary that the delimiters at a given level be
recognizable so that the arguments to a macro can be recognized.
(That is, you have to know where one argument to a macro stops and
another starts, even if you can't yet parse either one's internal
nesting.) Then you can apply the macro, which may introduce new
macros necessary for further parsing of subexpressions.
In developing this mechanism, we have discovered a theoretically
interesting and very useful new class of top-down-parsable,
context-sensitive, transformational grammars. (That sounds
frighteningly complicated, but it's not.)
Traditionally, programming language grammars have been (approximately)
context-free, and language designers have shied away from
context-sensitive and transformational grammars.
We believe this was a mistake, and Lisp got it basically right (but in
a very awkward way) in 1963, with the introduction of s-expression
macros. Programming language grammars should be transformational and
context-sensitive because programming language syntax is sugared and
semantics is context-sensitive---scoping is just one good example.
The grammar of a language should be closely related to its semantics,
so that the language is understandable, the compiler is cleanly
structured and extensible, and early error detection is easy.
Our macro system can be seen as a step toward unifying the Lisp world
and the traditional programming language world---a step which we think
is at least three decades overdue.
McCarthy once joked that the original Lisp's dynamic scope rule was
due to his having read only the first half of the book on the
lambda calculus. We could make a similar joke that mainstream
programming languages' use of context-free grammars is because
language designers only read the first half of Chomsky's book on
grammars [Noam Chomsky, _Syntactic_Structures_, 1957].
Transformational grammar and context-sensitivity don't have to be
hairy, if you pick the right class of grammars, which are easily and
efficiently parseable top-down. We believe that this is the only
natural approach to parsing programming languages, and that it can
avoid the unnatural separation of syntax from so-called "static
semantics".
--
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (wilson@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and Scheme interpreters and compilers available via ftp from
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)
--