Subject: Bliss-11 macro facility [was macro mayhem] (long)
Newsgroups: comp.compilers
From: tgl+@cs.cmu.edu (Tom Lane)
Organization: School of Computer Science, Carnegie Mellon
Date: Sun, 27 Sep 1992 00:35:14 GMT
Summary: The Right Thing for macros
Keywords: macros, design, syntax
References: 92-09-171
In a previous message I claimed that no one should be allowed to design
macro processors who hadn't studied the Bliss-11 macro facilities. This
naturally led to "where can I read about Bliss?" questions. I don't know
any readily accessible references, so here is a summary of the Bliss macro
facilities. I'll also try to explain why I think Bliss's underlying
syntax is more macro-friendly than C's. (I'm going to assume that you've
used C macros.)
Background about Bliss (skip if you've heard of Bliss before):
Bliss is a systems programming language designed at Carnegie-Mellon around
1970. It was very heavily used at CMU up until just a few years ago, and
was also used at DEC in their systems programming efforts for a number of
years (there's probably still a lot of Bliss code in DEC's proprietary
operating systems, for instance VMS). The macro facilities I'm about to
discuss were designed for the second-generation implementation of Bliss,
which was Bliss-11 for the DEC PDP-11; this compiler was the subject of
Wulf et al.'s famous book "The Design of an Optimizing Compiler" (if you have a
copy, it's worth money :-). I believe the versions of Bliss used at DEC
had substantially the same macro facilities as Bliss-11.
The information below is taken from my ancient Bliss-11 Programmer's
Manual. It's undated, but I have an errata sheet that refers to the April
1974 printing of the manual, which seems about right. The macro facility
was apparently designed right about then, since a couple of features are
described as unimplemented in the manual, but were implemented according
to the errata sheet (and I remember using them around 1975-1976).
Doubtless there are people reading this list who remember Bliss better
than I; corrections are welcome. (In particular, if the macro facilities
were ever described in a published article, I'd like to have a citation.)
The right way to design language syntax:
The syntax of C poses a number of pitfalls for the unwary macro user.
Macro calls look much like ordinary function calls, and usually it's
desirable to make them act *exactly* like function calls, so that the
programmer needn't think about which way SOME_FUNCTION(X,Y,Z) is
implemented. As has been discussed before in comp.compilers, it's easy to
end up with one semicolon too few or too many, resulting in syntax
errors---or worse, invisible logic bugs.
Bliss is considerably more macro-friendly. First off, it treats semicolon
as a statement separator (a la Pascal) rather than a statement terminator.
In C, semicolon is a required terminator for simple-expression statements,
but it is not part of other kinds of statements, notably { ... } blocks,
where an extra semicolon forms a separate empty statement. Thus we have the
well-known problem of
if (condition)
SOME_FUNCTION(x,y,z);
else
...
wherein the semicolon is *required* if SOME_FUNCTION is actually a
function (or is a macro expanding to a simple expression), while it is
*prohibited* if SOME_FUNCTION is a macro expanding to a bracketed
statement block. With Pascal-style syntax, the programmer would never
write a semicolon right before an "else", so it doesn't matter what the
expansion of SOME_FUNCTION is. (I don't want to get into the argument
about whether Pascal-style semicolons are natural or not --- I find them
so, but I grew up on Bliss and Pascal. My point is just that the
Bliss/Pascal rule is more consistent because it makes no distinction
between simple and compound statements, and this is exactly what you need
for macros.)
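For readers who want to see the problem concretely, here is a minimal C sketch. It also shows the customary C workaround, which is not part of the discussion above: wrapping the macro body in do { ... } while (0) so that the call needs its trailing semicolon just as a function call would. The names LOG_BLOCK, LOG_STMT, classify and calls_logged are made up for illustration.

```c
#include <stdio.h>

static int calls_logged = 0;   /* hypothetical counter, for illustration only */

/* Expands to a bare block: "LOG_BLOCK(x);" becomes "{ ... };" and the
   stray empty statement detaches a following "else" (a syntax error). */
#define LOG_BLOCK(x)  { calls_logged++; printf("value = %d\n", (x)); }

/* Expands to one statement that requires the trailing semicolon,
   so it can be written exactly like a function call. */
#define LOG_STMT(x)   do { calls_logged++; printf("value = %d\n", (x)); } while (0)

int classify(int n)
{
    if (n > 0)
        LOG_STMT(n);        /* LOG_BLOCK(n); in this position would not compile */
    else
        return -1;
    return 1;
}
```

With Pascal-style separators neither form would need special care, which is the point being made above.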
A more fundamental and innovative point is that Bliss does not distinguish
between statements and expressions. All the control structures that would
be considered statements in other languages are regarded as expressions in
Bliss. For example, "if X then Y else Z" is a perfectly good expression,
with value equal to the value of whichever of Y and Z is evaluated. This
approach eliminates in one stroke most of the C-macro syntax problems
we've been discussing, since they all come down to the question of whether
a macro call expands to an expression or a statement!
C actually has made a start in this same direction: it regards assignment
as an expression operator rather than a statement (which is rare in
languages of that vintage), and C has some control-flow capabilities
within expressions. But the Bliss approach is simpler and more general:
* Bliss's semicolon serves the purposes of both semicolon and the comma
operator in C. In Bliss,
( X ; Y ; Z )
causes X,Y,Z to be evaluated in series, and the result of the whole
expression is the value of Z (thus X and Y are only evaluated for
side-effects). X,Y,Z can be arbitrarily complex constructs.
* C needs two constructs, if-then-else and X ? Y : Z, to do what Bliss
does with just if-then-else.
* C's fundamental syntactic distinction between { ... } (statement grouping)
and ( ... ) (expression grouping) does not exist in Bliss. Bliss has
begin-end but the keywords are semantically the same as parentheses ...
you can write "X * begin Y + Z end" if you like. (People actually
do this, too, since it helps to clarify heavily nested expressions;
the compiler won't let you match a begin with a ), so you get some
extra error checking by using both types of grouping.)
Each of these regularities makes it easier to write general-purpose macros
that fit into any context without syntax problems.
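As a rough illustration of how far C's own expression-level constructs go, the sketch below builds a macro from the comma operator and ?: so that it can be used anywhere an expression is allowed. COUNTED_MAX, evaluations and largest_of_three are hypothetical names chosen for this example only.

```c
static int evaluations = 0;   /* hypothetical counter, for illustration only */

/* The comma operator stands in for Bliss's ";" inside parentheses, and
   ?: stands in for "if ... then ... else" used as an expression.
   Like any such C macro, it evaluates one of its arguments twice. */
#define COUNTED_MAX(a, b)  (evaluations++, ((a) > (b) ? (a) : (b)))

int largest_of_three(int x, int y, int z)
{
    int m = COUNTED_MAX(x, y);      /* used as an initializer expression */
    return COUNTED_MAX(m, z);       /* used as a return expression */
}
```

Anything that needs a loop or a local declaration cannot be written this way in standard C and has to fall back to a statement-style macro, which is where the Bliss approach is more general.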
There are some minuses to Bliss's everything's-an-expression approach,
having to do with the type (or more accurately, the lack thereof) of a
control structure's result value; but in a modern redesign those issues
could be dealt with. (I am not here to defend Bliss's approach to data
types :-).)
OK, now let's get to the macros:
Basic Bliss macros look much like macros anywhere: a call consists of a
macro name possibly followed by a parenthesized list of actual parameters.
This call is expanded into the macro's replacement text with actual
parameters substituted for formals. The macro declaration syntax is a tad
cleaner than C's. Here are a couple of typical Bliss macro definitions:
macro universal_answer = 42 $ ,
fatal_error (message) = begin
put_string(message);
exit(0)
end $ ;
The dollar sign is a reserved token used to terminate the text of a macro
definition; thus the definition text can extend across multiple lines with
no special effort. As with C, the macro body is considered to be a
sequence of tokens; tokenization and the discarding of whitespace, comments,
etc. occur when the macro declaration is scanned.
In a macro call, the actual parameters (if any) are tokenized and then
substituted into the macro body's token sequence. The resulting token
sequence is then rescanned for more macro calls. Actual parameters are
separated by commas, but commas nested within parentheses or begin-end
pairs don't count (hence parentheses and begin-end pairs in an actual
parameter must ordinarily balance). Macro calls occurring within the
actual parameter list are expanded *before* counting up commas,
parentheses, etc, and substituting into the macro body. (This is "inside
out" macro expansion, whereas C uses "outside in" expansion. IMHO inside
out is better, although you can construct examples that favor outside in.)
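The order of expansion is something one can observe directly in ANSI C with the # operator: an argument that is stringized is taken exactly as written, while an argument handed on through one more macro level has been expanded by the time it is stringized. The following is a minimal sketch; LIMIT, STR, XSTR and the two functions are hypothetical names.

```c
#include <string.h>

#define LIMIT 42

/* '#' stringizes the argument as written, with no prior expansion. */
#define STR(x)   #x

/* One extra level: the argument is macro-expanded before it is
   passed on to STR, so STR sees 42 rather than LIMIT. */
#define XSTR(x)  STR(x)

const char *raw_name(void)      { return STR(LIMIT);  }   /* "LIMIT" */
const char *expanded_name(void) { return XSTR(LIMIT); }   /* "42"    */
```

The two-level idiom is the usual way to ask a C preprocessor for argument-first behaviour when it matters.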
While we are looking at this example, it may be worth emphasizing again
that a fatal_error() macro call is syntactically indistinguishable from a
simple function call; it's not necessary for the programmer to worry about
the fact that it actually expands to two calls. The begin-end pair is all
that's needed to ensure that the macro acts this way. (As with C, it's
usually advisable to put begin-end or ( ) around the text of a macro.)
The Bliss macro processor is integrated with the compiler's lexical and
syntactic analysis phases; in fact it comes between these two phases,
since it works on tokens. The macro processor also relies on the
syntaxer's symbol table for macro definition storage. One benefit of this
arrangement is that macro definitions are local to syntactic scopes. For
example, in
begin
macro local_hack(x) = .... $;
... statements ...
end
the macro definition automatically disappears after the block's end. We
can create a local redefinition of a globally defined macro if we want,
although we'd have to write something like
macro $quote local_hack(x) = .... $;
to keep the outer macro from being expanded at the point where we're
trying to name the inner macro. (More about $quote in a moment.)
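For contrast, C macros ignore block structure entirely, so the closest one can get to a block-local macro is a manual #define/#undef pair. A small sketch under that assumption follows; LOCAL_HACK is reused from the example above and the function names are invented.

```c
int squared_in_scope(int side)
{
#define LOCAL_HACK(x) ((x) * (x))   /* intended for this function only */
    return LOCAL_HACK(side);
#undef LOCAL_HACK                   /* manual "end of scope" */
}

int incremented_in_scope(int side)
{
    /* A different meaning for the same name; without the #undef above,
       this incompatible redefinition would be rejected or diagnosed. */
#define LOCAL_HACK(x) ((x) + 1)
    return LOCAL_HACK(side);
#undef LOCAL_HACK
}
```

Nothing enforces the pairing: forget one #undef and the definition leaks into the rest of the file, which is exactly what symbol-table scoping avoids.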
Another advantage of integrating macro expansion into the compiler front
end is that we can have much more powerful and general token-mashing
capabilities. The "token pasting" # operator of ANSI C barely scratches
the surface of what can be done with these Bliss special forms:
$quote Prevents the macro processor from doing its normal thing to the next
token. For example, if a macro name is next then the macro is not
expanded, the name is just left alone. Within a macro definition,
"$quote $" puts a $ into the macro text instead of terminating the
definition. This allows nested macro definitions: the expansion of a
macro can define a new macro! Similarly you can $quote $quote, or
any of the other special forms, if you want to put one of them
into a macro text --- you'd need this if you wanted the special
form to be used in an inner macro definition, rather than
recognized as part of the current macro's definition or expansion.
At a macro call site, you can $quote a comma to keep it from being
treated as a parameter separator, and you can $quote a parenthesis
or begin or end if you need to put mismatched parentheses etc.
into an actual parameter.
$unquote Forces the next token to be evaluated. This is useful only
within macro definitions. Normally, names within a macro
definition text are just shoved, unbound, into the macro token
list. When the macro is expanded, those names will be bound to
whatever they mean at the macro call point. If you $unquote a
name within a macro definition, it is evaluated and bound to the
meaning it has *at the definition point*. For instance, you can
use $unquote to ensure that a macro references a global variable,
rather than some local variable that happens to have the same name
and be visible at the call point. (Notice that this is totally
impossible with C or any other pure-preprocessor approach; it
depends on the macro processor being part of the syntaxer.) In
our previous example of locally redeclaring a globally defined
macro, we could use "$unquote local_hack(x)" within the body of
the inner macro to refer to the *outer* macro definition.
$string(...) The string representations of the tokens of the argument
list are pasted together to form a single string literal. This is
similar to, but more general than ANSI C's # operator ... for one
thing, it can be used anywhere, not just in a macro definition.
The Bliss manual gives this example: after
macro outerr(num,msg) =
outstring(plit($string('ERR',num,': ',msg)))$;
the call
outerr(4,'Invalid Data')
expands to
outstring(plit('ERR4: Invalid Data'))
(PLIT means pointer to literal; it is Bliss's formalization of C's
implicit literal-strings-represent-pointers-to-strings concept.)
I'm a little fuzzy on this, but I think that you could supply an
arbitrary compile-time-constant integer expression, and it would
be evaluated and converted to a decimal ASCII string --- another
feature not possible with an unintegrated macro processor.
$name(...) Like $string except that the created token is treated as a
name rather than a string literal. This is especially useful for
generating new, unique identifiers within successive expansions of
a macro (using a counter maintained through other language
features that I won't discuss). Another common use is for writing
global variable or function names that aren't valid according to
the Bliss lexical rules, for instance $name('ABC.XYZ'). (Very
useful in a systems language that has to interface to existing
code in other languages with different lexical rules...)
Actually, $string and $name represent tokenization operators that are
quite separate from the macro facility, but they are often useful in
macros.
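For comparison, here is roughly what the nearest ANSI C equivalents of $string and $name look like. This is only a sketch: ERR_TEXT, PASTE, COUNTER and the function names are invented for the example, and the outerr example from the manual is approximated with string-literal concatenation.

```c
#include <string.h>

/* Rough analogue of $string: '#' stringizes one argument, and the
   compiler then concatenates the adjacent string literals. */
#define ERR_TEXT(num, msg)  "ERR" #num ": " msg

/* Rough analogue of $name: '##' pastes tokens into a new identifier.
   The extra PASTE_ level lets macro arguments expand before pasting. */
#define PASTE_(a, b)  a##b
#define PASTE(a, b)   PASTE_(a, b)
#define COUNTER(tag)  PASTE(counter_, tag)

static int COUNTER(reads)  = 3;   /* declares counter_reads  */
static int COUNTER(writes) = 4;   /* declares counter_writes */

const char *invalid_data_message(void) { return ERR_TEXT(4, "Invalid Data"); }

/* The preprocessor does not evaluate 2+2; it just stringizes the tokens. */
const char *unevaluated_message(void)  { return ERR_TEXT(2+2, "Invalid Data"); }

int total_io(void) { return counter_reads + counter_writes; }
```

Two limits are visible here: # works only on a macro parameter inside a definition, and ## can only produce tokens that are already legal C identifiers, so nothing like $name('ABC.XYZ') is possible.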
Three other special forms, $length, $remaining, and $count, are allowed
only within macro definitions; they are useful with the advanced macro
types discussed below.
Advanced macro types:
So far we've seen parameterless macros and macros with fixed numbers of
parameters. These correspond to what's available in C. Bliss also
provides four types of macros with variable-length actual parameter lists;
these macro types allow effects that are impossible in C or most other
macro languages.
For all the variable-length macro types, the *call* syntax is the usual:
there are some number of actual parameters, comma-separated and surrounded
by parentheses. The macro *definition* uses either parentheses or
brackets to indicate how the actual parameter list should be broken up and
matched to the formal parameter names.
"Pass" macro (evaluated if there are any actual parameters):
macro name [] = body $;
If a non-empty actual parameter list is given, then the macro body
replaces the call; if the actual parameter list is empty, then the
expansion is empty. Within the macro body, $remaining can be used to
stand for the actual parameter list. ($remaining expands to the actual
parameter list with the outer parentheses removed, and with separating
commas changed to semicolons if required by the syntactic context; see
"default separators" discussion later.) This macro type looks fairly
useless, but it's actually quite handy in combination with the other
types, as illustrated below.
"Recursive" macro (evaluated if at least N actuals remain):
macro name (formal1,...,formalN) [] = body $;
If there are at least N actual parameters, then the macro body replaces
the call, with the first N actual parameters bound to formal1,...,formalN;
any remaining actuals are bound to $remaining in the same way as for a
pass macro. If there are fewer than N actuals, the expansion is empty.
This macro type neatly solves many situations where a variable number of
macro arguments are required. For example, if C had this feature, trace
messages with multiple levels of verbosity could work like so:
macro trace_message(level) [] =
{ if (global_trace_level >= (level)) printf($remaining); } $;
...
trace_message(1,"image size = %d x %d\n", xsize, ysize);
trace_message(5,"you'd hardly ever want to know that x = %d\n", x);
instead of the actual C practice where you need a separate macro for each
number of printf arguments you might want (or else an ugly hack with extra
parentheses in every call of the macro).
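The C standard later gained a limited form of this: C99 variadic macros, where __VA_ARGS__ plays a role similar to $remaining. The sketch below assumes a C99 compiler; it writes into a hypothetical trace_buffer instead of calling printf so that the result can be inspected, and all names are illustrative.

```c
#include <stdio.h>
#include <string.h>

static int  global_trace_level = 3;   /* hypothetical setting */
static char trace_buffer[128];        /* captured output, for illustration */

/* __VA_ARGS__ stands for every argument after "level". */
#define TRACE_MESSAGE(level, ...)                                       \
    do {                                                                \
        if (global_trace_level >= (level))                              \
            snprintf(trace_buffer, sizeof trace_buffer, __VA_ARGS__);   \
    } while (0)

void report_image_size(int xsize, int ysize)
{
    TRACE_MESSAGE(1, "image size = %d x %d\n", xsize, ysize);
    /* Level 5 exceeds the current trace level, so this one is skipped. */
    TRACE_MESSAGE(5, "you'd hardly ever want to know that x = %d\n", xsize);
}
```

Unlike a Bliss recursive macro, a C variadic macro cannot peel arguments off the list one group at a time; it can only pass the whole tail along.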
Recursive macros are so called because they often have recursive
definitions. The Bliss manual gives this example to illustrate pass and
recursive macros:
macro cond(bool,exp) [] =
if bool then exp el($remaining) cond($remaining) $,
el[] = else $;
Given any even number of actual parameters, cond(...) expands to a
properly constructed if-then-else chain. For example:
cond(C1,E1,C2,E2) => if C1 then E1 el(C2,E2) cond(C2,E2)
=> if C1 then E1 else cond(C2,E2)
=> if C1 then E1 else if C2 then E2 el() cond()
=> if C1 then E1 else if C2 then E2
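C macros cannot call themselves, so the expansion above cannot be reproduced with the C preprocessor. To make the recursion explicit, here is a small C function that simulates the expansion on strings. It is only a model of the rules described above, not a piece of any Bliss implementation, and expand_cond is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Appends the expansion of cond(args[0], ..., args[n-1]) to out.
   "out" must already hold a NUL-terminated string shorter than cap. */
void expand_cond(const char *const *args, size_t n, char *out, size_t cap)
{
    size_t used;

    if (n < 2)                  /* fewer than N = 2 actuals: empty expansion */
        return;

    used = strlen(out);
    snprintf(out + used, cap - used, "if %s then %s", args[0], args[1]);

    if (n > 2) {                /* el($remaining): non-empty list gives "else" */
        used = strlen(out);
        snprintf(out + used, cap - used, " else ");
    }

    expand_cond(args + 2, n - 2, out, cap);    /* cond($remaining) */
}
```

Calling it with the four strings C1, E1, C2, E2 and an empty output buffer reproduces the final line of the trace above.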
"Iterated" macro (evaluated for multiple sets of actual parameters):
macro name [formal1,...,formalN] = body $;
A call should have M*N actual parameters for some integer M. The macro is
expanded M times, with the formals bound first to the first N actuals,
then to the next N actuals, etc. This differs from M calls of a simple
macro in three ways:
* Default separator tokens (commas or semicolons) are automatically emitted
between the iterations of the macro expansion.
* Within the macro body, $count expands to the number of iterations already
completed (thus 0 for the first set of actuals, next 1, next 2, etc).
* $remaining is bound to the as-yet-unused actuals.
If the number of actual parameters is not a multiple of N, no expansion
occurs for the last partial set of actuals, but they are accessible in
prior expansions through $remaining. (If I were doing the design today,
I'd probably make a leftover partial set of actuals an error condition.)
Iterated macros are incredibly useful for doing processing on individual
elements of a variable-length list. I had a program in which a large set
of counters counted different interesting events. The counters needed to
be manipulated as a group in several places (for instance, I wanted to
zero all of them in certain situations). I was able to set things up so
that the list of counters was explicitly written out in only *one* place
in the program text; to add a new counter I needed to modify only that
list. In a common include file I said:
macro Counters = ! the central list of counters
ThisCt, ThatCt, SomeOtherCt, etc etc $;
macro InitCounters [Ctr] = Ctr = 0 $;
The main program file declared the variables with
global Counters;
Anywhere I needed to zero the counter set, I said
InitCounters(Counters);
which would expand to
ThisCt = 0 ; ThatCt = 0 ; SomeOtherCt = 0 ; ... ;
(notice the default separators are semicolons, as required). Similar
macros took care of other group activities such as printing out all the
counters at once.
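C programmers get a similar single-list effect with the idiom usually called "X-macros", where the list itself is a macro that applies a caller-supplied macro to each element. The following sketch uses invented names and is not a translation of the original program.

```c
/* The single central list; "X" is supplied by each user of the list. */
#define COUNTERS(X)   \
    X(this_ct)        \
    X(that_ct)        \
    X(some_other_ct)

/* Declare one variable per counter. */
#define DECLARE_COUNTER(name)  int name;
COUNTERS(DECLARE_COUNTER)

/* Zero every counter. */
#define ZERO_COUNTER(name)  name = 0;
void init_counters(void)
{
    COUNTERS(ZERO_COUNTER)
}

/* A second group activity: add up every counter. */
#define ADD_COUNTER(name)  + name
int total_count(void)
{
    return 0 COUNTERS(ADD_COUNTER);
}
```

Note that each per-element macro has to carry its own separator (the ";" or the "+"), whereas the Bliss iterated macro supplies the right one from context.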
"Fixed iterated" macro (like iterated, but adds some overall parameters):
macro name (fixedformal1,...,fixedformalK) [formal1,...,formalN] = body $;
A call should have K+M*N actuals for some M. The first K actuals are
bound to the "fixed" formal parameter names; then the macro is expanded
like a plain iterated macro applied to the remaining actuals.
The Bliss manual gives the following example for loading values into
successive elements of an array:
macro load(array)[value] = array[$count] = value $;
The call
load(tbl,4,3,7,0,1);
expands to
tbl[0] = 4; tbl[1] = 3; tbl[2] = 7; tbl[3] = 0; tbl[4] = 1;
Here we rely on $count to count the successive subscript positions, and
again use the default separators to get the right syntax.
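There is no direct C counterpart to $count, but the effect of the load example can be approximated with a C99 variadic macro and a loop. This is a sketch under several assumptions: a C99 compiler, int elements only, and work done at run time rather than at expansion time. LOAD, tbl and fill_table are illustrative names.

```c
#include <stddef.h>

/* The values after "array" are collected into a temporary array;
   the loop index plays the role that $count plays in Bliss. */
#define LOAD(array, ...)                                              \
    do {                                                              \
        const int load_vals_[] = { __VA_ARGS__ };                     \
        size_t load_i_;                                               \
        for (load_i_ = 0;                                             \
             load_i_ < sizeof load_vals_ / sizeof load_vals_[0];      \
             load_i_++)                                               \
            (array)[load_i_] = load_vals_[load_i_];                   \
    } while (0)

int tbl[5];

void fill_table(void)
{
    LOAD(tbl, 4, 3, 7, 0, 1);
}
```

The Bliss version expands to five separate assignments with constant subscripts, so it needs neither the temporary nor the loop.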
Details: default separators and such
The iterated macro forms and $remaining generate punctuation tokens to fit
the context of the macro call or usage of $remaining. A separator token
is emitted between macro expansions or actual parameters; and in certain
contexts bracket tokens are generated before and after the whole
expansion. Here are the Bliss-11 manual's complete rules for the
generated tokens (no need to read this list closely):
1. Preceding context:
a. The separator ","
b. Declarator keyword such as MACRO, ROUTINE, BIND, etc.
c. The bracket "<"
d. The bracket "(" used as parameter list bracket for
function or macro call
e. PLIT (
f. PLIT
g. An expression or name (this covers all unspecified cases)
Separator generated: ","
Brackets generated: for cases f,g emit "(" and ")"
2. Preceding context:
a. The separator ";"
b. BEGIN
c. "(" used as open bracket of a block (ie not following
a macro or function name)
d. SET or NSET
e. OF following CASE or SELECT
Separator generated: ";"
Brackets generated: in case e, emit SET/TES for CASE,
emit NSET/TESN for SELECT
3. Preceding context:
a. Structure or linkage name in a declaration
Separator generated: ":"
If you haven't used Bliss some of these rules won't make much sense,
but it may help to know that Bliss has a couple of switch-like
constructs, written
CASE expr OF SET expr1; expr2; ... TES
SELECT expr OF NSET label1: expr1; label2: expr2; ... TESN
The detailed rules are not really interesting, I just want to make the
following points:
1. Yes, it's a bit of a hack. But it nearly always does the right thing.
2. Implementing such rules would have been quite painful if the macro
processor had not been integrated into the parser (compare 1d and 2c,
also 3a and 1g).
3. In an ab-initio language design, it should be possible to set up the
underlying syntax so that much simpler rules would suffice to get the
right separators. (Recall that original Bliss didn't have this macro
facility, so the macro design had to live with an existing syntax.)
A final detail is to specify the other special forms:
$remaining In the body of a pass, recursive, iterated, or fixed-iterated
macro, this special form denotes the list of actual parameters not yet
bound to a formal parameter. The list elements are separated and
bracketed by default punctuation as required by the context preceding
$remaining.
$length In the body of any form of macro, $length denotes the number of
actual parameters passed to the current macro call or iteration
(counting both those bound to formals and those bound to $remaining).
$count In the body of a recursive, iterated, or fixed-iterated macro, denotes
the recursion depth or iteration count for the current expansion of
the macro.
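Of these three forms, only $length has a commonly used approximation in C, and it is a trick rather than a language feature. The sketch below assumes C99 variadic macros and handles one to five arguments; PICK_6TH_ and ARG_LENGTH are hypothetical names.

```c
/* The real arguments push the descending number list to the right,
   so the sixth parameter of PICK_6TH_ lands on the argument count.
   The trailing 0 keeps the variadic part non-empty in every case. */
#define PICK_6TH_(a1, a2, a3, a4, a5, n, ...)  n
#define ARG_LENGTH(...)  PICK_6TH_(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

/* The arguments themselves never appear in the expansion,
   so they need not even be declared names. */
int length_of_one(void)   { return ARG_LENGTH(x); }
int length_of_three(void) { return ARG_LENGTH(a, b, c); }
int length_of_five(void)  { return ARG_LENGTH(1, 2, 3, 4, 5); }
```

The fixed upper limit and the need for a helper macro show how much is being worked around compared with a built-in $length.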
Summary:
The Bliss-11 macro facility provides an extremely useful set of features.
Common programming style in Bliss makes *very* heavy use of macros; almost
any standard programming pattern can be and usually is turned into a
macro. For example, in the Hydra multiprocessor-OS project, the sequence
lock semaphore;
access data protected by semaphore;
unlock semaphore;
was invariably represented by a macro CRITICAL(semaphore, access-actions).
(This macro was not quite as trivial as it looks, either; it had to deal
with the possibility of a signal being raised in the access action.)
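To give the flavor of such a macro, here is a deliberately naive C sketch. It is not the Hydra macro: the semaphore type and the lock and unlock routines are stand-ins invented for this example, and nothing is done about signals or early exits from the access actions.

```c
/* Hypothetical stand-in for a real semaphore, for illustration only. */
typedef struct { int held; int acquisitions; } semaphore_t;

static void lock_semaphore(semaphore_t *s)   { s->held = 1; s->acquisitions++; }
static void unlock_semaphore(semaphore_t *s) { s->held = 0; }

/* Naive version: a return, goto, or signal that leaves "actions"
   early would skip the unlock. */
#define CRITICAL(sem, actions)        \
    do {                              \
        lock_semaphore(&(sem));       \
        actions;                      \
        unlock_semaphore(&(sem));     \
    } while (0)

static semaphore_t table_lock;
static int shared_total;

void add_to_total(int amount)
{
    CRITICAL(table_lock, shared_total += amount);
}
```

In C, access actions containing a top-level comma would be split into separate macro arguments, which is the situation $quote handles at a Bliss call site.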
I no longer have any Bliss code on-line, but perhaps someone at DEC can
supply some interesting or illuminating examples of Bliss macros.
I think the key lessons to be learned from the Bliss example are:
* Design the underlying language syntax with an eye to macro friendliness.
In particular, avoid gratuitous context-dependence of syntax.
* Get the macro expansion model right. C suffers terribly from the fact
that macros were practically unspecified by K&R; the way the first
quick-and-dirty implementations happened to work is the way we are stuck with.
* Tokenization has little to do with macro expansion. Make it a separate
facility.
* Consider integrating the macro processor with the compiler front end;
a good deal of synergy can be obtained thereby.
regards, tom lane