Re: Extension Languages

Dave Gillespie <daveg@thymus.synaptics.com>
Sat, 19 Dec 1992 01:52:26 GMT

          From comp.compilers

Related articles
[2 earlier articles]
Re: Extension Languages davis@pacific.mps.ohio-state.edu (1992-12-14)
Re: Extension Languages daveg@thymus.synaptics.com (Dave Gillespie) (1992-12-15)
Re: Extension Languages drw@kronecker.mit.edu (1992-12-16)
Re: Extension Languages macrakis@osf.org (1992-12-17)
Re: Extension Languages ludemann@quintus.com (Peter Ludemann) (1992-12-17)
Re: Extension Languages macrakis@osf.org (1992-12-18)
Re: Extension Languages daveg@thymus.synaptics.com (Dave Gillespie) (1992-12-19)
| List of all articles for this month |
Newsgroups: comp.compilers
From: Dave Gillespie <daveg@thymus.synaptics.com>
Organization: Compilers Central
Date: Sat, 19 Dec 1992 01:52:26 GMT
References: 92-12-087 92-12-064
Keywords: design



Stavros Macrakis writes:
> Any of these notations are "easy for the machine", although some of the
> tables will be larger for infix notation.


and Peter Ludemann writes:
> 20-30% of the CPU time is spent in scanning for the next non-blank ---
> something a parser for any language needs to do (this scanner loop is in
> tightly coded assembler, by the way). As the standard method of handling
> infix languages is to push things onto a stack, the cost of interpreting
> an infix language should be the same as for prefix and postfix languages.


These are valid points, but they're not quite what I was trying to get at
in my original message. I guess I wasn't very clear about it. Let's try
to figure out what we really mean by the terms we're using.


"Post/pre/infix" can, as purely syntactic terms, be taken at two different
levels. There's the actual placement of the operator in an expression
involving two operands, which is admittedly a fairly minor issue, but
there's also the more general linguistic philosophy that tends to
accompany the three methods. I was trying to address the philosophies as
well as the actual notations, since that's what the original poster seemed
to be asking about.


If a language is designed so that all functions or operators take a fixed
number of arguments (or more generally, a self-delimiting number of
arguments), and if every function/operator always evaluates its arguments
except when explicit quoting is used, then it's possible to use a postfix
syntax which allows for extremely easy evaluation: Each argument pushes
its value onto the stack, so that when the operator is seen the arguments
are already sitting there in the proper order, ready to be taken off and
used. Forth and PostScript are good examples of such languages.


Some languages want to allow functions which take variable numbers of
arguments (without requiring explicit argument counts or terminator
arguments), or they want to provide implicit quoting of certain arguments
(where, for example, Lisp's "(setq x y)" quotes the first argument so that
it can operate on the variable itself rather than on the variable's
current value). Both of these can be accomplished by adding an explicit
notation to delimit the argument lists, such as Lisp's parentheses.
Experience seems to show that prefix notation works best with languages at
this level, where pursuit of simplicity is still a major concern. We tend
to think of these languages when we refer to "prefix languages."
(Traditional "f(x,y)" function calls without any operators, like
Mathematica's FullForm, also fit this description.)


Both notations have the property that the natural intermediate form for
storing a scanned-in program closely resembles the source form in
structure.


"Prefix" languages make the fine structure more explicit, so they tend to
be better suited when programs want to operate on other programs as data.
In an embedded language like Emacs Lisp, this has the advantage that many
of the interpreter's algorithms can double as primitives available to the
user, and also that much of the interpreter can be written in its own
language rather than the (usually less concise) implementation language.
Since we like our embedded languages to be small and unobtrusive, this is
a big win. Also, since embedded languages have to cope with changing
environments, the ability to operate on code is useful in its own right.


If simplicity of the interpreter isn't so great a concern, a design may
opt for a more complex language that more closely matches the concepts
human programmers like to work with. This produces languages which tend
to have a variety of infix, prefix, and postfix operators, plus various
syntactic elements like commas, semicolons, braces, etc. When we refer to
an extension language being "infix," I think we're usually thinking about
these sorts of languages.


The problem with "infix" languages is that because of their mixture of
human-style implicit assumptions and redundant syntax, they tend to be
unsuitable for direct use as intermediate languages, while also making the
need for an intermediate language more important because the language is
less tied to the constraints that make an implementation run fast. So
except for the simplest BASIC interpreters, most "infix" languages compile
to an intermediate form that's closer to a prefix or postfix language in
nature, even in so-called "interpretive" systems. Stavros brings up Lisp
M-expressions as an example, although I can't help noticing that the early
Lispers gave up on M-expressions because S-expressions (that is, prefix
notation) turned out to be easy enough to use directly.


So infix systems generally parse the input into an intermediate form that
is fairly far removed from the source language that humans work with. As
long as only the compiler or interpreter ever see the intermediate form,
this doesn't really matter. But in a language like Lisp where the user
works with the intermediate form directly, there is a lot to be gained
from having the intermediate form "look" just like the source form. It's
a lot less confusing, as well as making the parser a whole lot simpler.
Both Stavros and Peter point out that a more complicated parser doesn't
necessarily have to be much slower than a simple parser, but it *does*
tend to be a lot larger, and compactness of the interpreter is important
for embedded languages.


Stavros continues:
> 1) You seem to assume that postfix syntax translates into difficulty of
> accessing the operator. But the syntax doesn't tell you anything about
> the internal representation.


But my assumption was based on the desire to have the internal
representation be close to the syntax. When I said postfix was slightly
slower for Lisp-like languages, I meant that, given a direct mapping of
syntactic order onto internal Lisp lists, postfix has a slight extra cost
because the argument list must be scanned once to reach the operator
before you can know how to evaluate the arguments. It was only meant to
be a minor point, though. A system that stored argument vectors instead
of lists, for example, would not have this problem.


I'm pretty confident that prefix is more natural than postfix, all else
being the same, as least to English readers. Because a Lisp operator can
affect the way the arguments are interpreted, people reading left to right
want to see the operator first. That seems like a much more compelling
argument against postfix, unless you're going to go the whole postfix
route and have no parentheses to delimit the arguments.


> 2) Many machine-oriented languages are postfix, precisely because they are
> EASIER to interpret. The problem comes only with non-strict operators
> like "if", which such languages avoid by using explicit gotos.


Or by quoting, as with PostScript's "/" and "{ }" notations. Postfix is
indeed great if its utter minimalism is enough for you, but if you want to
write large programs, or do anything other than simply interpret a stored
program, it gets clumsy. (Even PostScript was designed to be written
mainly by other programs, where the human-friendly structure goes in the C
program that generates the PostScript rather than in the PostScript
itself.)


> I've also seen languages which are Lisp-like in essence, but with C-like
> parsers added on top to make them more palatable. In other words, they
> have a parser which reads "a+b" into the list "(+ a b)".


I've even used this technique myself. But it does complicate the parser,
and I can understand that such creature comforts might not seem worthwhile
in systems (like Emacs) which don't envision major applications being
written in the extension language (my own crazed efforts notwithstanding
;-).


> Yes, there are Lisp-like languages which provide an infix syntax, or even
> Lisps which happen to have an infix reader as well. But they are not
> Lisp-like because they represent the program as a tree! After all, many
> language processors represent the program as a tree at some point....


Right, they are Lisp-like because they represent the program as a *data
structure of the language* which is a tree. A true Lisp fanatic might
argue that your last point merely shows that other languages must convert
themselves to Lisp in order to get anything done!


-- Dave
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.