The melting ice technology (1): compilers & interpreters

bertrand@eiffel.com (Bertrand Meyer.)
Mon, 9 May 1994 17:53:17 GMT

          From comp.compilers

Related articles
The melting ice technology (1): compilers & interpreters bertrand@eiffel.com (1994-05-09)
The melting ice technology (2): levels bertrand@eiffel.com (1994-05-09)
Re: The melting ice technology (2): levels peter@objy.com (1994-05-12)
Re: The melting ice technology (2): levels mhcoffin@tolstoy.uwaterloo.ca (1994-05-13)
Re: The melting ice technology (2): levels papresco@undergrad.math.uwaterloo.ca (1994-05-13)
Re: The melting ice technology (2): levels lgm@polaris.ih.att.com (1994-05-14)
Re: The melting ice technology (2): levels pdlogan@ccm.jf.intel.com (1994-05-16)
| List of all articles for this month |
Newsgroups: comp.lang.eiffel,comp.compilers
From: bertrand@eiffel.com (Bertrand Meyer.)
Keywords: Eiffel
Organization: Interactive Software Engineering, Santa Barbara
Date: Mon, 9 May 1994 17:53:17 GMT

There has been some discussion recently on comp.lang.eiffel as to whether
the recently announced $49.95 Personal Eiffel for Windows used compilation
or interpretation.


Almost all the comments that have been made are correct; the trouble is
that ISE's Melting Ice Technology is sufficiently novel as to render
traditional classifications, in particular ``interpreted'' vs.
``compiled'', no longer exactly applicable.


This message gives some simple theoretical background. A companion one
applies these notions to ISE Eiffel.


                                        ------------


A computing device `interp' is a mechanism that given a program text P and
a set of input data D for that program will be able to run the program on
the data to yield the results R of that particular run. Mathematically one
may model `interp' as a two-argument function:


/1/ R = interp (P, D)


This does not quite capture the picture yet since we must take into
account the programming language L in which program texts such as P are
written. If we make L explicit then `interp' should really be written as
`interp [L]'. (If this were not ASCII we might use a subscript for L.) So
the above property becomes


/2/ R = interp [L] (P, D)


If we had a machine that for any language L would automatically produce a
silicon version of `interp [L]', and that version were efficient enough
for any L, P and D, things would be easy. Unfortunately they are a little
more complicated. What we can really get at not too much expense is an
automaton


interp [S]


where S is a simple language - simpler than most of the L we may be
interested in, such as Eiffel, Fortran or Miranda - such that for any L of
interest to us we have some hope that someone (the compiler writer) will
be able to write a translator `comp' from L to S at a reasonable cost. So
the actual execution mechanism is better captured by


/3/ R = interp [S] (comp [L, S] (P), D)


  [To avoid any confusion about the use of brackets and parentheses
        and their precedences: the notation


                ff [A, B, ...] (x, y, ...)


        means: the application of a certain function to the arguments
        x, y, ...; the function to be applied is itself obtained
        by applying the mechanism `ff' to the arguments A, B, ....
        Being able to use subscripts would make things much simpler!]


This is a quite general scheme for executing software on any computer. It
involves an interpreted part, represented by the function `interp', and a
compiled part, represented by the function `comp'.


In an extreme case, S is the machine code for some available hardware
architecture; for example S may be the 486 or Sparc instruction code, so
that we do not have to worry about `interp'; it is simply given to us by
Providence, through one of its benevolent envoys on earth, such as Intel,
Sun Microsystems or Cypress Semiconductors. This may be called the purely
compiled mode. But even in such a case by probing further one may usually
detect an interpreted component as well (because of the presence of
microprogramming and other lower-level layers). In addition some program
elements, such as printing formats, are typically left untouched by `comp'
and evaluated at run time only, meaning that `interp [S]' does have some
software component after all.


At the other extreme, there is no compilation whatsoever: `comp' is the
identity relation - that is to say, `comp [L, S] (P)' is just P for any P.
This case yields the first scheme, /1/, introduced at the beginning of the
present discussion, and may be called the purely interpreted mode. But
except for student projects it is almost never used in practice since it
would imply repeated parsing of the source text of P, and hence gross
inefficiencies. Even the crudest interpreters usually start by translating
the ASCII form of P into some internal form, so that they have a
non-identity `comp' component as per formula /3/.


In most cases, a programming language implementation uses a solution that
lies between the two extremes. It has an interpreted component `interp'
and a compiled component `comp' as per formula /3/, and is largely defined
by the description of the intermediate language S.


A common further refinement of this scheme is to use not just one target
language S but several such languages. For example we may rely on four
languages S4, S3, S2 and S1 where S4 is our source language L (such as
Eiffel) and each language in the list is at a lower level than the
preceding one, S1 being the lowest level - the same as S in the above
discussion. Then the function `comp' can be expressed as a composition:


/4/ comp [S4, S1] = comp [S4, S3] ; comp [S3, S2] ; comp [S2, S1]


[In this notation the semicolon `;' represents function
composition; in other words, for two functions `f' and `g',
their composition `f ; g' is the function `h' such that,
for any x, h (x) = g (f (x)). See the book ``Introduction
to the Theory of Programming Languages''.]


The interpreter function in formula /3/ must be `interp [S1]' for the
lowest-level language in the list, which is then the same as S. As a
result of the above comments, if we choose S1 to be the machine code for
our platform we essentially have a fully compiled implementation. But if
we choose a language at any higher level we have some degree of
interpretation.


I have used the example of four languages (that is to say, two
intermediate levels) because it makes things concrete and also because it
is close to what you will find in ISE Eiffel. But of course there can be
any number of levels.


To discuss language implementations, then, one should not just use the
term ``an interpreter'' or ``a compiler'', which are generally too
simplistic.


Instead one should describe the combination of interpretation and
compilation on which it relies. In particular this means describing the
list S1, S2, ... of intermediate languages; S1 is especially important
since it gives the level of the interpreted part.


Note that further discussions of these issues may be found in the book
``Introduction to the Theory of Programming Languages'', Prentice Hall,
1991, ISBN 0-13-498510-9 (0-13-498502 pbk). In particular the book
describes the ``currying'' mechanism which provides the necessary notation
to talk about the kind of functions needed - with or without subscripts.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.