Re: Why is compiled basic slower than C? (Basic is the future)

burley@geech.gnu.ai.mit.edu (Craig Burley)
Tue, 18 Aug 1992 16:40:56 GMT


Newsgroups: comp.compilers
From: burley@geech.gnu.ai.mit.edu (Craig Burley)
Organization: Free Software Foundation 545 Tech Square Cambridge, MA 02139
Date: Tue, 18 Aug 1992 16:40:56 GMT
References: 92-08-042 92-08-095
Keywords: interpreter, performance, design

macrakis@osf.org (Stavros Macrakis) writes:


      burley@geech.gnu.ai.mit.edu (Craig Burley) writes:


            I think interpreted languages _could_ compile well if _lots_ of effort
            were expended. However, that kind of effort probably isn't worth it --


      Could you please define `interpreted language'? Is it anything more than
      `a language whose first implementation was interpretive'? In the present
      discussion, the language under question is Basic, whose first (and only
      for a long time) implementation was in fact compiled. Other languages you
      might call `interpreted', like Lisp, Prolog, ML, and Snobol, in fact have
      very good compilers, which make major differences in runtime.


I was using the apparent prevailing definition of "interpreted language"
which I then subsequently broke down in a few (but not all) ways. The
prevailing definition, to me, seemed to be "a language I believe is most
often implemented as an interpreter", where "I" is whoever posted the
original question.


Meanwhile, as I think I also pointed out in my post, BASIC was indeed
originally a compiled language, and the first implementation I ever used
(and hacked myself) was compiled. (Note, however, that anyone starting
out using that particular implementation of BASIC would likely assume it
was _interpreted_, and their coding style sometimes changed in obvious
ways once they learned otherwise. A lot of what I was addressing in my
post had to do with how knowledge of the implementation can govern how
code is written, and this can then affect how easy it is to build a
different implementation that improves performance for existing code.)


            the performance gains compared to the effort, that is -- when contrasted
            to the effort expended in buying new, faster processors,


      The compiler speedup is complementary to the hardware speedup.


Agreed. Memory also is becoming much more plentiful, which generally
improves the ease with which compilers are written more than it does for
interpreters. Nevertheless, interpreters are still, to varying degrees
for each language, easier to write than compilers. Most people experience
BASIC as a personal computer implementation, essentially all of which have
been interpreters. That's probably because building a half-decent
compiler to run on a 4K machine is much harder than building a half-decent
interpreter for that machine.


            making the interpreters run faster (often quite easy),


      Interpreters can, of course, be bummed (and often are). But in many
      cases, this still leaves you far from the compiled speed.


Obviously. But writing the compiler is the trick. Simply writing _a_
compiler is not enough -- you must write _the_ compiler whose output
consistently and greatly exceeds the performance of the fastest
interpreter you are likely to face, since that interpreter is probably
much easier to build than even a half-decent compiler, and is probably
more portable.


            translating programs to
            maintainable code in a language designed for compilation, and so on.


      This is a possible approach. However, you lose whatever advantages the
      `interpreted' language had, e.g. more appropriate abstractions, easier
      debugging, fast turnaround, etc. There are of course compiler-based
      systems that provide easy debugging and fast turnaround e.g. most Lisp
      compilers, but also the CenterLine C (formerly Saber C) system.


Yes, but from what little I've seen, people who write code in BASIC rarely
have a problem with the speed of the resulting program. When they do,
they often are "ready" to move "up" to a language designed for high-speed
execution via compilation, such as C, Pascal, or Fortran, if not Assembly.


It would seem to make sense that there would be a large market for a
top-quality compiler for "interpreted languages" (again, languages for
which most users have access only to an interpreted implementation) such
as BASIC, Hypertalk, and so on, and the existence of this market would
cause such a compiler to be built. As far as I know, such compilers don't
permeate the market the way compilers for C, Pascal, Fortran, or perhaps
even LISP do. I can't say why that is, except to do the usual "punt to
market forces".


            I remember a discussion with rms (GNU EMACS author) regarding TECO, the
            language in which he first wrote EMACS. TECO was (and still is,
            primarily) an interpreted language (well, it's an editor, kind of like
            the MS-DOS DEBUG program is a partition editor :-).


      Teco is a weird language. Most commands are single-character, and the
      main loop of the interpreter essentially dispatches directly to the
      relevant routine. So it is a sort of human-readable (well, this is
      debatable, too) byte-compiled form.


Obviously it is more human-readable than an object file. One doesn't have
to worry about endianness. :-)


            He told me about how someone we both knew had developed (or helped
            develop) a compiler for TECO and was prepared to demonstrate its
            superiority in benchmarks. But submitting one simple case
            (something like deleting every other line in the file) proved the
            opposite.


      Teco is an extreme case, but even so, you could probably _slightly_ reduce
      runtime by compiling to machine code. I certainly agree that in most
      cases, it would not be worthwhile to try to compile Teco code....


Right, and this makes it more difficult to fund development of a
top-quality compiler for TECO. However, it is technically feasible,
probably roughly as feasible as doing a top-quality BASIC compiler.


Remember, the original question was "why is compiled BASIC slower than
[compiled] C?" I'm saying the answer has a lot to do with the fact that
there doesn't seem to be enough of a market for a top-quality BASIC
compiler that might be able to challenge the superiority of C compilers.
On the other hand, an OOP BASIC compiler might be able to make a good run
at most C++ implementations.


            (Of course, TECO is probably like APL in that the time
            it takes to read in source code and figure out what it means is
            minimal, as compared to C, FORTRAN, and COBOL. So the penalty for
            being an interpreter could be seen as significantly lower for terse
            languages like TECO and APL,


      Teco is not just terse, it requires no lexing or parsing. Like APL, many
      of its operators are bulk operators and so most runtime is within their
      inner loops. But even APL has been compiled with good results....


Yes, and even TECO, BASIC, and all other languages _can_ be compiled with
excellent results, perhaps even superior to what can be achieved via
translation to C. I know, for instance, that a lot of Fortran code cannot
be translated to C without losing some speed (unless the compilers
globally optimize entire programs) and without using extensions to the
language that I've rarely seen implemented.


The question is, why aren't there lots of great compilers for BASIC? The
answer seems to be "because there aren't lots of people demanding them".


            ...There is another major issue regarding interpretation which puts
            both BASIC and C in the "compiled" camp: whether the running
            program has a means to modify itself at run time. LISP, for
            example, belongs in the "interpreted" camp.


      This is not really a problem. The amount of Lisp code which actually
      modifies the program written by the programmer is just minuscule. Much
      more code _generates_ pieces of Lisp, typically using the `macro' feature.
      But almost all cases of macro usage are handled straightforwardly in Lisp
      compilers. There are a few cases where Lisp code generates code on the
      fly, then executes it. This is handled by having an interpreter loaded
      along with the compiled code, and assuring that code can work correctly in
      such a mixed environment. In fact, most Lisp implementations _assume_ a
      mixed environment....


I don't feel it is a _problem_, but it is, I think, a more important
_issue_ in defining what constitutes an _interpreted_ vs. _compiled_
language. Ideally, one might say a compiled language is one where all the
facilities needed by a program written in that language are identifiable
by analyzing the source program; an interpreted language is one where all
the facilities are available to the running program at all times. By
"facilities" I mean things like library routines, language constructs, I/O
capabilities, and so on.


By this definition, for instance, one could say IBM's JCL is a fairly
pure compiled language, at least as well as I understood it way back when.
Your source program had to identify _all_ the input and output files (data
sets) explicitly -- no runtime determination of which file to open and when
to open it. This seems artificially constraining, but it's really great
for some things that are "meta-language" issues, like scheduling of tasks
(you don't get "deadly embrace" in an environment where multiple JCL jobs
run at the same time like you might in a traditional interactive-shell-
script-as-batch-job environment).


On the other hand, LISP is a fairly pure interpreted language, since,
while running, a program can read in or otherwise construct an entirely
new program and run that. It also can examine and modify itself as
needed.


Fortran and C lean more toward the compiled side, Fortran-77 more so since
its memory needs are statically determinable without global
interprocedural analysis. But they both offer the ability to dynamically
determine names of files to be opened and closed, and both offer dynamic
determination of formatting for strings. Hence, for any Fortran or C
program that uses a run-time format or printf-type string, especially if
that string is pulled from an input stream, the compiler must ensure
that _all_ facilities accessible via that mechanism are available at run
time.


In some ways, BASIC is even more of a compiled language than Fortran and
C. (However, this is based on awfully old and probably faulty knowledge
of the full scope of the language.) I don't recall that it has anything
quite like _run-time_ formatting of strings. On the other hand, its
memory needs, unlike Fortran, are not statically determinable. (The
implementation I started out with just let arrays, even string arrays,
grow as much as needed, like C can do via malloc.)


Of course, a big determinant of how easy it is to build a compiler that
produces excellent code is how extensive the built-in type mechanism is
and how much it is used. C and even Fortran are pretty well endowed in
this respect because they offer a fair number of frequently useful types
and programmers tend to use those types appropriately. LISP, Smalltalk,
and BASIC, to varying degrees, offer less of some types (varying kinds of
integers and floats) and more of others (varying-length strings, dynamic
and/or sparse arrays), and programmers in those languages might not always
be as careful to use the most restrictive type minimally necessary to get
the job done as they are with Fortran and C. So, the compiler's job must
include the ability to determine, in place of the programmer, what more
restricted type can be gotten away with to produce more efficient code. C
and Fortran compilers generally don't have to do this to do what is widely
considered to be a good job of compilation. A good BASIC compiler
probably would.


Practically speaking, lots of what makes a compiler or interpreter hard or
easy to implement is not so much the theoretical things as the practical.
C programmers have long designed and implemented code with the idea that
memory is basically flat, pointers can be passed around freely, and so on.
A C interpreter would have to deal with things like how to do ioctl() or
other things where it might need to determine how the programmer expected
memory to be laid out. That might be easy or hard depending on how the
interpreter is designed and what other problems it is solving. A Fortran
interpreter that executes as it reads code without first reading the
entire procedure can't be built anyway, since a DATA statement that
specifies initial values for variables can be at the end of a procedure.
BASIC, on the other hand, was pretty much designed for this kind of
execution, from what I can remember -- it didn't, years ago, even have to
worry about "separately compiled modules" (where the initial value for a
global variable used by, say, the main program is contained in some other
file or procedure).


And it's actually easier in some ways to build a compiler for a structured
language, as far as how to handle a GOTO from one block to another, as in:


... { ... goto foo; ... }


... { ... { ... { ... foo: ... } ... } ... } ...


The compiler of code like this in ANSI C or Fortran-77 has much less to
worry about than a canonical interpreter, the latter having to worry about
playing around with stacked blocks and so on. This may seem obvious to
most of the readers here, but I've seen at least one implementation of an
interpreter where the designer seemed to have forgotten about cases like
this (it was for a shell language much like PL/I), or at least the code
didn't handle it right.


Ultimately, interpreters and compilers seem to be one of those areas in
which theory is fine, but practice is what governs both perceptions and
realities. There don't seem to be adequate definitions of terms (but then
I'm no theorist) nor summaries of key language facilities that distinguish
between the various languages described (to which could be usefully added
and researched: Forth, MUMPS, Eiffel, ADA...). I wonder how hard or easy
(or useful) it would be to design a language that had facilities that met
today's and tomorrow's projected needs, yet lent itself well to pure
interpretation, pure global compilation, and other flavors in between the
extremes?
--
James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu

