# Re: how to increase compilation performance

## Stephen J Bevan <stephenb@harlequin.co.uk> 15 Feb 1999 23:17:54 -0500

From comp.compilers

Related articles
how to increase compilation performance genshirt@modicon.de (Thomas Gänshirt) (1999-02-05)
Re: how to increase compilation performance rweaver@ix.netcom.com (1999-02-07)
Re: how to increase compilation performance cbrtjr@ix.netcom.com (Charles E. Bortle, Jr.) (1999-02-10)
Re: how to increase compilation performance cbrtjr@ix.netcom.com (Charles E. Bortle, Jr.) (1999-02-12)
Re: how to increase compilation performance stephenb@harlequin.co.uk (Stephen J Bevan) (1999-02-15)

From: Stephen J Bevan
Newsgroups: comp.compilers
Date: 15 Feb 1999 23:17:54 -0500
Organization: Harlequin Ltd., Manchester, UK.
References: 99-02-027 99-02-043
Keywords: books, history

> [I'd love to get a copy of that book. It's long out of print. -John]

I can't get you a copy, but I can give you a summary of what is in it :-

@book
{ Brown:wici:1979
, author= "P. J. Brown"
, title= "Writing Interactive Compilers and Interpreters"
, publisher= "Wiley"
, year= 1979
, refs= 74
, pages= 265
, checked= 19971223
, source= "Main Library, University of Manchester"
, sjb= "A bit dated given its use of Basic, but nevertheless a
good hands-on account of how to write an interactive
compiler/interpreter. Does not spend any time on optimization and
code generation; suggests using a reverse Polish form and interpreting
that and/or converting it to machine code. Basic is used not because
it is considered the best language, but because it is felt to be the
most familiar to the intended audience.

In the discussion of the parser, notes that Knuth~\cite{Knuth:spe:1971}
has shown that a high percentage of expressions used in real FORTRAN
programs consist of just one operand and no operators at all. In a
recursive descent parser for a language with n levels of precedence,
parsing such an expression costs n procedure calls. Suggests that an
operator precedence method can be more efficient and gives a simple
example.

A {\em break-in} is the term used for interrupting an interactive
compiler when it is running in order to stop it doing whatever it is
doing, including running a user program. The two obvious techniques
for implementing a {\em break-in} are via an {\em interrupt} or by
{\em polling}. Points at~\cite{Heher:spe:1976} as an example of a
system that allows a user program to be stopped, some commands
executed to examine state and then have the user program resumed.
Lists some of the problems with both approaches and suggests a
synthesis in which the interrupt routine is inhibited for most of the
time and only enabled at safe points.

Suggests that two ends of the spectrum of compiler users are

\begin{itemize}
\item [Dr Spend] who wants his program to run as quickly as possible.
He is willing to sacrifice compilation speed and executable size in
order to gain speed.

\item [Mr Conn] who has a large program which he just wants to run
occasionally. He is willing to sacrifice speed for space.
\end{itemize}

To support both types of users with one compiler requires an
adaptable compiler. The compiler first translates the source program
into a compact internal language. When the program is run the
compiler keeps a record of the number of times each statement is
executed and on the basis of this information the heavily executed
statements are translated into machine code. This is called the {\em
mixed-code} approach to compilation -- note it is not clear from the
description whether the re-compilation is done during execution
(on-line) or as a separate process (off-line). Suggests looking
at~\cite{Dakin:Poole:bcs:cj:1973},\cite{Dawson:bcs:cj:1973}
and~\cite{Ng:Cantoni:bcs:cj:1976}. Notes
that~\cite{Hansen:Wulf:po:1976} apply a similar technique to the
levels of optimization attempted by a conventional compiler.

The process of translating a program as it runs is given the name {\em
dynamic compilation}. It is specifically given to the process of
converting from the source language to the intermediate language and
then to machine code as the program is running. That is, syntax and
static semantic errors are not located until the program is executed.
Therefore it is suggested that a program in the source language is
first converted to the internal language. It is this language which
is then the subject of compilation as the program runs. Notes that
this allows some optimizations and checks to be performed that an
off-line compiler cannot support, for example checking for unassigned
variables at low cost (the check is done as part of the compilation
process, the resulting code runs at full speed) and values that are
constant at run-time can be propagated and special code generated.

Besides being more work for the compiler writer, notes that the {\em
mixed-code} and {\em dynamic-compilation} approach will result in Mr
Conn's program getting larger and larger as it is run, possibly to the
point where it will not fit in memory anymore ({\em less likely to
occur in these days of multi-Mb memories}).

Suggests using {\em throw-away compilation} to solve this problem.
That is, code that has been compiled can be thrown away after it has
been executed since it can always be recreated from the internal
form. No details are given of any algorithms to try and minimise the
amount of compilation versus space. Suggests looking
at~\cite{Brown:spe:1976}, \cite{Brown:bcs:cj:1979},
\cite{Hammond:spe:1977} and \cite{VanDyke:hpj:1977}.

Lists thirteen {\em deadly sins} for the interactive compiler writer :-

\begin{itemize}
\item The first deadly sin is {\em to code before you think}.
\item The second deadly sin is {\em to assume the user has all the
knowledge the compiler writer has}.
\item The third deadly sin is {\em not to write proper documentation}.
\item The fourth deadly sin is {\em to ignore language standards}.
\item The fifth deadly sin is {\em to treat error diagnosis as an
afterthought}.
\item The sixth deadly sin is {\em to equate the unlikely with the
impossible}.
\item The seventh deadly sin is {\em to make the encoding of the
compiler dependent on its data formats}.
\item The eighth deadly sin is {\em to use numbers for objects that
are not numbers}.
\item The ninth deadly sin is {\em to pretend you are catering for
everyone at the same time}.
\item The tenth deadly sin is {\em to have no strategy for processing
break-ins}.
\item The eleventh deadly sin is {\em to rate the beauty of
mathematics above the usability of your compiler}.
\item The twelfth deadly sin is {\em to let any error go undetected}.
\item The thirteenth deadly sin is {\em to leave users to find the
errors in your compiler}.
\end{itemize}"
, reffrom= Leong:Jodis:Sullivan:Jiang:DeMaine:ieee:tose:1990
}
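The operator precedence point in the summary above can be sketched in modern terms as precedence climbing. This is an illustrative Python sketch, not from the book; the token format, the `PRECEDENCE` table and `parse_expr` are all invented for the example. Note that a lone operand such as `x` costs a single call, where a textbook recursive descent parser for a language with n precedence levels descends through n procedures.

```python
# Precedence-climbing parser over a flat token list (illustrative only).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:], returning (tree, next_pos)."""
    lhs, pos = tokens[pos], pos + 1      # bare operand: one call, no descent
    # Climb: absorb operators at or above the current minimum precedence.
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        rhs, pos = parse_expr(tokens, pos + 1, PRECEDENCE[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos

tree, _ = parse_expr(["a", "+", "b", "*", "c"])
# tree == ("+", "a", ("*", "b", "c"))
```

A recursive descent parser would reach the same tree, but would make one call per precedence level even for the one-operand case Knuth found to dominate.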
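The safe-point synthesis for break-ins described above might look like the following sketch, assuming a POSIX-style SIGINT: the handler only records the request, and the interpreter loop polls the flag between statements, so the user's program is only ever stopped with consistent state and can be resumed from the returned statement index. All names here are illustrative, not Brown's.

```python
# Break-in handling: interrupt sets a flag; interpreter polls at safe points.
import signal

break_requested = False

def on_sigint(signum, frame):
    global break_requested
    break_requested = True            # defer: just record the request

def run(statements, start=0):
    """Execute callables; stop at a statement boundary on break-in."""
    global break_requested
    old = signal.signal(signal.SIGINT, on_sigint)
    try:
        for i in range(start, len(statements)):
            if break_requested:       # safe point: between statements
                break_requested = False
                return ("stopped", i)  # caller can examine state, then resume
            statements[i]()
        return ("done", len(statements))
    finally:
        signal.signal(signal.SIGINT, old)
```

Resuming is then just `run(statements, start=i)`, matching the examine-and-resume behaviour attributed to~\cite{Heher:spe:1976}.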

The following are most of the papers cited in the above summary :-

@string{bcs:cj = "The Computer Journal"}
@string{hpj = "Hewlett Packard Journal"}
@string{spe = "Software -- Practice and Experience"}

@article
{ Dawson:bcs:cj:1973
, author= "J. L. Dawson"
, title= "Combining interpretive code with machine code"
, journal= bcs:cj
, volume= 16
, number= 3
, pages= "216--219"
, month= aug
, year= 1973
, refs= 1
, checked= 19980212
, source= "Computer Science Library, University of Manchester"
, abstract= "Justification is provided for mixing interpretive code
and machine code in the same program, particularly in the output from
a compiler. Three possible methods of implementation are described,
together with a comparison of their advantages and disadvantages.
Examples of the use of this technique are given."
, sjb= "The justification is mainly the saving in program size.
The first two methods differ on whether the mixture of compiled and
interpretive code is done at the routine level or at the basic block
level (called a ``basic hunk'' in this paper). The third method
involves using hardware traps i.e. a jump to an interpretive routine
generates a trap and the interpreter is then invoked by the trap
handler. Contains the following interesting point :-

\begin{quote}
Barbieri and Morrisey~\cite{Barbieri:Morrissey:afcrl:1967} suggest a
method of compiling whereby all the code is initially interpretive.
As the program runs, the interpreter collects statistics of
instruction usage, and eventually those parts of the code which are
most frequently used can be converted to machine code. The combined
machine code and interpretive code can then be run \ldots.
\end{quote}

The paper does not mention any actual implementation."
, reffrom= Ng:Cantoni:bcs:cj:1976
, reffrom= Brown:wici:1979
}
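The statistics-gathering scheme quoted above (and the book's {\em mixed-code} approach) can be sketched as a toy model. This is not any of the cited implementations: "machine code" is modelled as a specialised Python closure, and the statement format and `HOT` threshold are invented for the example.

```python
# Mixed code, toy version: interpret, count, promote hot statements.
HOT = 3

def make_runner(program):
    """program: list of (name, delta) increment statements."""
    counts = [0] * len(program)
    compiled = [None] * len(program)

    def run_once(env):
        for i, (name, delta) in enumerate(program):
            if compiled[i] is not None:
                compiled[i](env)                      # fast path
                continue
            counts[i] += 1
            env[name] = env.get(name, 0) + delta      # interpret
            if counts[i] >= HOT:
                # "Compile": bind the operands into a closure so later
                # executions skip interpretive dispatch entirely.
                def native(env, name=name, delta=delta):
                    env[name] = env.get(name, 0) + delta
                compiled[i] = native
    return run_once
```

After `HOT` interpretive executions a statement runs via its compiled form, which is the trade Dr Spend wants; Mr Conn would keep everything interpretive.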

@article
{ Dakin:Poole:bcs:cj:1973
, author= "R. J. Dakin and P. C. Poole"
, title= "A mixed-code approach"
, journal= bcs:cj
, volume= 16
, number= 3
, pages= "219--222"
, month= aug
, year= 1973
, refs= 5
, checked= 19980212
, source= "Computer Science Library, University of Manchester"
, abstract= "This paper describes the development of a version of the
MITEM text editor in which heavily-used sections of code are obeyed
directly while other sections are generated into a pseudo-code which
is interpreted. The main ideas underlying this work are adaptable
code generation, selective optimisation of run time and program space
in code generation and the use of mixed code. A general mixed code
system promises substantial advantages for debugging and program
monitoring."
, sjb= "Provides some storage requirements for direct,
interpretive and mixed versions of MITEM6 (portable text editor) for
the ICL 470: 36,240, 20,420 and 20,450 bytes. Execution profiling of
the interpretive version showed that over 97 percent of the
instruction executions occurred in less than 10 percent of the code.
In a particular run more than half of the code was never executed at
all since not all facilities were used, e.g. error reporting.
Generating the mixed version involves marking those sections (whole
functions) which are to be executed directly. This marking is done
externally by reference to fixed points in the code (presumably
function names) so that it is not necessary to alter the source code
itself. Suggests that this could be automated so that all the user
has to do is select a point on a curve plotted from training runs to
decide how much code is direct and how much is interpretive."
, reffrom= Ng:Cantoni:bcs:cj:1976
, reffrom= Brown:wici:1979
}

@article
{ Ng:Cantoni:bcs:cj:1976
, author= "T. S. Ng and A. Cantoni"
, title= "Run time interaction with FORTRAN using mixed code"
, journal= bcs:cj
, volume= 19
, number= 1
, pages= "91--92"
, month= feb
, year= 1976
, refs= 3
, checked= 19970118
, source= "Computer Science Library, University of Manchester"
, abstract= "This paper describes an interpretive function generator,
COMPIL, for use with the PDP-11 FORTRAN COMPILER and Object Time
System. The implementation used a ``mixed code'' approach, which
enables the exchange of run time for program space. Other benefits
include improved debugging facilities and extended interactive
facilities such as the ability to define and link functions at run
time. This latter feature is especially useful on the PDP-11 as
compilation and linking of function subprograms takes an appreciable
time."
, reffrom= Brown:wici:1979
}

@article
{ Heher:spe:1976
, author= "A. D. Heher"
, title= "Some features of a real-time BASIC executive"
, journal= spe
, volume= 6
, number= 3
, pages= "387--391"
, year= 1976
, reffrom= Brown:wici:1979
}

@article
{ Brown:spe:1976
, author= "P. J. Brown"
, title= "Throw-Away Compiling"
, journal= spe
, volume= 6
, number= 3
, pages= "423--433"
, year= 1976
, reffrom= Turner:spe:1979
, reffrom= Brown:bcs:cj:1979
, reffrom= Brown:wici:1979
, reffrom= Franz:plasa:1994
}
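Since the book gives no algorithm for balancing compilation against space, the following is one hedged sketch of throw-away compiling: compiled fragments are kept under a fixed budget and the least recently used fragment is thrown away, to be recompiled from the internal form on demand. The LRU policy, the budget, and all names are my own illustration, not Brown's.

```python
# Throw-away compilation cache: compiled code is disposable because it
# can always be regenerated from the compact internal form.
from collections import OrderedDict

class ThrowAwayCache:
    def __init__(self, budget, compile_fn):
        self.budget = budget            # max fragments kept in memory
        self.compile_fn = compile_fn    # internal form -> executable code
        self.cache = OrderedDict()

    def get(self, key, internal_form):
        if key in self.cache:
            self.cache.move_to_end(key)  # keep recently used fragments
            return self.cache[key]
        code = self.compile_fn(internal_form)   # (re)compile on demand
        self.cache[key] = code
        if len(self.cache) > self.budget:
            self.cache.popitem(last=False)      # throw the oldest away
        return code
```

This keeps Mr Conn's program within a fixed code budget at the cost of occasional recompilation of evicted fragments.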

@article
{ Hammond:spe:1977
, author= "J. Hammond"
, title= "BASIC -- an evaluation of processing methods and a study
of some programs"
, journal= spe
, volume= 7
, pages= "697--711"
, year= 1977
, reffrom= Brown:wici:1979
}

@article
{ VanDyke:hpj:1977
, author= "Van Dyke, E. J."
, title= "A dynamic incremental compiler for an interpretive language"
, journal= hpj
, volume= 28
, number= 11
, pages= "17--24"
, year= 1977
, reffrom= Brown:wici:1979
}

@article
{ Brown:bcs:cj:1979
, author= "P. J. Brown"
, title= "Software methods for virtual storage of executable code"
, journal= bcs:cj
, volume= 22
, pages= "50--52"
, year= 1979
, refs= 10
, checked= 19971222
, source= "Main Library, University of Manchester"
, abstract= "Virtual storage systems were originally based on swapping
to and from a disc or drum. More recently, however, the same
techniques have been used in new areas, where the constraints imposed
by electro-mechanical devices do not apply. This necessitates a fresh
evaluation of storage management strategies. An area where new
strategies may reap specially good rewards is the management of code
areas by software."
, sjb= "Suggests performing paging of code and notes that the
basic block is a good unit of size since it minimises wasted store."
, reffrom= Brown:wici:1979
}
