Re: Advice on writing a C compiler

konfrbr@eua.ericsson.se (Fredrik Bredberg)
Fri, 11 Aug 1995 07:38:41 GMT

          From comp.compilers

Related articles
Advice on writing a C compiler fabcab@tcp.co.uk (1995-08-09)
Re: Advice on writing a C compiler konfrbr@eua.ericsson.se (1995-08-11)

Newsgroups: comp.compilers
From: konfrbr@eua.ericsson.se (Fredrik Bredberg)
Keywords: design
Organization: Ellemtel Telecom Systems Labs, Stockholm, Sweden
References: 95-08-076
Date: Fri, 11 Aug 1995 07:38:41 GMT

I'm about to do something similar, i.e. write a compiler
for a C-like language of my own.


I'd be happy to see an ``I'm writing my own compiler'' thread
in comp.compilers that goes through all the steps in such a
project, and maybe leads to a ``Writing a compiler FAQ''.


Anyway, I hope these references will help you. :-)


/Keep in touch. Fredrik


+--
| Fredrik Bredberg
| E-Mail..: frbr@enea.se
| Pager...: + 46 746 457371
| Enea Tel: + 46 8 6385067
| Enea Fax: + 46 8 6385050
| EUA User: konfrbr
| EUA Tel: + 46 8 7273638
+----------------------------------------------------------------------------+


From isc@panix.com (David Koosis)
Subject: New Book: Retargetable C Compiler... & question
Date: Fri, 28 Apr 1995 19:42:46 GMT


Just read an interesting new book:


"A Retargetable C Compiler: Design and Implementation"
Christopher W. Fraser (AT&T Bell Labs)
David R. Hanson (Princeton U.)
ISBN 0-8053-1670-1
Benjamin/Cummings Publishing Company, Inc.
Copyright (c) 1995 AT&T & David Hanson


I found this book, after a first reading, to be one of the better
compiler or compiler-tool books I've read. Highly recommended.
Right up there with the Dragon Book and O'Reilly's 2nd edition
Lex & Yacc book (that ought to get this post past John ;) ).


The authors present an in-depth look at lcc, a retargetable
C compiler. They contrast their approach with texts (like the
Dragon Book, IMO) that provide breadth and strong theoretical
material rather than depth and implementation details.


Also, the authors present their book as an example of "programming
in the large." It provides an opportunity for programmers to learn
about the design decisions, wrong turns, constraints, and strategies
that went into a large, successful software project.


At the larger level, "A Retargetable C Compiler..." is organized in
the traditional way: starting with basics, then lexical analysis,
then parsing, then code generation.
Within each section, however, the book is organized untraditionally:
as a running narrative on the actual lcc source code:


"This book not only describes the implementation of lcc, it
*is* the implementation. The 'noweb' system for 'literate
programming' generates both the book and the code for lcc
from a single source. This source consists of interleaved
prose and labeled code 'fragments'." [p1]


The approach works well, although there were times when I
would have liked to see the source code in some other arrangement
than that provided. Fortunately the source is available via ftp:
ftp.cs.princeton.edu /pub/lcc.


The most interesting parts of this book detail the authors' efforts
to make lcc easily retargetable. There's a good discussion of the
interface between the target-independent front end and the various
target-dependent back ends. Fraser & Hanson's observations and
experiences will be very useful to anyone dealing with multi-platform
compiler issues. The chapters on code generation for MIPS, SPARC, and
x86 provide sufficient detail that one comes away with a specific,
detailed understanding of some of the less obvious issues in code
generation. (A good companion text for these chapters is
"Microprocessors" by Dewar and Smosna, ISBN 0-07-016638-2.)


Interestingly, Fraser & Hanson use a code-generator generator, lburg,
for the back end, but they hand-code the lexical analyzer and parser.
They seem to brush off Lex and Yacc as not particularly useful for
their purpose of building a small, fast, portable C compiler
(though they do say that lex & yacc have an important role to play
for more complex languages, throwaway utilities, or instances in
which compilation speed is not important).
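
To make the size/speed argument concrete, here is a minimal sketch of
the kind of hand-coded scanner loop such compilers use: dispatch on
the first character, then consume the token in a tight loop. This is
not lcc's actual code; the token names and buffer size are invented
for illustration.

#include <ctype.h>
#include <string.h>

enum token { TK_EOF, TK_IDENT, TK_NUMBER, TK_WHILE, TK_OTHER };

static const char *cp;          /* current position in the input buffer */
static char lexeme[128];        /* text of the current token */

enum token next_token(void)
{
    size_t n = 0;

    while (*cp == ' ' || *cp == '\t' || *cp == '\n')
        cp++;                   /* skip blanks without touching a DFA table */
    if (*cp == '\0')
        return TK_EOF;
    if (isalpha((unsigned char)*cp) || *cp == '_') {
        while ((isalnum((unsigned char)*cp) || *cp == '_')
               && n < sizeof lexeme - 1)
            lexeme[n++] = *cp++;
        lexeme[n] = '\0';
        /* keywords are looked up after scanning a whole identifier */
        return strcmp(lexeme, "while") == 0 ? TK_WHILE : TK_IDENT;
    }
    if (isdigit((unsigned char)*cp)) {
        while (isdigit((unsigned char)*cp) && n < sizeof lexeme - 1)
            lexeme[n++] = *cp++;
        lexeme[n] = '\0';
        return TK_NUMBER;
    }
    cp++;                       /* operators and punctuation, abbreviated */
    return TK_OTHER;
}

A generated analyzer simulates a table-driven automaton for every
character; a loop like this does roughly one comparison per character
in the common cases, which is where the hand-coded speed advantage
comes from.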


Fraser & Hanson state that "automatically generated analyzers,
such as those produced by LEX, tend to be large and slower than
analyzers built by hand." [p107] While they say that re2c and
ELI can generate faster analyzers than lcc's, they do not say
the same for flex, which they say generates analyzers "much
faster and smaller than those produced by LEX." [p125]


Is it true that LEX's analyzers are generally larger and slower than
hand-written analyzers?


All in all, this is a worthwhile book, which I recommend to anyone
interested in compilers.


//dk
ISC Consultants, Inc.  14 East 4 Street, Ste 602
New York City 10012-1141
212 477-8800  fax: 477-9895
[AT&T lex generates slow and often buggy lexers. Flex's are competitive
with hand-coded ones, particularly when you take advantage of its reports
and fiddle your token definitions to avoid having the lexer back up. -John]
+----------------------------------------------------------------------------+


From Dave Hanson <drh@CS.Princeton.EDU>
Subject: lcc v3.3 available
Date: Mon, 10 Jul 1995 20:28:04 GMT


lcc version 3.3 is now available for anonymous ftp from
ftp.cs.princeton.edu in /pub/lcc/3.3.tar.gz. This release corresponds
to the second printing of "A Retargetable C Compiler: Design and
Implementation" (Fraser & Hanson, Benjamin/Cummings, 1995, ISBN
0-8053-1670-1), and it fixes all of the errors noted in the errata at
http://www.cs.princeton.edu/software/lcc/errata.html.


For the latest information on lcc and on "A Retargetable C Compiler", see
http://www.cs.princeton.edu/software/lcc/.
+----------------------------------------------------------------------------+


In article 030@comp.compilers, riiber@oslonett.no (Arne Riiber) writes:
>Here's a list of Small-C variants with location for most of them.
>
>ARCHIVE   TARGET       OPT  LIB           VER     CREDIT
>---------------------------------------------------------------------
>-         8080         ?    ?             1.0     Ron Cain
>
>cpc       8086         N    ?             1.1     D. Betz (Capridge Syst.)
>          simtel:msdos/c/cc0?.zip
>          micros.hensa.ac.uk:/micros/ibmpc/dos/e/e053
>
>cc86      8086         N    CPM86 stdio   1.1     B. White?
>          freebsd.cdrom.com:/.2/simtel/sigm/vol149 -- REMOVED?
>
>c80v2     Z80          N    CPM stdio,    1.1     Van Zandt
>                            FLOAT
>          freebsd.cdrom.com:/.2/simtel/sigm/vol224 -- REMOVED?
>
>smalc21a  8086         Y    Yes           2.1/70  J. E. Hendrix
>
>smc21     8080         Y    Yes           2.1     E. Boebert
>          simtel:cpm/smallc21/
>
>sc88      8086         N    Yes           2.0     R. Green (Byte)
>          simtel:msdos/small_c/
>
>small-c   6809, 68k,   N    Yes           1.1++   C. Lewis
>          8080, VAX
>          wuarchive.wustl.edu:/pub/usenet/comp.sources.unix/volume5/smallc/
>
>tcc       6502         N    BBC stdio     1.1+    A. Travis
>          micros.hensa.ac.uk:/micros/ibmpc/dos/f/f049
>
>smallc11  6811         E    No            2.0     J. Dumaas (Motorola)
>          bode.uu.alberta.ca:/pub/motorola/68hc11/devtools/
>
>smlc68k   68k          N    Amiga stdio   1.1     W. Kuesche
>          micros.hensa.ac.uk:/micros/amiga/dos/q/q237
>
>Explanation:
>------------
>ARCHIVE = Internet archive name (or substring suitable for archie)
>TARGET = Target processor
>OPT = Peephole optimizer, Yes/No/External
>LIB = Standard C library support (strcpy, ...)
>VER = Base version, enhancement indicated by +
>CREDIT = Name associated with the archive
>
>Of these, C. Lewis's small-c is my favourite:
>its files have logical names instead of cc1.c, cc2.c, ...
>
>
>-Arne-
+----------------------------------------------------------------------------+


From preston@tera.com (Preston Briggs)
Subject: Re: Register optimizations
Date: Sun, 12 Mar 1995 02:46:54 GMT


>I know there is a ton of literature on this but it would be nice if
>there were a suggestion or two for where a relative beginner should
>start. Many of us wrote 'tiny' compilers in school that generated
>assembly code and could be used for real problems. What would be
>a manageable next step for improving these compilers and learning
>a little more about register allocation and code optimization?


I think the big thing is to have a plan for your compiler before you
start. If you want to do a lot of optimization (and who wouldn't :-),
then you'll want to plan for it from the beginning. Unfortunately,
it's hard to plan for directions you haven't yet explored, so I like
to read and learn from others' experiences. I'll write a bit more
below, but I recommend, very highly, that you chase down some papers.
My favorite, as far as overall direction for an optimizing compiler,
especially for a research or hobbyist effort, is


    title   = "An Overview of the {PL.8} Compiler",
    author  = "Marc A. Auslander and Martin E. Hopkins",
    journal = "SIGPLAN Notices",
    year    = 1982,
    month   = jun,
    volume  = 17,
    number  = 6,
    note    = "Proceedings of the SIGPLAN 1982 Symposium on Compiler Construction",
    pages   = "22--31"


In brief, here's a direction to work toward. Think of your compiler
as having three parts: front-end, optimizer, and back-end.


The front end should scan, parse, do error checking, and emit
intermediate code. The back end should consume intermediate code,
choose instructions and do register allocation, and emit assembly
code. If you do a good job with the design of your intermediate form
(very hard work), you ought to be able to totally ignore optimization
and have the front end and back end communicate only through the
intermediate form (no symbol table, auxiliary files, etc.).


The front end should be resolutely non-optimizing. Just emit
intermediate code in as simple and straightforward a fashion as
possible. When designing your intermediate code, allow an infinite
number of registers (let the register allocator worry about cleaning
up the mess). Have the front end use registers for everything it can
-- local variables, parameters, expression temporaries, addressing
computations, etc. Don't bother trying to track their lifetimes so
that you can reuse them; that's the job of the allocator.
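
To make "an infinite number of registers" concrete, here is a hedged
sketch of how such a front end might hand out fresh virtual registers
while emitting intermediate code; new_reg and the printf-style
emission are invented for this example, not taken from any particular
compiler.

#include <stdio.h>

static int next_reg = 1;        /* virtual registers are never reused */

static int new_reg(void) { return next_reg++; }

/* Emit intermediate code for "b + 1", where r_b holds b, and return
   the virtual register holding the result.  No attempt is made to
   reuse registers; that's the allocator's job. */
int gen_add_one(int r_b)
{
    int r_one = new_reg();
    int r_sum = new_reg();
    printf("INTEGER-LOAD-IMMEDIATE 1 => r%d\n", r_one);
    printf("INTEGER-ADD r%d r%d => r%d\n", r_b, r_one, r_sum);
    return r_sum;
}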


I like an intermediate form that's quite low level, sort of resembling
RISC code. Typical operations would look like this:


INTEGER-LOAD-IMMEDIATE 123 => r75
INTEGER-ADD r75 r100 => r43
FLOAT-SUB r16 r15 => r1024
JUMP label
label: COMPARE r1 r2 => r3
BRANCH r3 lt true-label false-label
INTEGER-LOAD r33 => r456 -- load from where r33 points
INTEGER-DOUBLE r456 => r100 -- convert
DOUBLE-STORE r100 r200 -- store r100 where r200 points


and so forth. My preference for intermediate instructions is that
they be lower level (more primitive) than any assembly language.
Then, the task of instruction selection in the back end is to combine
primitives to give reasonable assembly instructions. Lots of papers
on this; I like the ones by Davidson and Fraser.


For register assignment, I like Chaitin's work. His stuff, along with
the paper above, really set the tone for everything I've tried to
build. To read about how it works and how to build it, get a copy of
my thesis (you'll also get a fair dose of my opinions about compiler
construction). You can grab it via anonymous ftp from cs.rice.edu, in
the directory public/preston.
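
For flavor, here is a heavily simplified sketch of the simplify/select
core of a Chaitin-style allocator. Spilling and coalescing, which are
most of the real work, are omitted, and all names are invented; see
the thesis for the genuine article.

#define MAXREG 1024             /* virtual registers in one routine */
#define K      16               /* machine registers available */

static int adj[MAXREG][MAXREG]; /* interference graph, adjacency matrix */
static int degree[MAXREG];
static int color[MAXREG];       /* caller initializes every entry to -1 */

/* Simplify: repeatedly remove a node with degree < K and push it.
   Select: pop nodes, giving each the first color no neighbor uses.
   Any node that can never be removed would have to be spilled. */
void color_graph(int nregs)
{
    int stack[MAXREG], sp = 0;
    int removed[MAXREG] = {0};

    for (int pass = 0; pass < nregs; pass++)
        for (int v = 0; v < nregs; v++)
            if (!removed[v] && degree[v] < K) {
                removed[v] = 1;
                stack[sp++] = v;
                for (int w = 0; w < nregs; w++)
                    if (adj[v][w] && !removed[w])
                        degree[w]--;    /* v leaves the graph */
                break;
            }

    while (sp > 0) {
        int v = stack[--sp];
        int used[K] = {0};
        for (int w = 0; w < nregs; w++)
            if (adj[v][w] && color[w] >= 0)
                used[color[w]] = 1;
        color[v] = 0;
        while (used[color[v]])
            color[v]++;         /* degree < K guarantees a free color */
    }
}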


After you build the front end and back end, then you can start having
some real fun, building pieces of the optimizer. The optimizer should
be built in pieces. I use many passes, each doing one transformation.
Each pass inhales the intermediate code for one routine, builds a
control-flow graph, does something to the code, and then exhales the
result. Try and build a skeleton that does exactly this, and then you
can use it for every pass.
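
A hedged sketch of such a skeleton in C, with the IR reader/writer and
CFG builder left as declarations to be filled in (Routine, Cfg, and
the function names are all hypothetical):

#include <stdio.h>

typedef struct routine Routine;        /* the IR for one routine */
typedef struct cfg Cfg;                /* its control-flow graph */

Routine *read_routine(FILE *in);       /* inhale one routine, NULL at EOF */
Cfg     *build_cfg(Routine *r);
void     write_routine(FILE *out, Routine *r);  /* exhale the result */

typedef void (*Transform)(Cfg *g);

/* Every pass is this same driver with a different transform plugged
   in: inhale, build the CFG, do something to the code, exhale. */
void run_pass(FILE *in, FILE *out, Transform transform)
{
    Routine *r;
    while ((r = read_routine(in)) != NULL) {
        Cfg *g = build_cfg(r);
        transform(g);                  /* the only pass-specific part */
        write_routine(out, r);
    }
}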


Start with something simple like dead code elimination. Lots of ways
to do this (and everything else) -- check the literature for the way
that appeals to you. Then maybe value numbering and/or constant
propagation. Then continue, in whatever direction your interests take
you. I recommend looking at the resulting code a lot, since it'll
initially have lots of room for improvement. Also be sure to scrap
the whole thing and start over now and then. Hard to do maybe, but
how else can you take advantage of all your experience?
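
As a starting point, dead code elimination over straight-line code can
be a single backward sweep. The sketch below assumes the instructions
form a doubly-linked list of three-address operations (all names
invented); real DCE must also handle branches and loops, which is why
you want the CFG skeleton above.

#include <stdbool.h>

#define MAXREG 1024

struct instr {
    int dst, src1, src2;        /* virtual registers, 0 = unused slot */
    bool has_side_effect;       /* stores, calls, branches must stay */
    bool dead;                  /* set below; a later sweep deletes these */
    struct instr *prev;
};

/* Walk backward from the last instruction.  An instruction whose
   result is not live and which has no side effect is dead; otherwise
   its definition kills liveness and its operands become live. */
void eliminate_dead_code(struct instr *last)
{
    bool live[MAXREG] = { false };

    for (struct instr *i = last; i != NULL; i = i->prev) {
        if (!i->has_side_effect && (i->dst == 0 || !live[i->dst])) {
            i->dead = true;
            continue;
        }
        if (i->dst)  live[i->dst]  = false;
        if (i->src1) live[i->src1] = true;
        if (i->src2) live[i->src2] = true;
    }
}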


>It'd be interesting in knowing what others feel (from experience)
>are the classic optimizations worth pursuing for the hobbyist/hacker. Put
>another way, if you had to limit your optimizer to 1000 lines of
>code what strategy would you choose?


1000 lines probably isn't enough to do everything efficiently.
Perhaps if you ignore issues of speed (mainly algorithmic complexity),
you could do quite a lot relying on lists for all your data structures
(especially if coding in Scheme or something similar).


Classic optimizations are dead code elimination, constant propagation,
value numbering, strength reduction, and elimination of partial
redundancies (which includes global common subexpressions and loop
invariant code motion).
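
As a tiny made-up illustration in the intermediate form sketched
earlier, constant propagation would fold

INTEGER-LOAD-IMMEDIATE 4 => r1
INTEGER-LOAD-IMMEDIATE 8 => r2
INTEGER-ADD r1 r2 => r3
INTEGER-ADD r3 r10 => r11

into

INTEGER-LOAD-IMMEDIATE 12 => r3
INTEGER-ADD r3 r10 => r11

and dead code elimination then deletes the loads of r1 and r2, whose
only uses disappeared with the fold. Each optimization creates
opportunities for the others, which is one reason to run them as
separate, repeatable passes.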


If I were aiming at small and fast, I'd convert to SSA, do value
numbering and constant propagation in a walk over the dominator tree
(can actually be done as part of the conversion to SSA), then dead
code elimination, and finally register allocation. Unfortunately, the
register allocators I like are fairly hefty and would consume more
time and code space than the rest of the compiler. To balance the
effort, I'd go with a cheaper allocator or a more aggressive
optimizer.


Preston Briggs
+----------------------------------------------------------------------------+


From jvitek@cui.unige.ch (Jan VITEK)
Subject: WWW archive of comp.compilers articles available.
Date: Thu, 13 Jul 1995 23:11:05 GMT


An archive of a large portion of the comp.compilers articles published
since 1986 is now available at the URL:


http://cuiwww.unige.ch/OSG/Vitek/Compilers/index.html


The archive is updated at least once a month. If you have suggestions for
improving the information available, please let me know.


Jan Vitek
University of Geneva -- Object Systems Group
+----------------------------------------------------------------------------+


From brandis@inf.ethz.ch (Marc-Michael Brandis)
Subject: Thesis on 'Small and Fast Optimizing Compilers'
Date: Mon, 13 Mar 1995 15:10:16 GMT


I am happy to announce the availability of my thesis 'Optimizing Compilers for
Structured Programming Languages' in electronic form. See below for the
abstract.


anonymous ftp:
ftp.inf.ethz.ch:doc/diss/th11024.ps.gz


This is a PostScript file compressed with GNU zip. The formatting is such that
it will print on both Letter and A4 paper, though it necessarily cannot be
optimal for either.


Marc Brandis
Institut fuer Computersysteme, ETH Zuerich
CH-8092 Zuerich


Abstract:


Diss. ETH No 11024, 1995
Documentname: th11024.ps
Author: Marc M. Brandis
Organisation: Institute for Computer Systems, ETH Zurich
Title: Optimizing Compilers for Structured Programming Languages
Pages: 153




Modern processor architectures rely on optimizing compilers to
achieve high performance. Such architectures expose details of their
hardware to the compiler, which has to deal with them in generating
machine code. This development has led to complex and slow
compilers, which are difficult to understand, implement, and maintain.


        This thesis reports on methods to simultaneously reduce the
complexity and the compile time of optimizing compilers by more than
a decimal order of magnitude. It presents a novel intermediate
program representation that integrates data flow and control flow
into a single data structure. This allows not only simpler and
faster optimization algorithms, but also more powerful optimization
techniques. The thesis also describes single-pass algorithms to
construct this intermediate program representation from structured
source code, as well as single-pass techniques to transform programs
with restricted kinds of unstructured control flow, such as Oberon's,
into structured form. Integrating these techniques with the parser
makes it possible to implement fast and compact front ends for
structured programming languages that avoid the many auxiliary data
structures other optimizing compilers require.


        A description of several optimization algorithms and how they can
be implemented on this intermediate program representation shows the
feasibility of the approach. Most of these techniques have been
implemented in a prototypical optimizing compiler translating a
subset of the programming language Oberon for the PowerPC
architecture. Measurements on this compiler prove that both the
complexity and the compile-time of optimizing compilers can be
reduced by an order of magnitude when translating a structured
programming language and when using this novel intermediate
representation and the associated algorithms. The thesis concludes
with some feedback for programming language designers, indicating
which language constructs cause undue complications in optimizing
compilers and should therefore be omitted.
--

