Re: time to write a compiler

marcel@vlsi.cs.caltech.edu (Marcel van der Goot)
Tue, 6 Nov 90 01:22:29 GMT

          From comp.compilers

Related articles
time to write a compiler roman@gaudi.ccsf.caltech.edu (1990-10-31)
time to write a compiler meissner@osf.org (1990-11-05)
Re: time to write a compiler holliday@csgrad.cs.vt.edu (1990-11-06)
Re: time to write a compiler marcel@vlsi.cs.caltech.edu (1990-11-06)
Re: time to write a compiler ge@sci.kun.nl (1990-11-07)
Re: time to write a compiler bright@nazgul.UUCP (1990-11-08)
Re: time to write a compiler pohua@polaris.crhc.uiuc.edu (1990-11-09)
Re: time to write a compiler beckmann@das.harvard.edu (1990-11-09)
| List of all articles for this month |

Newsgroups: comp.compilers,comp.lang.fortran
From: marcel@vlsi.cs.caltech.edu (Marcel van der Goot)
Keywords: Fortran, question
Organization: California Institute of Technology (CS dept)
References: <1990Oct31.180632.3201@nntp-server.caltech.edu>
Date: Tue, 6 Nov 90 01:22:29 GMT

In <1990Oct31.180632.3201@nntp-server.caltech.edu> Roman Salvador
(roman@gaudi.ccsf.caltech.edu) asks


> I would like to find out the time it takes (more or less) to write a
> compiler (i.e. a Fortran 90 one). Does anybody have any statistics or
> experiences?.


During the past one-and-a-half year (well, actually almost two years already)
I have been working on a C-based programming language for message-passing
multi-computers, ``mcc'' (soon to be released ...). The project involved
the design of the language, writing a compiler for it, and writing a
thesis about it. Unfortunately, I haven't really kept track of the time
the different parts took me (shame on me), so my statistics are rather
imprecise, but I thought my experiences might be of some help.


The following account is rather long. My apologies for that.




* Introduction


The language is an extension of ansi C, with new constructs for
defining, creating, and connecting processes, for sending and receiving
messages, and for selecting communications. C was chosen because
it is relatively small (we thought) and widely used. The compiler
is supposed to be rather machine-independent and produces (standard) C
rather than machine-code, hence no time was spent on building
machine-code generators or code-optimizers --- I assume you have to
actually generate assembly- (or machine-) code.


* My background when I started the project


4 years of undergraduate study in computer science in the Netherlands
(which rather emphasized formal methods, compared with US schools).
Included was a two-term class about compiler construction, one term
of which was spent extending a simple compiler. I was familiar with
lexical scanners and parser generators, but had used different ones
than lex and yacc.
At Caltech I had taken a one-term class about ``logic programming and
compiler construction,'' which was mostly devoted to the (construction
of a) compiler for ``strand'' (a parallel programming language, sort-of
but not quite a logic language).
I guess my specialty is concurrent programming, not compiler writing.


* Prototype


Coming up with the basic ideas for the new language constructs, plus
writing a prototype took me about 2 months. The prototype implemented
only a limited part of the language (e.g., only integers could be
sent between processes), and was mostly intended to demonstrate that
one could indeed compile the language reasonably well into standard C.
The prototype was written using lex and yacc, with most of the program
flow controlled by the code statements included in the grammar. Basically
no type-checking was done, and the output was produced in a rather ad-hoc
fashion.


* The real compiler


I then started to write the real compiler, for the whole language.
I had decided to do all the parsing and type-checking etc., rather than
just some pattern-matching with a cpp-type preprocessor. I had also
decided to do everything from scratch, rather than take, say, gcc, and
hack it. Long discussions about the wisdom of these decisions are possible.
(I have no idea whether there are F90 compilers around that you could
change to suit your needs, or whether you also have to do everything from
scratch.)


My main experiences with this:


- C is much bigger than I thought.
    Parsing C is hard (C is somewhat context-sensitive with respect to type
    names), type-checking is complicated by many automatic conversions and
    special cases.


- Most books on compiler construction don't explain enough.
    Lexical scanning and parsing are very well understood, everyone knows
    how to do it, where to start, what to be careful about, etc. But now
    you have to do type-checking. It sounds so trivial, and is always
    displayed as such, but I soon found that it isn't. With relatively
    large programs it is very easy to choose a data-structure and then
    find out a month later that you don't have enough information available
    to make one particular check for one particular operator.


- Lack of tools.
    We only had lex and yacc. I quickly found that putting all the compiler
    code in the grammar led to inconvenient control-flow, and decided
    to just build a parse-tree. I then wrote my own tool for describing
    operations in terms of this tree (when I have time, I may one day put
    this tool in the public domain). However, I think it is worth quite
    some effort to find out about other compiler writing tools that already
    exist. (And, if you have enough code, writing your own tools pays off.)


- Compiling to C instead of assembly didn't help as much as expected.
    The main reason, I think, is that the extensions all have to do
    with parallel programming, and therefore almost any new construct
    is more complicated to translate than any of the standard C constructs.


Writing this compiler (including the above-mentioned tool) took me
about 7.5 months. Very roughly, I'd say 2.5 months parsing, 3.5 months
type-checking, 1 month for the run-time system (0.5 ?). The compiler
was then used in a class about concurrent programming, which more or less
served to debug it. I'm proud to say that actually very few programming
errors were found. Most of them were in the run-time system, which
isn't very big but involves important things like process scheduling.
The run-time system is not really related to compiler construction, I
don't think you need anything like it for F90 (on the other hand, you need a
library). With about 6000 lines, the compiler is much bigger than anything
I wrote before (considering that I wrote it all, rather than modifying
existing code), and also bigger than I had expected.


* A better compiler


I did find some errors and short-comings in the compiler that were
not easy to fix. Things like inconvenient data structures, with not
quite the right information; sometimes awkward modularization (use of
program-generating tools sometimes makes this worse); etc.
I therefore basically rewrote the compiler, although I copied substantial
parts. This version is supposed to be more or less final. It took (well,
it isn't quite finished yet) me about 5 months, interwoven with about
3 months of writing my master's thesis.




* Advice


(If you conclude from the above that I obviously did everything
completely wrong then probably this advice isn't much worth --- let
me know if that's the case, though.)


- Spend time doing serious planning (that's rather obvious; I knew
    it, but still the sheer size of the project and the program took me
    by surprise).


- Look around for other tools than lexical scanner-generators and
    parser-generators. Quite likely it pays off.


- You can use a prototype to get some idea of what modules, data-structures,
    etc. you need, but write the actual program from scratch (for heated
    discussions about this, write to comp.software-eng).


- Be prepared to rewrite the whole program --- almost certainly you'll
    find at the end, after having used the compiler somewhat, that there
    are some things you'd like to add that don't fit in, or that parts of
    the compiler are needlessly complicated, etc.


- Make detailed documentation, even for parts that only you are going to
    look at. For instance, write down exactly what code you generate for each
    construct, for all different situations. Without such detailed
    descriptions it is hard to convince yourself (or others) that each part
    of the program is correct, and that all the parts fit well together.


- If possible, don't do it alone. Although a program this size can
    be written by a single person (I've been told that real programs
    are at least 10^5 lines --- how do they do that?), I don't think it
    is a good idea. It helps if there is someone with whom you can really
    discuss all the details of the program, rather than just general ideas.
    It helps to get you through slumps when someone else needs your code now.
    I think that with two persons the additional burden of having to
    cooperate shouldn't be too large, and is in fact probably beneficial.


Well, that was quite a lot. I'm sorry that I cannot give a more precise
account of the time spent on each of the parts. Hope it helps.


                                                                                  Marcel van der Goot
                                                                                  marcel@vlsi.cs.caltech.edu
[When writing my F77 compiler, I found the library, particularly the I/O
library, to be as much work as the compiler itself. -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.