Why can't we build a C compiler?

think!compass!worley@EDDIE.MIT.EDU (Dale Worley)
Mon, 19 Dec 88 11:52:19 EST

          From comp.compilers



I think that part of the problem is that C is not all that well
defined. There are numerous tricky spots in the language that (until
the advent of ANSI C) were not standardized. Consider the different
tricks that were used to concatenate and stringize tokens. These were
so horrible that the ANSI committee eliminated them entirely and
invented the # and ## operators out of whole cloth.
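
For readers who have not met them, here is a minimal sketch of the two
ANSI operators. The macro and function names (STRINGIZE, PASTE,
pasted_value, stringized) are made up for illustration. The pre-ANSI
tricks are not shown, since they depended on the behavior of
individual preprocessors.

```c
/* ANSI stringizing: #arg becomes a string literal spelling the argument. */
#define STRINGIZE(arg) #arg

/* ANSI token pasting: a##b joins two tokens into a single token. */
#define PASTE(a, b) a##b

/* Hypothetical variable; its name is rebuilt below as PASTE(count, er). */
static int counter = 42;

/* Reads `counter` through the pasted identifier. */
int pasted_value(void)
{
    return PASTE(count, er);
}

/* Returns the argument's spelling, i.e. the string "counter". */
const char *stringized(void)
{
    return STRINGIZE(counter);
}
```

Both operators work on tokens inside a macro definition, so their
effect is fixed by the standard rather than by what a particular
preprocessor happens to do with comments or string literals.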


Another cute one is the following:


typedef int x;
struct x {
    x x;
    int counter;
};


Is this legal? According to Harbison and Steele, "structure and union
field names are in a different overloading class than objects and
typedef names". As I read this, it means that the fourth occurrence
of "x" above is legitimate. Am I right? Who knows?
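
For what it is worth, the name space rules that ANSI C settled on
(separate name spaces for tags, for the members of each structure, and
for ordinary identifiers such as typedef names) seem to make the
fragment legal. A small sketch that exercises all three meanings at
once, with sum_fields as a made-up helper:

```c
typedef int x;          /* x as a typedef name (ordinary identifier) */

struct x {              /* x as a structure tag */
    x x;                /* typedef name x, then member name x */
    int counter;
};

/* Hypothetical helper that uses the tag, the typedef and the member. */
int sum_fields(void)
{
    struct x s;         /* tag */
    x total;            /* typedef name */

    s.x = 3;            /* member */
    s.counter = 4;
    total = s.x + s.counter;
    return total;
}
```

That an ANSI compiler accepts this says nothing about what a pre-ANSI
compiler would do with it, which is the point of the complaint.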


Again, the requirement that declarations govern not the entire
containing block, but only the portion of the block following the
declaration, leads to tricky points. In Ada this convention led to
many paragraphs defining exactly where the declared object became
accessible. In C the question was simply ignored.
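
A short sketch of the kind of case involved, assuming the rule that an
ANSI compiler applies (a name becomes visible only after its own
declaration). The names n, before and scope_demo are illustrative.

```c
static int n = 5;       /* file-scope n */

int scope_demo(void)
{
    int before = n;     /* inner n not yet declared: reads the outer n */
    int n = 7;          /* from here to the end of the block, n is this one */

    return before * 10 + n;
}
```

Within one block the same identifier refers to two different objects,
depending only on which side of the inner declaration it appears.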


Etc. etc. Compounding this is that the de-facto standard, Kernighan
and Ritchie, is not written as a genuine language reference manual,
but as a tutorial. In practice, C has been defined by "what the compiler
does", which leads to numerous ambiguities and inconsistencies.


Compare the definition of C to that of Algol 68 -- In Algol 68 the
compilation task is relentlessly well-defined, although it's not
always clear that it's *possible*. But a compiler for a language of
C's complexity defined with the formality of Algol 68 would be a
class project, not an engineering feat.


          Finally, if after hundreds of attempts we can't build a little
          10,000 line utility for ourselves why in the world do we think we
          can build all the programs we work on every day? We are
          certainly kidding the folks that pay us and we're also doing a
          pretty good job of kidding ourselves.


Actually, the poor customer is doing OK -- He has to get the work out
the door, and a compiler that is 99.5% correct is far more useful to
him than none at all.


But this leads to a concept: Not only should we design a language to
be easily comprehensible to the user, but also easily comprehensible
to the compiler. (These goals should be synergistic, since things
that are hard to parse are likely to be hard to read as well.) Some
guidelines are:


Similar-looking tokens should not have different grammatical uses
depending on how they are declared. Examples are C identifiers
(typedef names and objects) and Algol 68 bold-words (mode names and
operators).
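
A sketch of the C case: the token sequence "(T) - 1" parses as a cast
when T is a typedef name and as a subtraction when T is an object, so
the parser has to consult the declarations in scope. The names T,
parsed_as_cast and parsed_as_subtraction are made up for illustration.

```c
typedef int T;

int parsed_as_cast(void)
{
    return (T) - 1;     /* T is a type here: the value -1 cast to int */
}

int parsed_as_subtraction(void)
{
    int T = 10;         /* this block redeclares T as an object */

    return (T) - 1;     /* same tokens, now the subtraction 10 - 1 */
}
```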


A declaration should be effective throughout the entire block in which
it appears, rather than starting at the point of declaration. This
makes it impossible to write a one-pass compiler, but it simplifies
the definition of the language semantics, and makes it *far* easier to
formalize the semantics.


Avoid features that can only be defined at the lower levels of
abstraction (e.g., tokenization, parsing). For instance, the C
preprocessor is *impossible* to define except as a pre-pass before
parsing. This makes it hard to build, e.g., an incremental compiler
for C. (Unfortunately, the preprocessor is really great for making
code portable. There is something that should be studied here...)
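
A small illustration of why the preprocessor only makes sense as a
pass over tokens: the macro below is substituted before any expression
structure exists, so it does not behave like the function that it
resembles. DOUBLE, double_fn and the two result functions are
hypothetical names.

```c
#define DOUBLE(v) v + v         /* deliberately left unparenthesized */

static int double_fn(int v)
{
    return v + v;
}

int macro_result(void)
{
    return 3 * DOUBLE(2);       /* token substitution: 3 * 2 + 2 */
}

int function_result(void)
{
    return 3 * double_fn(2);    /* an ordinary call: 3 * (2 + 2) */
}
```

An incremental compiler that works on parse trees has no tree node
that corresponds to the use of DOUBLE, only the tokens that the use
left behind.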


Dale
--
Not, of course, the opinions of my employer.
Dale Worley, Compass, Inc. mit-eddie!think!compass!worley
--

