Re: Bootstrapping theory? (Cliff Click)
5 Sep 1996 23:54:14 -0400

          From comp.compilers


From: (Cliff Click)
Newsgroups: comp.compilers
Date: 5 Sep 1996 23:54:14 -0400
Organization: none
References: 96-09-014
Keywords: design

> Does anyone have pointers to literature describing the process and
> idea of writing a bootstrapped compiler? I've read the chapter in the
> Dragon.

Ohh - this is such a fun topic!

Assume you have an original compiler C0 (purchased elsewhere, hand
written in assembly, etc.), AND the source to the new compiler, src_C1.
src_C1 must be acceptable input both to C0 and to the compiler that
src_C1 itself builds. To remove yet another random factor, assume
src_C1 builds a compiler that runs on the same system as C0.

    C0 + src_C1 => C1, the new compiler built with C0 technology.

    C1 + src_C1 => C2, a "stage 2" compiler implementing the same
language as C1 but built using C1 technology (embodied in src_C1).

    C2 + src_C1 => C3, a "stage 3" or self-host compiler.
Like C2 it's built using the technology in src_C1 and it's a
compilation of src_C1, so C3 SHOULD BE IDENTICAL TO C2.
Bit-for-bit equal (except perhaps time stamps).

Failure to be equal means you have either a source of non-determinism
(e.g. an uninitialized variable that still generates correct code) or a
latent bug. Root these out; either one makes the compiler devilishly
difficult to debug.
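
The stage 2 versus stage 3 check boils down to a byte-for-byte file
comparison. Here is a minimal sketch in C, assuming both compilers are
ordinary files on disk. The helper name files_identical is made up for
illustration and is not part of any real build system. If the binaries
embed time stamps, those would have to be masked or stripped first.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the two files hold exactly the
   same bytes, 0 if they differ, -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int ca, cb, result;

    if (a == NULL || b == NULL) {
        if (a != NULL) fclose(a);
        if (b != NULL) fclose(b);
        return -1;
    }

    /* Walk both files in lock step until a byte differs or one ends. */
    do {
        ca = getc(a);
        cb = getc(b);
    } while (ca == cb && ca != EOF);

    /* Equal here means both streams hit EOF at the same position. */
    result = (ca == cb);

    fclose(a);
    fclose(b);
    return result;
}
```

A bootstrap script could call this on the C2 and C3 binaries and refuse
to install the new compiler unless it returns 1.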

Now toss the unneeded C0 compiler. After tossing C0 you can't rebuild
C1; like all software its bits will "rot" with time, so toss it as
well. Keep C2. C3 is a clone of C2 built only to check correctness, so
toss it too.

Rename C2 as C0; modify src_C1 into src_C2 and you are ready for the
next rev of your compiler. Repeat until your company goes out of
business or you win the Spec wars :-).

Fun things to do:

Originally "ints are size 4" is hard-coded in src_C1; build C2 from that.
Next, replace the magic '4' in src_C1 with 'sizeof(int)', and call it
src_C2. Build C2' from *that*. Notice that src_C2 never mentions the
size 4, but C2' "knows" ints are size 4 because compiler C2 told it
that sizeof(int) is 4. Toss C2, use the self-hosted C2' as your next
C0.

Whee!!! Never does your source mention that sizeof(int) is 4, but
it's *in the binary* of C2'. Compile src_C2 with a different compiler
X, and the resulting compiler C2x knows that ints are the size that
compiler X thinks they are, instead of 4 or the size in C2'!
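
To make the sizeof(int) trick concrete, here is a small sketch in C.
The two function names are hypothetical stand-ins for the one spot in
the compiler source where the size of an int gets decided; a real
compiler would have something far more elaborate.

```c
#include <stddef.h>

/* As in src_C1: the size is spelled out in the source. */
size_t int_size_magic(void)
{
    return 4;
}

/* As in src_C2: the source no longer names a number.  The value is
   whatever the compiler that builds this file believes sizeof(int)
   to be, and that value gets baked into the resulting binary. */
size_t int_size_inherited(void)
{
    return sizeof(int);
}
```

Built with a compiler whose ints are 4 bytes, both functions return 4.
Built with a compiler X that has a different int size, only the second
one changes, and nothing in the source shows why.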

We had a go-around here a few months ago with line terminators. Our
compiler builds on Unix, Mac & DOS. All 3 think lines end
differently. Somebody decided to replace 0x0A and 0x0D (decimal 10 and
13) with '\n' and '\r' and self-host. Somebody else flipped a switch
during a later self-host that swaps the meaning of these characters in
generated code (useful when porting code between platforms), and the
resultant compiler had its own sense of '\n' and '\r' flipped. Then it
got linked with a non-flipped library and zamee! A binary that's
really weird!
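
The post doesn't show the code involved, but the effect is easy to
picture with the usual way a C compiler's lexer decodes escape
sequences. The sketch below is an assumption about the general shape of
such code, not the actual source; decode_escape is a made-up name.

```c
/* Hypothetical lexer helper: given the character that follows a
   backslash in a literal, return the character it stands for. */
int decode_escape(int c)
{
    switch (c) {
    case 'n':
        return '\n';   /* numeric value comes from the building compiler */
    case 'r':
        return '\r';   /* likewise: no 10 or 13 appears in this source */
    default:
        return c;      /* any other character stands for itself here */
    }
}
```

Once the numbers are gone from the source, the only record of what '\n'
and '\r' mean is the binary of the compiler that built this one. If
that compiler had them swapped, the swap is inherited silently.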

Gotta go,

Cliff Click, Ph.D. Compiler Researcher & Designer
RISC Software, Motorola PowerPC Compilers (512) 891-7240
