Writing A Plain English Compiler

"Gerry Rzeppa" <gerry.rzeppa@pobox.com>
Tue, 4 Nov 2014 15:50:41 -0600

          From comp.compilers

Related articles
Writing A Plain English Compiler gerry.rzeppa@pobox.com (Gerry Rzeppa) (2014-11-04)
Re: Writing A Plain English Compiler Pidgeot18@verizon.net (=?UTF-8?Q?Joshua_Cranmer_=f0=9f=90=a7?=) (2014-11-05)
Re: Writing A Plain English Compiler gerry.rzeppa@pobox.com (Gerry Rzeppa) (2014-11-06)
Re: Writing A Plain English Compiler bcas@freeuk.com (BartC) (2014-11-07)
Re: Writing A Plain English Compiler bcas@freeuk.com (BartC) (2014-11-07)
Re: Writing A Plain English Compiler gneuner2@comcast.net (George Neuner) (2014-11-07)
Re: Writing A Plain English Compiler bc@freeuk.com (BartC) (2014-11-08)
[19 later articles]
| List of all articles for this month |

From: "Gerry Rzeppa" <gerry.rzeppa@pobox.com>
Newsgroups: comp.compilers
Date: Tue, 4 Nov 2014 15:50:41 -0600
Organization: Compilers Central
Keywords: parse, practice
Posted-Date: 04 Nov 2014 23:27:27 EST

[And with that, this thread is over unless someone has something
to say that's related to compilers. Thanks, all. -John]

Good idea. Let's talk nitty-gritty compiler stuff and see if anyone is

Exactly how does one go about writing a Plain English compiler? I imagine
there are lots of ways, but here's how my elder son and I did it.

Having worked with a wide variety of traditional languages, and having
written a number of compilers together, we knew that a practical procedural
programming language could be concocted with just five types of statements:

1. Type definitions;
2. Global Variable definitions;
3. Routine Headers;
4. Conditional Commands; and
5. Unconditional Commands.

We then noticed that when talking with each other about Types, in English,
our sentences typically started with indefinite articles like "A" and "An"
and "Some", as in "A color has a hue..." or "An ellipse is a..." or "Some
pages are...".

We also noticed that when talking about Global Variables, we always used the
definite article, "The", as in "The screen's width..." or "The grand

We noticed, as well, that when describing a real-world Routine or procedure
to someone, we usually started our explanation with the preposition "To", as
in "To get to the store from here..." or "To bake a cake...".

Fourthly, we noticed that Conditional Commands, in our everyday speech, most
often began with the word "If", as in "If I'm not home in an hour..." or "If
the oven isn't hot enough..."

Lastly, we noticed that Unconditional Commands could start with almost any
verb, as in "Bake the cake" or "Fill the car up with gas". Since we wanted
our Plain English language to be user-extensible -- and since we certainly
didn't want to have to make a list of all the verbs any programmer might
want to use -- we decided to describe this category simply as "everything

The result of these ruminations was the following tentative definitions for
our language:

1. Type definitions, which always start with A, AN, or SOME;
2. Global Variable definitions which always start with THE;
3. Routine headers, which always start with TO;
4. Conditional Commands, which begin with "IF"; and
5. Unconditional Commands, which start with anything else.

At this point we needed just one more thing to get going on a simple
prototype: rules for the formation of names. We noticed that the names we
gave things in ordinary speech were typically nouns, often preceded with one
or more descriptive adjectives, such as "table" or "orange table" or "big
orange tables", often containing spaces, and almost always preceded by an
article in everyday speech, as in "the table" or "an orange table" or "some
big orange tables". So, taking an approach similar to our handling of
Routines above, we decided to try this definition:

6. A name is anything after an article, up to (but not including) any other
article or preposition.

We then set about formalizing these ideas in a somewhat sloppy EBNF, where,
for our own convenience, we used "A" to represent any indefinite article,
like so:

type = A name IS A type-name .

Where a "type-name" is the name of any built-in or user-defined Type. (To
allow user-defined Types to appear in any order in the source code of a
Plain English program, and to support recursive data structures without the
need for additional "keywords", we decided to have our compiler make more
than one pass of the source: collecting names the first time around,
resolving them later.) Continuing:

global = THE name IS A type-name .

No surprises there. Moving on:

routine = TO routine-name : commands

Where routine-name turns out to be not quite so simple: a routine-name, at
this stage, turned out to be a combination of those "anything else" words,
with optional parameters sprinkled in between. The parameters, it turns out,
were easy to define: they're essentially local variables which are defined
like global variables except that indefinite articles are used. So we have:

parameter = A name

Where "A" represents any indefinite article, as above.

Calling the groups of miscellaneous words preceding, in between, and
following parameters, "monikettes", we could then define a routine-name like

routine-name = { monikette | parameter }

Which left us, at this point, with just the two kinds of Commands to define
(which I'll take out of the above order):

command = unconditional | conditional .

Unconditional Commands take the form:

unconditional = { monikette | new-local | variable-reference}

Where a new local variable is defined like so:

new-local = A name

And a variable-reference to a previously defined global, local, or
parameter, looks like this:

variable-reference = THE name

And a Conditional Command takes the form:

conditional = IF decider-call , unconditional { ; unconditional } .

Where the decider-call is a reference to a special kind of Routine defined
as follows:

decider: TO DECIDE IF routine-name : commands

And that, I think, is enough for now. Questions? Comments? Anyone? Anyone?

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.