Re: A Plain English Compiler

Martin Ward <>
Tue, 28 Oct 2014 15:14:15 +0000

          From comp.compilers

From: Martin Ward <>
Newsgroups: comp.compilers
Date: Tue, 28 Oct 2014 15:14:15 +0000
Organization: Compilers Central
References: 06-02-122 06-02-125 14-10-005 14-10-008 14-10-009
Keywords: syntax, practice
Posted-Date: 28 Oct 2014 13:29:21 EDT

On 27/10/14 19:25, Kartik Agaram wrote:
> Comparing all attempts at english-like pidgins to cobol seems like a cheap
> and overly broad shot. Hopefully now we can have a more substantive
> discussion.

Gerry Rzeppa pointed me to this thread:

which raises these questions:

1. Is it easier to program when you don't have to translate your
natural-language thoughts into an alternate syntax?

2. Can natural languages be parsed in a relatively sloppy manner (as
humans apparently parse them) and still provide a stable enough
environment for productive programming?

3. Can low-level programs (like compilers) be conveniently and
efficiently written in high level languages (like English)?

Gerry claims that all these questions can be answered in the
affirmative. I would disagree. Even if some of the above *can* be done,
the effort is much greater than with a traditional "Algol-like"
programming language with a limited set of keywords and unambiguous,
precisely defined syntax and semantics.

Let's take one of the most basic concepts in imperative programming:
the assignment. In "algol like" languages you have to learn
a new symbol ":=" and learn that the thing on the right of the symbol
is assigned to the thing on the left, for example:

foo := bar

copies the value of variable "bar" into variable "foo".

Gerry claims that "you don't have to translate your natural-language
thoughts into an alternate syntax". What are *your* "natural-language
thoughts" for the assignment operation? Mine include:

MOVE bar TO foo (perhaps I have been reading too much COBOL!)
LET foo = bar (my BASIC background is showing...)
copy bar to foo
set foo to be equal to bar
make foo equal bar
assign bar to foo
assign foo from bar
get bar and put it into foo
put foo into bar (although this suggests that bar is a set)

As far as I can tell from the manual, none of these will work.
Instead you need to write:

put the foo into the bar

(The little words "the", "a", "an" which can often be omitted
in natural English appear to be absolutely required in this language:
"the" indicates a global variable, while "a" or "an" indicates
a local variable).

This reminds me of the old-fashioned text-based adventure games
where you knew what you wanted to do, but getting the parser
to accept your command turned into a game of "guess the verb".

Another example: you can "divide the foo by the bar"
but cannot "divide the bar into the foo".

So: you *do* need to translate your natural language thoughts
into an alternative syntax: the minimalist subset of English
accepted by the parser.
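
To make the "guess the verb" effect concrete, here is a minimal Python
sketch. It is NOT the actual "Plain English" parser: the single pattern
and the names ASSIGNMENT and parse_assignment are assumptions made up for
illustration, loosely modelled on the one form quoted above.

```python
import re

# Hypothetical: one accepted pattern, with mandatory articles, modelled on
# "put the foo into the bar". The real parser is certainly more elaborate.
ASSIGNMENT = re.compile(
    r"^put (?:the|a|an) (\w+) into (?:the|a|an) (\w+)$", re.IGNORECASE
)

def parse_assignment(sentence):
    """Return the two operand names if the sentence matches, else None."""
    match = ASSIGNMENT.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    return match.group(1), match.group(2)

# The candidate phrasings listed earlier, plus the accepted form.
candidates = [
    "MOVE bar TO foo",
    "LET foo = bar",
    "copy bar to foo",
    "set foo to be equal to bar",
    "make foo equal bar",
    "assign bar to foo",
    "assign foo from bar",
    "get bar and put it into foo",
    "put foo into bar",
    "put the foo into the bar",
]

accepted = [s for s in candidates if parse_assignment(s) is not None]

for s in candidates:
    verdict = "accepted" if s in accepted else "rejected"
    print(verdict + ": " + s)
```

Under these assumptions only the last sentence is accepted. Every other
phrasing is perfectly good English and is still rejected.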

Of course, since you have the source code, you can extend
the parser by adding different ways of saying the same thing.

Which leads us to the next problem:

Gerry writes that there are (at least) three ways of calling
the same routine:

Draw the text with the Osmosian font.
Draw the text using the Osmosian font.
Given the Osmosian font, draw the text.

A later commentator asks what happens if the user says:

Write the words with Osmosian.
Type the text with Osmosian.
Print the page in Osmosian.

If you are the only programmer, writing programs for yourself,
you would probably pick one of the options and stick to it.
(The provided source code appears to use a fairly consistent style).
However, if you are working on a team and have to read other
people's code (as most programmers are most of the time),
then you will need to be able to recognise *all* the different
ways of writing a standard statement or call: in this case,
adding more variations just adds to the amount of "legalese"
you have to remember.
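
The many-to-one mapping can be sketched in Python as follows. This is only
an illustration under stated assumptions: the pattern list PHRASINGS and
the function resolve_call are hypothetical, and say nothing about how the
real compiler matches calls to routines.

```python
import re

# Hypothetical: three surface forms that all resolve to the same routine.
# Named groups are used because the third form reverses the operand order.
PHRASINGS = [
    re.compile(r"^draw the (?P<what>\w+) with the (?P<font>\w+) font$", re.I),
    re.compile(r"^draw the (?P<what>\w+) using the (?P<font>\w+) font$", re.I),
    re.compile(r"^given the (?P<font>\w+) font, draw the (?P<what>\w+)$", re.I),
]

def resolve_call(sentence):
    """Map a sentence to ("draw", what, font), or None if nothing matches."""
    text = sentence.strip().rstrip(".")
    for pattern in PHRASINGS:
        match = pattern.match(text)
        if match is not None:
            return ("draw", match.group("what"), match.group("font"))
    return None

known = [
    "Draw the text with the Osmosian font.",
    "Draw the text using the Osmosian font.",
    "Given the Osmosian font, draw the text.",
]
unknown = [
    "Write the words with Osmosian.",
    "Type the text with Osmosian.",
    "Print the page in Osmosian.",
]

for s in known + unknown:
    print(s, "->", resolve_call(s))
```

Each pattern added to the list is one more form that every reader of the
code has to recognise, while the sentences in the second group still fail.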

You will also have to remember not just the syntax but the semantics
of each statement: this semantics may be subtly different from
the semantics of natural language. For example, in English:
"A plus B times C" means, as any schoolchild will tell you,
multiply B and C together and add the result to A.
But in "Plain English" it is different:

put an A plus a B times a C into the total

means add A to B and then multiply the result by C.

In fact, the manual is ambiguous on this point:
page 74 says that 'Say I find the word PLUS between a snoz and a froz.
I look for a routine that tells me how "to add a froz to a snoz",
and then I use that routine to reduce the expression'.
A little further on the manual adds: 'To process "a snoz TIMES a froz",
I use "to multiply a snoz by a froz"'. In the expression
"the A plus the B times the C" both the above conditions apply,
and the manual does not say whether to do "add an A to a B" first,
or to do "multiply a B by a C" first.

The problem is that English is ambiguous and the ambiguity
in the "Plain English" source code is not resolved by
the ambiguous English sentences in the manual!
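
The two readings can be written out explicitly. The Python sketch below is
an illustration only: the token-list format and the function names are
assumptions, and neither function is taken from the "Plain English"
compiler.

```python
import operator

OPS = {"plus": operator.add, "times": operator.mul}

def eval_left_to_right(tokens):
    """Reduce strictly left to right: (A plus B) times C."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result

def eval_with_precedence(tokens):
    """Reduce "times" before "plus": A plus (B times C)."""
    terms = [tokens[0]]
    for i in range(1, len(tokens), 2):
        op, value = tokens[i], tokens[i + 1]
        if op == "times":
            terms[-1] = terms[-1] * value  # bind to the preceding term
        else:
            terms.append(value)            # start a new additive term
    return sum(terms)

expression = [2, "plus", 3, "times", 4]
print(eval_left_to_right(expression))    # 20
print(eval_with_precedence(expression))  # 14
```

The same sentence gives 20 under one reading and 14 under the other, so
the order of reduction has to be specified somewhere.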

The whole "Plain English" language is actually quite limited:
containing integer, string and record data types (no arrays,
floating point, hash tables, regular expressions, lists etc.)
together with simple IF statements, assignments, loops, and simple
functions and subroutines with parameters. This makes it slightly
more powerful than, say, the original Dartmouth BASIC or PL/0,
and rather *less* powerful than BBC BASIC or PASCAL.
Such a small language could probably be learned in its entirety
in a few days: but with "Plain English" we also have to memorise
all the different ways to write the same statement,
and all the subtle semantic differences between it and English.

A real test of the utility of "Plain English" would be to compare
the effort of implementing the "Plain English" compiler in
"Plain English" with the effort of implementing a compiler for,
say, a small extension to PL/0 or similar in itself: since PL/0
is roughly equivalent in computational power to the "Plain English"
language. For example, the chapter "Oberon0: A Case Study" in the book
"Object-Oriented Programming" by Prof. Dr. Hanspeter Mössenböck claims
that Oberon0 was implemented under Oberon in 1,300 lines of code.
An implementation of Oberon0 in Simpl ("Implementing Oberon0 Language
with Simpl DSL Tool" by Margus Freudenthal) required 1,987 lines.
The "Plain English" compiler and "noodle" require over 15,000 lines
of code (686kB of source code).

In this context the "LDTA 2011 Tool Challenge" looks interesting:


Dr Martin Ward STRL Principal Lecturer & Reader in Software Engineering Erdos number: 4
