Re: language design tradeoffs

norvell@csri.toronto.edu (Theo Norvell)
Thu, 17 Sep 1992 16:50:39 GMT

From comp.compilers


Newsgroups:	comp.compilers
From:	norvell@csri.toronto.edu (Theo Norvell)
Organization:	CSRI, University of Toronto
Date:	Thu, 17 Sep 1992 16:50:39 GMT
References:	92-09-048 92-09-095
Keywords:	design, parse

bromage@mullauna.cs.mu.OZ.AU (Andrew Bromage) writes:

>My favourite solution comes from BASIC, where the statement terminator :
>is an _extension_ to the normal business of using EOL to terminate a
>statement. More modern versions of similar ideas can be found in languages
>like Turing.
>...
>How do you allow the omission of the semicolon in languages like this at
>the end of a line?

      I'll only address Turing (and its relatives, like Euclid). In these
languages _all_ semicolons may be omitted, not only those at the end of a
line. (Ok...there is one weird circumstance in Euclid where a semicolon
can't be omitted.)

      The first thing to realize is that in these languages the end-of-line
is not a token. The end-of-line is in every way equivalent to a blank,
i.e. it serves only to separate tokens. Nothing fancy is needed in the
lexer.
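
A minimal sketch of what "nothing fancy in the lexer" can look like. The
token pattern below is a hypothetical one chosen for illustration, not
Turing's actual lexical rules; the only point is that a newline is skipped
exactly like a blank and never becomes a token.

```python
import re

# Hypothetical token classes: ":=", "..", identifiers, integers,
# and any other single non-blank character.
TOKEN = re.compile(r":=|\.\.|[A-Za-z_]\w*|\d+|\S")

def tokenize(text):
    """Return the tokens of text; blanks, tabs and newlines only separate them."""
    return TOKEN.findall(text)

one_line = tokenize("a := 2 b := 3 write a + b")
many_lines = tokenize("a := 2\nb := 3\nwrite a + b\n")
print(one_line == many_lines)
```

Because the two token streams are identical, the parser cannot tell how the
program was laid out on the page.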

      The next thing to realize is that simple statements and declarations
require no `terminator' (as they do in C). Nor is a `separator' needed
between statements (as it is in Pascal). Useful and rich languages with
neither separators nor terminators can be described with context-free
grammars. Your favourite parsing technique can likely be used. Nothing
fancy is needed in the parser.

      The only thing that is needed is careful writing of the grammar. Here
is a simple example language. Note that the grammar is not only
unambiguous, but also LL(1) and LALR(1).

Program   ::= Block end-of-file
Block     ::= Statement Block
            | Nothing
Nothing   ::=
Statement ::= var id : Type
            | type id = Type
            | Ref := Exp
            | read Ref
            | write Exp
            | if Exp Block else Block end
            | while Exp Block end
Type      ::= id | array Exp .. Exp of Type
Ref       ::= id Ref1
Ref1      ::= [ Exp ] Ref1 | Nothing
Exp       ::= Ref Exp1 | ( Exp ) Exp1 | op Exp
Exp1      ::= Nothing | op Exp
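
To make the "nothing fancy in the parser" claim concrete, here is a sketch
of a recursive-descent recognizer for a grammar of this shape. Several
things in it are my assumptions rather than part of the post: the operator
set, the tokenizer, the `<eof>` sentinel, the class and method names, the
choice to handle the `[ Exp ]` suffixes of Ref with a loop, and the choice
to let a prefix op apply to the whole expression to its right. Every
decision is made on a single token of lookahead, and no statement boundary
is ever marked in the input.

```python
import re

KEYWORDS = {"var", "type", "read", "write", "if", "else", "end",
            "while", "array", "of"}
STARTERS = {"var", "type", "read", "write", "if", "while"}
OPS = {"+", "-", "*", "/", "<", ">"}  # assumed; the grammar only says "op"
EOF = "<eof>"                         # sentinel standing in for end-of-file
ID = re.compile(r"[A-Za-z_]\w*")

def tokenize(text):
    # Newlines are skipped like any other whitespace.
    return re.findall(r":=|\.\.|[A-Za-z_]\w*|\S", text) + [EOF]

class Recognizer:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def at_id(self):
        t = self.peek()
        return t not in KEYWORDS and ID.fullmatch(t) is not None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expect_id(self):
        if not self.at_id():
            raise SyntaxError(f"expected identifier, found {self.peek()!r}")
        self.pos += 1

    def program(self):
        """Program ::= Block end-of-file; returns the number of top-level statements."""
        count = self.block()
        self.expect(EOF)
        return count

    def block(self):
        # Block ::= Statement Block | Nothing
        # A block continues for as long as the next token can start a statement.
        count = 0
        while self.at_id() or self.peek() in STARTERS:
            self.statement()
            count += 1
        return count

    def statement(self):
        t = self.peek()
        if t == "var":
            self.pos += 1; self.expect_id(); self.expect(":"); self.type_()
        elif t == "type":
            self.pos += 1; self.expect_id(); self.expect("="); self.type_()
        elif t == "read":
            self.pos += 1; self.ref()
        elif t == "write":
            self.pos += 1; self.exp()
        elif t == "if":
            self.pos += 1; self.exp(); self.block()
            self.expect("else"); self.block(); self.expect("end")
        elif t == "while":
            self.pos += 1; self.exp(); self.block(); self.expect("end")
        else:
            self.ref(); self.expect(":="); self.exp()

    def type_(self):
        if self.peek() == "array":
            self.pos += 1; self.exp(); self.expect("..")
            self.exp(); self.expect("of"); self.type_()
        else:
            self.expect_id()

    def ref(self):
        # An identifier followed by zero or more [ Exp ] subscripts.
        self.expect_id()
        while self.peek() == "[":
            self.pos += 1; self.exp(); self.expect("]")

    def exp(self):
        if self.peek() in OPS:          # prefix operator
            self.pos += 1; self.exp()
            return
        if self.peek() == "(":
            self.pos += 1; self.exp(); self.expect(")")
        else:
            self.ref()
        if self.peek() in OPS:          # Exp1 ::= op Exp
            self.pos += 1; self.exp()

print(Recognizer("x := y\nwrite x + y").program())  # prints 2
```

The reason no separator is needed shows up in `exp` and `block`: after an
operand, the only tokens that can continue the expression are an operator
or `[`, and no statement starts with either, so the first token of the next
statement unambiguously ends the current one.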

      The most serious objection to this sort of syntax is that a typo might
transform a correct program into an incorrect program that is still
syntactically legal, where the corresponding mistake in a language that
insisted on semicolons would be detected. Careful language design can
eliminate such problems.

      A less serious objection is that error recovery is less successful and
error messages less accurate. Experience has shown that this isn't a
major problem.

      Note that in Turing and Euclid semicolons are actually in the grammar
but are optional. So if the above points worry you enough, you can always
use them. The above grammar can be modified to have optional semicolons
as terminators, separators, or null statements. Either way, there is no
ambiguity. In an interactive language, the semicolon can be useful to
indicate the end of a statement before typing the first token of the next.
For example, if you type

      a:= 2 b:= 3
      write a+b ;

then the sum should be printed right after the semicolon key is hit.
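
The interactive case can be sketched as a reader that is fed one token at a
time. This is a toy under stated assumptions, not how any Turing system
works: the `Session` class and its methods are hypothetical, statements are
limited to `id := expr` and `write expr`, and expressions are integers and
names joined by `+` or `-`, evaluated left to right. A pending statement is
run either when a semicolon arrives or when the next token can only be the
start of a new statement.

```python
import re

def tokenize(text):
    return re.findall(r":=|[A-Za-z_]\w*|\d+|\S", text)

def is_operand(tok):
    # A name or an integer literal; the keyword "write" is not an operand.
    return tok != "write" and re.fullmatch(r"[A-Za-z_]\w*|\d+", tok) is not None

class Session:
    def __init__(self):
        self.env = {}       # variable values
        self.pending = []   # tokens of the statement still being typed
        self.output = []    # values written so far

    def feed(self, tok):
        if tok == ";":
            self.run()      # explicit end of statement: run it now
            return
        starts_new = tok == "write" or is_operand(tok)
        if self.pending and is_operand(self.pending[-1]) and starts_new:
            self.run()      # an operand cannot follow an operand
        self.pending.append(tok)

    def value(self, tok):
        return int(tok) if tok.isdigit() else self.env[tok]

    def evaluate(self, toks):
        if len(toks) % 2 == 0:
            raise SyntaxError("incomplete expression")
        result = self.value(toks[0])
        for op, operand in zip(toks[1::2], toks[2::2]):
            if op == "+":
                result += self.value(operand)
            elif op == "-":
                result -= self.value(operand)
            else:
                raise SyntaxError(f"unknown operator {op!r}")
        return result

    def run(self):
        stmt, self.pending = self.pending, []
        if not stmt:
            return          # a lone semicolon is a null statement
        if stmt[0] == "write":
            self.output.append(self.evaluate(stmt[1:]))
        elif len(stmt) >= 3 and stmt[1] == ":=":
            self.env[stmt[0]] = self.evaluate(stmt[2:])
        else:
            raise SyntaxError("malformed statement")

session = Session()
for tok in tokenize("a:= 2 b:= 3\nwrite a+b"):
    session.feed(tok)
print(session.output)   # still empty: "a+b" might be continued
session.feed(";")
print(session.output)   # the semicolon ends the statement, so the sum appears
```

Without the semicolon the reader has to wait, because `write a+b` could
still be followed by another operator.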

Speculation: I think the original language to use this sort of syntax was
LISP. In its first implementation, commas separated list items, but a bug
in the reader allowed them to be optional. Perhaps the presence of LISP
expert London on the Euclid committee led to the observation that
separators are not needed in a language with terminators for compound
statements.

Theo Norvell
--
