Related articles |
---|
C-interpreter, newlines as separators? ngorelic@speclab.cr.usgs.gov (Noel S. Gorelick) (1995-04-28) |
Re: C-interpreter, newlines as separators? norvell@cs.toronto.edu (1995-05-09) |
Newsgroups: | comp.compilers |
From: | norvell@cs.toronto.edu (Theo Norvell) |
Keywords: | parse |
Organization: | CS Lab, University of Toronto |
References: | 95-04-172 |
Date: | Tue, 9 May 1995 05:10:58 GMT |
>I am writing what is turning out to be a C-like interpreter. As an
>interpreter, the trailing semicolons are a nuisance, and seem kind of
>silly most of the time. I would like to be able to optionally use newlines
>as statement separators.
Good idea.
>What I think I need to do is to conditionally ignore newlines.
This may not be necessary. Often a better idea is to follow the
C/Algol/etc idea of treating blanks and newlines the same. You can
use an unambiguous grammar that requires no statement terminators or
separators. (As a few people have recently pointed out.)
For Pascal-like languages, the grammar can even be LL(1).
Consider this language as an example:
Program --> Block eof
Block --> Statement OptSemi Block | Empty
OptSemi --> ; | Empty
Statement -->
id := Exp
| id(ArgList)
| if Exp then Block else Block fi
| while Exp do Block od
| var id : Type
| proc id(ParamDecls) Block end
| fun id(ParamDecls) Block return Exp
etc
Exp --> monop Exp | Exp binop Exp | (Exp) | id(ArgList) | id
Empty -->
This grammar can be made unambiguous, by giving precedence and
associativity to the various unary and binary operators. In
fact it is very similar to the grammar of the Turing language,
which is LL(1).
Note that in assignment statements, I only allow very simple
left-hand-sides, this could be extended to allow subscripting, but
as soon as you allow a statement to begin with a parenthesis,
you will lose all hope of an LL grammar. Consider the Block
a := foo
(*p) := bar
It begins too much like
a := foo(*p)
But perhaps LALR is salvagible.
When you allow whole expressions to be statement by themselves,
even this hope fades, as any useful grammar will be ambiguous. Consider
x := f(x)
which could be parsed as x := f followed by (x). Also what about
x - y
which could be parsed as x followed by -y. Obviously allowing
C's "empty statement" is also disastrous as the empty string can be parsed
as any number of empty statements. But it is a useless statement,
as you can always use {}, which need not cause a problem.
By changing the function call syntax (I use square bracket and let the type
checker figure out the difference between subscripts and arguments)
and segregating unary from binary operators you can get to
LALR(1) as evidenced by the yacc grammar for a C-like language below.
In an interactive language there is still the problem that the
end of a statement may not be recognized as such until the first
token of the next statement is read. I suggest using some special
token (I'll call it "done") that the user uses to request that the value
of the preceding statment be printed. The nonterminal Program should be
modified to read
Program : Block done {printVal();} Program | eof ;
Here is the yacc grammar:
%start Program
%token id if then else while do typename return eof
%right colonEqual
%left '+' '-'
%left '*' '/'
%left '~' /* Unary minus */
%%
Program : Block eof
;
Block : Stmt OptSemi Block
| Empty
;
OptSemi : ';'
| Empty
;
Empty :
;
Stmt : '{' Block '}'
| Exp
| if Exp OptThen Stmt else Stmt
| while Exp OptDo Stmt
| typename id /* var declaration */
| typename id '[' PList ']' Block return Exp /* func decl */
;
OptThen : then | Empty
;
OptDo : do | Empty
;
PList : typename id OptSemi PList
| Empty
;
Exp : id OptArgList
| Exp colonEqual Exp
| Exp '+' Exp
| Exp '-' Exp
| Exp '*' Exp
| Exp '/' Exp
| '~' Exp
| '(' Exp ')'
;
OptArgList : ArgList | Empty
;
ArgList : '[' ExpList ']'
;
ExpList : Exp OptSemi ExpList
| Empty
;
Theo Norvell
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.