Related articles |
---|
LL parsing questions esink@turia.dit.upm.es (1990-10-16) |
Re: LL parsing questions firth@sei.cmu.edu (1990-10-17) |
Newsgroups: | comp.compilers |
From: | esink@turia.dit.upm.es (Eric Wayne Sink) |
Keywords: | parse, question, LL(1) |
Organization: | Compilers Central |
Date: | Tue, 16 Oct 90 10:21:01 +0100 |
Greetings compiler gurus:
I have a few questions which are probably a bit basic for this group,
but I'd like ideas anyway.
I believe that few people write parsers without YACC (or am I wrong ?).
Suppose you WERE writing a recursive descent parser without YACC, for
the purpose of understanding how they work. Take the example language
to be C; with the following simple grammar:
statement
: typename identifier ';'
| typename identifier '(' ')' ';'
| typename identifier '[' constant ']' ';'
;
Obviously this grammar is not useful or complete, I have made it only
complex enough to illustrate my problem. Now, try parsing the following
statements:
int x;
int x();
int x[5];
The recursive descent routines for 'statement' might look something like
this, as I am envisioning it:
int do_statement() {
int match = 0;
match = try_variable();
if (! match) match = try_function;
if (! match) match = try_array();
if (! match) error();
}
int try_variable() {
token typename, varname; semi;
typename = get_token();
varname = get_token();
semi = get_token();
if (typename == TYPE && varname == IDENTIFIER && semi ==
SEMICOLON) {
/* process it, put it into the symbol table, whatever */
return 1;
}
else {
return 0; /* fail */
}
}
The other routines are similar : the idea is that they read tokens and see
if they fit the correct pattern, returning a fail code if they do not.
try_variable is something I am calling an AND routine; for lack of correct
terminology. do_statement is something I call an OR routine, because any of
the alternatives will satisfy it.
The question is: suppose we are parsing
int wolf();
The parser above will try to parse it as a variable declaration, reading the
first 3 tokens; and finding that the third one failed. Therefore, it will
return failing, and do_statement will try the next alternative. However,
three tokens will have been read, and try_function needs ALL three of them
to determine if the pattern matches. They have been lost, unless some
method is found to retain them. How is this problem usually solved ? I
wrote a SIMPLE recursive descent parser in a class, but only needed a single
character of looking back - here we need 3 tokens, and this is a simple
example. The situation could be much worse in some grammars, couldn't it ?
Finally, am I going about this all the wrong way ?
Any comments on the ideas I have given here would be appreciated...
Thanks to all !
Eric W. Sink
Universidad Politecnica de Madrid
Departamento de Telematica
esink@turia.dit.upm.es
[Either you need to permit arbitrary pushback, or else merge your states,
since you can't tell what kind of declaration it is until after you've read
the identifier. Most grammars are not LL(1), although many can be rewritten
to be by adding lots of sub-productions. -John]
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.