Re: Grammar question

Zerksis D. Umrigar <umrigar@cs.binghamton.edu>
Wed, 28 Sep 1994 00:27:26 GMT

          From comp.compilers

Related articles
Grammar question feingold@avette.zko.dec.com (1994-09-20)
Re: Grammar question euambn@eua.ericsson.se (1994-09-22)
Re: Grammar question clark@zk3.dec.com (Chris Clark USG) (1994-09-26)
Re: Grammar question umrigar@cs.binghamton.edu (Zerksis D. Umrigar) (1994-09-28)
grammar question saleem@synopsys.com (1996-03-14)
| List of all articles for this month |

Newsgroups: comp.compilers
From: Zerksis D. Umrigar <umrigar@cs.binghamton.edu>
Keywords: parse, tools
Organization: Compilers Central
References: 94-09-114
Date: Wed, 28 Sep 1994 00:27:26 GMT

feingold@avette.zko.dec.com (Elan Feingold) needed a YACCable grammar for


Command -> [Expression] [FOO Identifier] [BAR Identifier] [FOOBAR Identifier]
Expression -> Identifier | Expression + Expression ... etc ...
Identifier -> IDENT | NUMBER | FOO | BAR | FOOBAR


where [Phrase] indicates that Phrase is optional. Note that the keywords
FOO, BAR and FOOBAR can also be used as Identifiers.


I felt that this could be done without tweaking the lexer for lookahead,
tho' the grammar would probably be extremely messy. Since I haven't seen
anyone else posting along the lines of what I had in mind, here goes:


The major problem is caused by the fact that Expression is optional. Hence
we cannot distinguish the category of the first FOO in the token sequences


        FOO FOO BAR BAR FOOBAR FOOBAR (no expression, 1st FOO is a keyword)
and
        FOO FOO BAR BAR FOOBAR FOOBAR IDENT (1st FOO is an expression)


until we see a distinguishing token (the end-of-file or the last IDENT).


YACC is no smarter than us and hence cannot make any decisions based on the
category of the first FOO until it has seen the distinguishing token. YACC
decisions correspond to reductions --- hence we should structure the grammar
so that no reductions are made until a distinguishing token is seen. In
general, for non-recursive non-terminals, we can avoid reductions by
substituting their right-hand sides for problematic uses in other
productions.


Unfortunately Expression is recursive. But if we assume that expression
operators cannot be Identifiers, then YACC can decide that a keyword like
FOO is used as a identifier when followed by an operator. Hence we can
split Expression into two cases:


a) It consists of simply a keyword (FOO, BAR or FOOBAR).


b) Anything else (which can be determined to be an expression without
requiring more than 1-token lookahead).


Case (b) can be handled simply by deciding that anything which contains an
IDENT or an operator is definitely an expression and reducing accordingly.
Case (a) corresponds to the case illustrated in the above example where no
reduction should be made until a distinguishing token is seen. This can be
accomodated by substituting the right-hand sides for Identifier in all its
ambiguous occurrences. (If the command tail allowed a list of indefinite
length of keyword-identifier pairs (corresponding to a recursive construct)
this approach would fail.)


The above construction gives rise to the following tedious grammar, which is
acceptable to bison without any conflicts. How easy it may be to insert
semantics into such a grammar is another matter.


------------------------------------------------------------------------
%token BAR FOO FOOBAR IDENT NUMBER OP


%%
Cmd : SureExp Tail /* SureExp cannot simply be a keyword. */
| FOO Tail /* FOO is used as an identifier. */
| BAR Tail /* BAR is used as an identifier. */
| FOOBAR Tail /* FOOBAR is used as an identifier. */
| Tail /* Exp is missing. */
;
Tail : /* empty */
| FOOBAR Identifier
| BAR Identifier
| BAR FOO FOOBAR Identifier
| BAR BAR FOOBAR Identifier
                                | BAR FOOBAR FOOBAR Identifier
                                | BAR IDENT FOOBAR Identifier
| FOO Identifier
| FOO FOO FOOBAR Identifier
| FOO BAR FOOBAR Identifier
| FOO FOOBAR FOOBAR Identifier
| FOO IDENT FOOBAR Identifier
| FOO FOO BAR Identifier
| FOO BAR BAR Identifier
| FOO FOOBAR BAR Identifier
| FOO IDENT BAR Identifier
| FOO FOO BAR FOO FOOBAR Identifier
| FOO FOO BAR BAR FOOBAR Identifier
| FOO FOO BAR FOOBAR FOOBAR Identifier
| FOO FOO BAR IDENT FOOBAR Identifier
| FOO BAR BAR FOO FOOBAR Identifier
| FOO BAR BAR BAR FOOBAR Identifier
| FOO BAR BAR FOOBAR FOOBAR Identifier
| FOO BAR BAR IDENT FOOBAR Identifier
| FOO FOOBAR BAR FOO FOOBAR Identifier
| FOO FOOBAR BAR BAR FOOBAR Identifier
| FOO FOOBAR BAR FOOBAR FOOBAR Identifier
| FOO FOOBAR BAR IDENT FOOBAR Identifier
| FOO IDENT BAR Identifier FOOBAR Identifier
                                ;
SureExp : Exp OP Term
| IDENT
| NUMBER
;
Exp : Exp OP Term
| Term
;
Term : Identifier
| NUMBER
;
Identifier : IDENT
| FOO
| BAR
| FOOBAR
;


------------------------------------------------------------------------


-zerksis umrigar
(umrigar@cs.binghamton.edu)
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.