Re: Q: writing YACC rules

collins@cs.wm.edu (Bob Collins)
31 Aug 1998 03:19:43 -0400

          From comp.compilers

Related articles
Q: writing YACC rules Bert.Aerts@esat.kuleuven.ac.be (Bert Aerts) (1998-08-24)
Re: Q: writing YACC rules clark@quarry.zk3.dec.com (Chris Clark USG) (1998-08-30)
Re: Q: writing YACC rules collins@cs.wm.edu (1998-08-31)
Re: Q: writing YACC rules thetick@magelang.com (Scott Stanchfield) (1998-08-31)
Re: Q: writing YACC rules chrisd@reservoir.com (Chris Dodd) (1998-09-01)
| List of all articles for this month |
From: collins@cs.wm.edu (Bob Collins)
Newsgroups: comp.compilers
Date: 31 Aug 1998 03:19:43 -0400
Organization: Computer Science @ William & Mary
References: 98-08-178 98-08-196
Keywords: yacc, debug, comment

Chris Clark USG <clark@quarry.zk3.dec.com> wrote:


> Bert Aerts wrote:
> > I'm writing a parser for quite a while now, but everytime I have to
> > extend the syntax, I bounce into errors that the parser is reducing my
> > token sequences always according to the wrong rule - albeit a rule
> > that has a strong resemblence with the correct one.
>
> To which our esteemed moderator suggest the possibility of shift-
> reduce or reduce-reduce conflicts.
>
> For a tool like yacc, I would just like to add that, this is one case
> to be wary of precedence declarations (left, right, and nonassoc) and
> the use of %prec in rules. These declarations hide shift-reduce
> conflicts so that you don't even see them, which can cause exactly the
> behavior you describe.


I agree completely.


Sometimes it is helpful to have explicit control of parsing while
avoiding the chain of unit reductions one gets when putting precedence
into the grammar productions. So I wrote my own LALR(1) parser
generator to solve this problem. Below are pieces of an expression
grammar where parser-time disambiguation directives (%shift, %reduce,
and %error) allow users to explicitly control parsing based on the
next token. Here are the productions and directives for AND and AND
THEN:


  1 <expr> ::= <expr> <and-op> <expr>
  2 %reduce and
  3 %shift = /= < <= > >= not in + - & * / mod rem **
  4 %error or xor %end
  5 <and-op> ::= and | and then %end


In Ada, one cannot mix ANDs, ORs, and XORs without parentheses, hence
the %error directive in line 4. AND has lowest precedence, so all the
other operators appear in the %shift directive in line 3. AND is left
associative so we use the %reduce directive in line 2. (The %end
directive signifies the end of a production and its parser-generator
time directives.)


Here are the productions and directives for the next higher precedence
level, the relational operators (6-9) and the membership operators
(10-13):


  6 <expr> ::= <expr> <rel-op> <expr>
  7 %reduce = and or xor /= < <= > >= not in
  8 %shift + - & * / mod rem ** %end
  9 <rel-op> ::= = | /= | < | <= | > | >= %end


10 <expr> ::= <expr> <in-op> <range-or-subtype>
11 %reduce and or xor = /= < <= > >= not in
12 %shift + - & * / mod rem ** %end
13 <in-op> ::= in | not in %end


Again, using the same nonterminal <expr>, we can explicitly control
the precedence level and associativity. The parse tables generated are
quite efficient since expressions are parsed without unit
reductions. Yet LALR(1) errors that occur between expressions and
other parts of the grammar are found and displayed explicitly, since
the directives only apply to the next symbol.


Here is one last example, for the highest precedence operators,
that shows that exponentiation ** is -not- associative:


14 <expr> ::= <expr> ** <expr>
15 %reduce and or xor = /= < <= > >= not in + - & * / mod rem
16 %error ** abs not %end


Having parsed one **, it is an error to see a second ** as shown
on line 16. Of course, those who are not really familiar with
what shifts and reduces mean (like my students, at first) will
have trouble with this explicit control. In addition, it is hard
to know of all the next tokens to account for. Yet one can use
the parser generator like a compiler to
      (a) generate conflicts and lookahead token and then
      (b) decide how to resolve each conflict for each token.


Of course this part of the grammar will terminate with some sort
of non-<EXPR>s:


17 <expr> ::= <numeric-literal> | null | <string-literal>
18 | <aggregate> | <name> | <qualified-expr>
19 | <allocator> | ( <expr> ) %end


--
Bob Collins <mailto:collins@cs.wm.edu> <http://ratbert.cs.wm.edu>
[You can do this in yacc by assigning precedence to tokens you use
only in precedence declarations, and using those in %prec lines, for
example:


%left addprec mulprec
...
expr: expr PLUS expr %addprec |
expr TIMES expr %mulprec ;


-John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.