Trying to understand the bison parser algorithm.

"vrml3d.com" <comments@vrml3d.com>
28 Sep 2000 22:14:57 -0400

          From comp.compilers

Related articles
Trying to understand the bison parser algorithm. comments@vrml3d.com (vrml3d.com) (2000-09-28)
Re: Trying to understand the bison parser algorithm. cfc@world.std.com (Chris F Clark) (2000-10-01)
| List of all articles for this month |

From: "vrml3d.com" <comments@vrml3d.com>
Newsgroups: comp.compilers
Date: 28 Sep 2000 22:14:57 -0400
Organization: Compilers Central
Keywords: parse, yacc

I'm looking at:


http://www.gnu.org/manual/bison/html_chapter/bison_8.html#SEC68


and trying to make sense of an example involving this grammar:


expr: term '+' expr
                | term
                ;


term: '(' expr ')'
                | term '!'
                | NUMBER
                ;


and lookahead tokens.


In the example, they posit that 1 + 2 have been read and shifted, and that a
lookahead token of ')' should result in reduction, whereas a lookahead token
of '!' should result in a shift.


Now, 1 and 2 are tokens but they are not symbols. But based on what they
are saying about when to shift, and when to reduce, I figure the following
should occur:


1. The token '1' is read as the symbol NUMBER
2. The lookahead token is now '+'. Since there is no NUMBER '+' rule
sequence, the NUMBER on the stack must be reduced to a term.
3. The stack now contains term '+' and the lookahead token is '2', which is
a NUMBER.
4. Because there is no '+' NUMBER sequence, 1 + 2 is a syntax error.


OK, I know 4 is not a correct statement, so I must invent something to make
1 + 2 parse properly. Let's say that in cases like 4, we are allowed to
shift the symbol, but must immediately reduce to make sure the stack still
contains a valid sequence. Then...


5. The stack now contains term '+' expr


OK, that makes sense, but now it's at odds with their explanation, which
seems to imply that the stack should contain NUMBER '+' NUMBER before we
attempt to shift ')' or '!'. OTOH, because neither NUMBER '+' nor '+'
NUMBER are valid sequences, this cannot be.


In other words, I can't make any sense out of their explanation. It seems
to contradict itself. Can somebody point me to a better explanation,
prefereably with a flow chart? Simpler is better. I don't want to get
bogged down with operator precedence and other bells and whistles. Also,
I've tried looking at the source of a Bison generated parser for a simple
expression evaluator, and it was difficult to analyze, what with all the
gotos and stuff. I suppose that if I don't get an answer I could plow
through it, but I figure it must be easier than that.


--Steve







Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.