Trying to understand the bison parser algorithm.

"vrml3d.com" <comments@vrml3d.com>
28 Sep 2000 22:14:57 -0400

From comp.compilers

Related articles
*Trying to understand the bison parser algorithm. comments@vrml3d.com (vrml3d.com)* (2000-09-28)**
Re: Trying to understand the bison parser algorithm. cfc@world.std.com (Chris F Clark) (2000-10-01)

| List of all articles for this month |

From:	"vrml3d.com" <comments@vrml3d.com>
Newsgroups:	comp.compilers
Date:	28 Sep 2000 22:14:57 -0400
Organization:	Compilers Central
Keywords:	parse, yacc

I'm looking at:

http://www.gnu.org/manual/bison/html_chapter/bison_8.html#SEC68

and trying to make sense of an example involving this grammar:

expr: term '+' expr
                | term
                ;

term: '(' expr ')'
                | term '!'
                | NUMBER
                ;

and lookahead tokens.

In the example, they posit that 1 + 2 have been read and shifted, and that a
lookahead token of ')' should result in reduction, whereas a lookahead token
of '!' should result in a shift.

Now, 1 and 2 are tokens but they are not symbols. But based on what they
are saying about when to shift, and when to reduce, I figure the following
should occur:

1. The token '1' is read as the symbol NUMBER
2. The lookahead token is now '+'. Since there is no NUMBER '+' rule
sequence, the NUMBER on the stack must be reduced to a term.
3. The stack now contains term '+' and the lookahead token is '2', which is
a NUMBER.
4. Because there is no '+' NUMBER sequence, 1 + 2 is a syntax error.

OK, I know 4 is not a correct statement, so I must invent something to make
1 + 2 parse properly. Let's say that in cases like 4, we are allowed to
shift the symbol, but must immediately reduce to make sure the stack still
contains a valid sequence. Then...

5. The stack now contains term '+' expr

OK, that makes sense, but now it's at odds with their explanation, which
seems to imply that the stack should contain NUMBER '+' NUMBER before we
attempt to shift ')' or '!'. OTOH, because neither NUMBER '+' nor '+'
NUMBER are valid sequences, this cannot be.

In other words, I can't make any sense out of their explanation. It seems
to contradict itself. Can somebody point me to a better explanation,
prefereably with a flow chart? Simpler is better. I don't want to get
bogged down with operator precedence and other bells and whistles. Also,
I've tried looking at the source of a Bison generated parser for a simple
expression evaluator, and it was difficult to analyze, what with all the
gotos and stuff. I suppose that if I don't get an answer I could plow
through it, but I figure it must be easier than that.

--Steve

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Trying to understand the bison parser algorithm.

"vrml3d.com" <comments@vrml3d.com>28 Sep 2000 22:14:57 -0400

"vrml3d.com" <comments@vrml3d.com>
28 Sep 2000 22:14:57 -0400