yacc: Getting type of token on error.

madings@execpc.com (Steve Mading)
13 Dec 1998 13:56:32 -0500

          From comp.compilers

Related articles
yacc: Getting type of token on error. madings@execpc.com (1998-12-13)
Re: yacc: Getting type of token on error. qjackson@wave.home.com (Quinn Tyler Jackson) (1998-12-18)
Re: yacc: Getting type of token on error. vmakarov@cygnus.com (Vladimir Makarov) (1998-12-18)

From: madings@execpc.com (Steve Mading)
Newsgroups: comp.compilers
Date: 13 Dec 1998 13:56:32 -0500
Organization: ExecPC -- (800)-EXECPC-1
Keywords: yacc, errors

(Forgive me if the answer to this is brain-dead simple, but I cannot
seem to find it in the O'Reilly lex&yacc book anywhere, and I'm not
very experienced with yacc (yet).)

If I match an error rule, how can I find out the type of the first
token that matched the error? For example, I'd like to be able to
produce an error message of the form: "Line 999: Expecting a foo,
bar, or baz, but found a biz instead." However, when an error rule
matches, I can't figure out which type of token was involved.
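One common way to get at the token is sketched here under assumptions, not offered as a definitive recipe. In a traditional non-reentrant yacc or Bison parser, the global `yychar` holds the offending lookahead token when `yyerror()` is called. By the time an `error` rule's action runs, however, recovery may already have discarded tokens or read new ones. So the sketch saves `yychar` inside `yyerror()` and formats the message later. The token codes `FOO`, `BAR`, `BAZ`, `BIZ`, the `current_line` counter, and the helpers `token_name()` and `format_expecting()` are all hypothetical. The `yychar` definition is only a stand-in so the code compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; in a real parser these come from the
   %token declarations via the generated header (y.tab.h). */
enum { FOO = 258, BAR, BAZ, BIZ };

/* Stand-ins so the sketch compiles on its own. In a non-reentrant
   yacc/Bison parser, yychar is defined by the generated code and holds
   the offending lookahead token when yyerror() is called. */
int yychar;
int current_line = 1;   /* hypothetical: kept up to date by the lexer */

/* Saved when the error is detected, for use in an error rule action. */
static int error_token;
static int error_line;

static const char *token_name(int tok)
{
    switch (tok) {
    case FOO: return "foo";
    case BAR: return "bar";
    case BAZ: return "baz";
    case BIZ: return "biz";
    case 0:   return "end of input";
    default:  return "unknown token";
    }
}

/* Called by the parser as soon as it detects the error, before recovery
   discards anything, so yychar is still the culprit. */
void yyerror(const char *msg)
{
    (void)msg;                 /* generic "syntax error" text not used */
    error_token = yychar;
    error_line = current_line;
}

/* Format "Line N: Expecting a X, Y, or Z, but found a W instead." from
   the saved token; `expected` lists what the error rule was expecting. */
int format_expecting(char *buf, size_t size, const int *expected, int n)
{
    assert(buf != NULL && size > 0 && n > 0);
    int len = snprintf(buf, size, "Line %d: Expecting a ", error_line);
    for (int i = 0; i < n && len >= 0 && (size_t)len < size; i++) {
        const char *sep = (i == 0) ? ""
                        : (i < n - 1) ? ", "
                        : (n == 2) ? " or " : ", or ";
        len += snprintf(buf + len, size - len, "%s%s", sep,
                        token_name(expected[i]));
    }
    if (len >= 0 && (size_t)len < size)
        len += snprintf(buf + len, size - len,
                        ", but found a %s instead.", token_name(error_token));
    return len;
}
```

An `error` rule's action would then call `format_expecting()` with the tokens that rule expects and print the result. Bison users can also get a similar message generated automatically with `%define parse.error verbose` (`%error-verbose` in older releases). That option reports the unexpected token along with the expected ones.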

This would also help in another place: there is one kind of error
that isn't really an error if the token matched is a "foo", but is an
error for anything else. So I need to be able to tell the type of the
matched token, do some manual ugliness to fix things up, and go on.
(I cannot rewrite the grammar to handle it the 'right' way, because
doing so requires one more token of lookahead than the single token
yacc provides, so I get shift/reduce conflicts all over.)
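The foo case can reuse the same trick with one change: `yyerror()` only records the token, and the `error` rule's action decides whether to report it. This is a sketch under assumptions. `settle_pending_error()` and the token codes are hypothetical, and the grammar-specific "manual ugliness" is not shown. It also assumes every syntax error reaches an `error` rule; otherwise a pending error should also be printed when `yyparse()` returns. `yyerrok` is the standard yacc macro that ends error-recovery mode, so that later errors are reported normally again.

```c
#include <assert.h>
#include <stdio.h>

enum { FOO = 258, BAR, BAZ };    /* hypothetical token codes */

int yychar;                      /* stand-in for the parser's global */
static int pending_token = -1;   /* token saved by yyerror(); -1 = none */
static int error_count = 0;      /* errors actually reported */

/* Record the offending token instead of printing right away, so the
   error rule's action can decide whether it was a real error. */
void yyerror(const char *msg)
{
    (void)msg;
    pending_token = yychar;
}

/* Hypothetical helper for an error rule's action, e.g.
       thing : error { if (settle_pending_error()) yyerrok; } ;
   Returns 1 for the acceptable "foo" case (nothing reported),
   0 if the error was reported as a genuine one. */
int settle_pending_error(void)
{
    int tok = pending_token;
    pending_token = -1;
    if (tok == FOO)
        return 1;                /* not really an error: stay quiet */
    error_count++;
    fprintf(stderr, "syntax error at token %d\n", tok);
    return 0;
}
```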

Background on why I need this:

I am making a yacc parser for a syntax spec called "STAR", invented by
biochemists for storing their data. Unfortunately, it seems that they
gave no thought to making the syntax easy to parse. In the first part
of my project I just made a parser that quits on the first error. Now
I am trying to modify it to do error recovery, so that it can find
multiple errors in one run. This is a very annoying task because the
syntax doesn't give me statement terminators like semicolons or
newlines. Each "part of speech" is assumed to be done when the next
"part of speech" begins, which gives me nothing to synchronize on when
I find an error. This is made even more complex by an optional
terminator keyword for ending loops. A loop is a list of values. You
know you have reached the end of a loop when you hit the "stop"
keyword, or (and here's the clincher) since the stop keyword is
optional, when you hit any token that isn't a legal value: that token
is presumed to be the start of the next "thingy", and the loop is
over. (This is what leads to the need for lookahead.)
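The loop-ending rule itself needs only a one-token peek. Consume values while the next token is a value, consume `stop` if it is there, and otherwise leave the token alone because it begins the next item. The hand-written sketch below (not yacc output) illustrates that rule with hypothetical token kinds `T_VALUE`, `T_STOP`, `T_TAG`, and `T_EOF`.

```c
#include <assert.h>

/* Hypothetical token kinds for a STAR-like loop body. */
enum tok { T_VALUE, T_STOP, T_TAG, T_EOF };

/* A tiny token stream with a one-token peek. */
struct stream { const enum tok *toks; int pos; };

static enum tok peek(const struct stream *s) { return s->toks[s->pos]; }

static void advance(struct stream *s)
{
    if (s->toks[s->pos] != T_EOF)
        s->pos++;
}

/* Read loop values. The loop ends at an explicit STOP (consumed) or at
   the first token that cannot be a value (left unconsumed, since it
   starts the next item). Returns the number of values read. */
int parse_loop_values(struct stream *s)
{
    int n = 0;
    while (peek(s) == T_VALUE) {
        advance(s);
        n++;
    }
    if (peek(s) == T_STOP)
        advance(s);              /* optional terminator */
    return n;
}
```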

If I'm really careful how I write the grammar rules I can avoid the
need for extra lookahead, but only if I don't try to put in error
recovery. Error recovery makes it all blow apart. If I see an error
token in the loop values, then that means the loop is now over (since
a loop ends as soon as you hit something that is not a legal value).
The remaining values are then treated as being outside the loop,
where they become errors. So in a loop of 100 values, if the 10th
value is erroneous, the other 90 values get flagged as errors too.
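One way to stop the cascade is sketched below under an explicit assumption: the lexer can tell a malformed value apart from a token that can only start the next item (the hypothetical `T_BADVALUE` versus `T_TAG` below). Under that assumption, a bad value can be reported and skipped while the parser stays inside the loop, so only tokens that genuinely start something else end it. This is a hand-written illustration rather than yacc code. In a yacc grammar the analogous move would be to place the `error` token inside the list rule for values rather than after the whole loop, though whether that stays conflict-free depends on the rest of the grammar.

```c
#include <assert.h>

/* Hypothetical token kinds. T_BADVALUE is something the lexer can tell
   is a malformed value; T_TAG and T_EOF can only start the next item. */
enum tok { T_VALUE, T_BADVALUE, T_STOP, T_TAG, T_EOF };

struct loop_result { int values; int errors; };

/* Scan loop values starting at toks[*pos]. A malformed value is counted
   as an error and skipped, but the loop continues; only STOP (consumed)
   or a token that begins the next item ends the loop. */
struct loop_result parse_loop_with_recovery(const enum tok *toks, int *pos)
{
    struct loop_result r = { 0, 0 };
    for (;;) {
        enum tok t = toks[*pos];
        if (t == T_VALUE) {
            r.values++;
            (*pos)++;
        } else if (t == T_BADVALUE) {
            r.errors++;          /* report here, but stay in the loop */
            (*pos)++;
        } else {
            if (t == T_STOP)
                (*pos)++;        /* optional terminator */
            return r;            /* T_TAG / T_EOF left for the caller */
        }
    }
}
```

With the 10th of 100 values malformed, this reports one error and keeps the other 99 values in the loop.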

Man, I hate this language. (STAR, that is, not yacc.)
Steve Mading: madings@execpc.com http://www.execpc.com/~madings
[You didn't miss anything; I don't know how to do it either, other
than by carefully keeping a shadow stack of everything you're parsing.
You might try one of the backtracking versions of yacc. -John]
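A full shadow stack, as John suggests, means mirroring the parser's stack in your own actions. A lighter and different technique covers part of the original question: wrap the scanner and remember the last few tokens it returned, so an error action can look back at them. The sketch below assumes the real scanner has been renamed to `raw_yylex()` (with flex, for instance, by defining `YY_DECL` as `int raw_yylex(void)`). Here `raw_yylex()` just replays a fixed array so the code runs on its own. `recent_token()` and `HISTORY` are hypothetical names.

```c
#include <assert.h>

#define HISTORY 8                 /* number of recent tokens remembered */

static int history[HISTORY];
static int history_count = 0;     /* total tokens handed to the parser */

/* Hypothetical underlying scanner. Here it replays a fixed array that
   ends in 0 (end of input) so the sketch runs on its own. */
static const int *script;
static int script_pos;

static int raw_yylex(void)
{
    int tok = script[script_pos];
    if (tok != 0)
        script_pos++;
    return tok;
}

/* The parser calls yylex(); this wrapper records each token returned. */
int yylex(void)
{
    int tok = raw_yylex();
    history[history_count % HISTORY] = tok;
    history_count++;
    return tok;
}

/* Token returned `back` calls ago (0 = most recent, which when yyerror()
   runs is normally the offending lookahead), or -1 if unavailable. */
int recent_token(int back)
{
    if (back < 0 || back >= HISTORY || back >= history_count)
        return -1;
    return history[(history_count - 1 - back) % HISTORY];
}
```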
