From: | "Joel E. Denny" <jdenny@clemson.edu> |
Newsgroups: | comp.compilers |
Date: | Fri, 19 Feb 2010 12:48:51 -0500 (EST) |
Organization: | Compilers Central |
References: | 10-02-024 10-02-029 10-02-047 10-02-055 10-02-062 10-02-064 |
Keywords: | bison, errors |
Cc: | compilers@iecc.com |
Posted-Date: | 21 Feb 2010 00:20:34 EST |
X-X-Sender: | jdenny@vega |
On Sun, 14 Feb 2010, Stephen Horne wrote:
> In LR(1), it is *easy* to give a message of the form "expected one of
> <token list>, but <token> was found."
> Yacc and Bison don't support reporting errors in this form AFAIK, but
> the tool isn't the same as the algorithm the tool uses.
According to Bison's commit history, Bison first implemented and
documented YYERROR_VERBOSE in 1993. For roughly a decade now, the
directive %error-verbose has been the preferred way to activate
YYERROR_VERBOSE. Of course, the list of expected tokens is wrong for some
cases because of default reductions and LALR. That is, sometimes the list
of expected tokens can contain incorrect tokens, can omit correct tokens,
or both.
Bison 2.5, which is under development, has a working implementation of
canonical LR(1) and IELR(1), which is a minimal LR(1) algorithm. It also
provides an option to disable default reductions, so I'll just assume
they're disabled for most of this explanation.
Canonical LR(1) fixes %error-verbose for deterministic parsers except for
some non-LR(1) grammars employing %nonassoc. That is, canonical LR(1)
does not see past any immediate reductions to discover that the
nonassociative token is eventually an error.
Of course, canonical LR(1) has other problems. It typically increases the
number of parser states by an order of magnitude. Given modern hardware,
that may not bother you. However, the increase in parser states can also
cause duplication of parser conflicts. In my experiments, they can
increase by an order of magnitude or more as well. The result is a severe
impact on the time it takes to debug a grammar and resolve conflicts
during development.
The purpose of IELR(1) (and minimal LR(1) in general) is to accept exactly
the same language as canonical LR(1) but with parser tables that are
nearly the size of LALR(1). To do this, it merges parser states in the
manner of LALR(1) except where merging will cause a loss of language
recognition power. However, merging parser states is one of the causes of
incorrect messages from %error-verbose. That is, even though IELR(1)
parsers accept exactly the same set of sentences as canonical LR(1)
parsers, they may not behave in exactly the same manner when rejecting a
sentence.
To fix %error-verbose, I've created an extension to Bison's deterministic
parsing algorithm. That extension fixes %error-verbose completely for
canonical LR(1), IELR(1), and LALR(1). Thus, when this extension is
activated, IELR(1) parsers behave exactly like canonical LR(1) parsers for
both syntactically correct and incorrect sentences. Moreover, default
reductions can be left enabled without harming %error-verbose. My current
plan is to release this extension for deterministic parsers written in C
in Bison 2.5.
> For generalised LR, of course, the picture is a bit more complex
Bison 2.5 will likely not include a %error-verbose fix for Bison's GLR
parsing algorithm. Also, because %error-verbose currently examines only
one parser state, canonical LR(1) (even when there's no %nonassoc) is not
sufficient for GLR.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.