From: Kaz Kylheku <kkylheku@gmail.com>
Newsgroups: comp.compilers
Date: Wed, 17 Feb 2010 01:49:24 +0000 (UTC)
Organization: A noiseless patient Spider
References: 10-02-024 10-02-029 10-02-047 10-02-055 10-02-062 10-02-064 10-02-067
Keywords: errors, parse
Posted-Date: 19 Feb 2010 01:45:58 EST
On 2010-02-15, Ira Baxter <idbaxter@semdesigns.com> wrote:
> "Stephen Horne" <sh006d3592@blueyonder.co.uk> wrote in message
>> On Sat, 13 Feb 2010 18:24:28 -0700, wclodius@los-alamos.net (William
>> Clodius) wrote:
>>
>> In LR(1), it is *easy* to give a message of the form "expected one of
>> <token list>, but <token> was found." -
>>
>> Yacc and Bison don't support reporting errors in this form AFAIK, but
>> the tool isn't the same as the algorithm the tool uses.
>
> One more reason not to use these tools, or at least get a groundswell
> in favor of some open source person to integrate such error reporting.
With error productions and yychar, you can indeed implement fairly
friendly error messages which indicate context (what was being parsed,
what might be expected next) and the problem (the token that was
encountered instead).
You can add error productions to the grammar, and there is a
``yychar'' variable which gives you the lookahead token. Its value is
zero if the cause of the syntax error is a premature end of input.
I have recent practical experience with this.
Running example:
$ txr -c '@(coll)foo@(repeat)'
txr: (cmdline:1): syntax error
txr: (cmdline:1): misplaced "repeat" in coll clause
txr: (cmdline:1): unexpected end of input
txr: (cmdline:2): unexpected end of input
The second error message is generated by a generic function
invoked from an error production for the syntax of the clause.
elem : TEXT                    { $$ = string_own($1); }
     | var                     { $$ = $1; }
     | list                    { $$ = $1; }
     | regex                   { $$ = cons(regex_compile($1), $1); }
     | COLL elems END          { $$ = list(coll_s, $2, nao); }
     | COLL elems
       UNTIL elems END         { $$ = list(coll_s, $2, $4, nao); }
     | COLL error              { $$ = nil;
                                 yybadtoken(yychar, lit("coll clause")); }
     ;
If an error occurs following COLL, then the yybadtoken function is called
(this name is not some standard Yacc thing, but my invention).
The call establishes that the context for the problem is "coll clause", and
the identity of the bad lookahead token, if any, is the value of yychar.
If yychar is zero, it's reported differently:
$ txr -c '@(coll)foo'
txr: (cmdline:2): syntax error
txr: (cmdline:2): unterminated coll clause
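To make the idea concrete, here is a minimal sketch in C of what such a helper might look like. It is not the actual txr code: the names format_bad_token and token_name and the TOK_* codes are hypothetical, and a real parser would take its token codes from the generated y.tab.h. The sketch formats the message into a buffer rather than printing it, and treats a zero token as premature end of input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real parser would get these from y.tab.h. */
enum { TOK_EOF = 0, TOK_TEXT = 258, TOK_IDENT, TOK_COLL, TOK_REP,
       TOK_UNTIL, TOK_END };

/* Map a named token code to a printable name, or NULL if it has none. */
static const char *token_name(int tok)
{
    switch (tok) {
    case TOK_TEXT:  return "text";
    case TOK_IDENT: return "identifier";
    case TOK_COLL:  return "coll";
    case TOK_REP:   return "repeat";
    case TOK_UNTIL: return "until";
    case TOK_END:   return "end";
    default:        return NULL;
    }
}

/*
 * Format a message describing the bad lookahead token `tok` (the value
 * of yychar at the error production) within the given context.
 * Returns whatever snprintf returns.
 */
int format_bad_token(char *buf, size_t size, int tok, const char *context)
{
    const char *name;

    assert(buf != NULL && size > 0);

    /* Fall back to a generic context if the caller supplied none. */
    if (context == NULL || strlen(context) == 0)
        context = "input";

    /* Zero means the input ended before the clause was complete. */
    if (tok == TOK_EOF)
        return snprintf(buf, size, "unterminated %s", context);

    name = token_name(tok);
    if (name != NULL)
        return snprintf(buf, size, "misplaced \"%s\" in %s", name, context);

    /* Single-character tokens such as '{' are their own codes. */
    if (tok > 0 && tok < 256)
        return snprintf(buf, size, "unexpected '%c' in %s", tok, context);

    return snprintf(buf, size, "unexpected token in %s", context);
}
```

An error production would then pass yychar and a context string to it, much like the ``COLL error'' rule above does, and hand the resulting text to whatever reports diagnostics with the line number.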
Now obviously these are not messages of the form
``expected <token list> but found <token>''.
There indeed doesn't appear to be a way in Yacc to access the state and
transition info to be able to produce the list of tokens representing
valid shifts.
In cases where the token list is large, it's a terrible idea to even
generate the entire list as part of an error message. For instance, in the
above situation, we can look at y.output to see what the tokens are:
state 10
51 elem: COLL . elems END
52 | COLL . elems UNTIL elems END
53 | COLL . error
error shift, and go to state 52
TEXT shift, and go to state 2
IDENT shift, and go to state 3
COLL shift, and go to state 10
REP shift, and go to state 13
'{' shift, and go to state 16
'(' shift, and go to state 17
'/' shift, and go to state 18
'*' shift, and go to state 19
Yes, so an elem within a @(coll) can be a piece of literal text, an
identifier (like @(foo)), a nested @(coll), @(rep), the start of a
brace-enclosed variable @{ represented by a '{' token, etc.
But the user does not need to be hit in the face with this laundry list of
everything which is valid at the error point; in this situation, it would be
a bad user interface. Presumably, the user more or less knows the language
and knows what kind of stuff goes into this clause; we don't need the error
message to be a mini-lecture on that topic. So in a case like this,
where the possibilities are numerous, we are not missing anything by
lacking access to the list of expected tokens.
Nevertheless, in some other error situations the list of possible tokens is
small. An error handler could look at the list of possible valid tokens and
decide that, say, if the list has fewer than four elements, they could be
listed in a nice error message.
``Here, a W occurs where either an X, Y or Z should occur''.
It would indeed be somewhat nice not to have to hard-code this kind of
behavior into the grammar productions in the ``low branching factor''
parts of the grammar.
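Supposing the parser did expose the set of acceptable tokens, the policy described above could be sketched in C roughly as follows. Everything here is an assumption for illustration: the function format_expected, the MAX_LISTED threshold, and the idea that the expected tokens arrive as an array of printable names. Yacc itself provides no such array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* List the alternatives only when there are fewer than four of them. */
#define MAX_LISTED 3

/* Append text to the string already in buf without overflowing it. */
static void append(char *buf, size_t size, const char *text)
{
    size_t used = strlen(buf);

    if (used < size - 1)
        snprintf(buf + used, size - used, "%s", text);
}

/*
 * Build an error message for the token `found`. If the list of
 * expected token names is short, name the alternatives; otherwise
 * report only the offending token.
 */
void format_expected(char *buf, size_t size, const char *found,
                     const char *const *expected, size_t count)
{
    size_t i;

    assert(buf != NULL && size > 0 && found != NULL);
    buf[0] = '\0';

    /* High branching factor (or no information): keep it short. */
    if (count == 0 || count > MAX_LISTED) {
        snprintf(buf, size, "unexpected %s", found);
        return;
    }

    snprintf(buf, size, "%s occurs where ", found);
    if (count > 1)
        append(buf, size, "either ");

    for (i = 0; i < count; i++) {
        if (i > 0)
            append(buf, size, i + 1 == count ? " or " : ", ");
        append(buf, size, expected[i]);
    }
    append(buf, size, " should occur");
}
```

With the names X, Y and Z as the expected list and W as the token found, this yields the wording shown above; with a list as long as the one in state 10 it falls back to the terse form.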