Re: Error reporting, was Infinite look ahead required by C++?

Kaz Kylheku <kkylheku@gmail.com>
Wed, 17 Feb 2010 01:49:24 +0000 (UTC)

          From comp.compilers


From: Kaz Kylheku <kkylheku@gmail.com>
Newsgroups: comp.compilers
Date: Wed, 17 Feb 2010 01:49:24 +0000 (UTC)
Organization: A noiseless patient Spider
References: 10-02-024 10-02-029 10-02-047 10-02-055 10-02-062 10-02-064 10-02-067
Keywords: errors, parse
Posted-Date: 19 Feb 2010 01:45:58 EST

On 2010-02-15, Ira Baxter <idbaxter@semdesigns.com> wrote:
> "Stephen Horne" <sh006d3592@blueyonder.co.uk> wrote in message
>> On Sat, 13 Feb 2010 18:24:28 -0700, wclodius@los-alamos.net (William
>> Clodius) wrote:
>>
>> In LR(1), it is *easy* to give a message of the form "expected one of
>> <token list>, but <token> was found." -
>>
>> Yacc and Bison don't support reporting errors in this form AFAIK, but
>> the tool isn't the same as the algorithm the tool uses.
>
> One more reason not to use these tools, or at least get a groundswell
> in favor of some open source person to integrate such error reporting.


With error productions and yychar, you can indeed implement fairly
friendly error messages which indicate context (what was being parsed,
what might be expected next) and the problem (the token that was
encountered instead).


You can add error productions to the grammar, and there is a
``yychar'' variable which gives you the lookahead token. Its value is
zero if the cause of the syntax error is a premature end of input.


I have recent practical experience with this.


Running example:


    $ txr -c '@(coll)foo@(repeat)'
    txr: (cmdline:1): syntax error
    txr: (cmdline:1): misplaced "repeat" in coll clause
    txr: (cmdline:1): unexpected end of input
    txr: (cmdline:2): unexpected end of input


The second error message is generated by a generic function
invoked from an error production for the syntax of the clause.


elem : TEXT                  { $$ = string_own($1); }
     | var                   { $$ = $1; }
     | list                  { $$ = $1; }
     | regex                 { $$ = cons(regex_compile($1), $1); }
     | COLL elems END        { $$ = list(coll_s, $2, nao); }
     | COLL elems
       UNTIL elems END       { $$ = list(coll_s, $2, $4, nao); }
     | COLL error            { $$ = nil;
                               yybadtoken(yychar, lit("coll clause")); }
     ;


If an error occurs following COLL, then the yybadtoken function is called
(this name is not some standard Yacc thing, but my invention).
The call establishes that the context for the problem is "coll clause", and
the identity of the bad lookahead token, if any, is the value of yychar.


If yychar is zero, it's reported differently:


    $ txr -c '@(coll)foo'
    txr: (cmdline:2): syntax error
    txr: (cmdline:2): unterminated coll clause
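
The two behaviors shown so far (a misplaced token versus a premature end of input) could come from a helper along the following lines. This is only a sketch and not the actual yybadtoken from txr: the token codes, the token_name table and the function name bad_token_message are all hypothetical, and a real parser would take its token codes from the Yacc-generated header and pass yychar as the tok argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real parser gets these from y.tab.h. */
enum { TOK_TEXT = 258, TOK_IDENT, TOK_COLL, TOK_REPEAT, TOK_UNTIL, TOK_END };

/* Map a token code to a user-facing name, or NULL if it is not in the table. */
const char *token_name(int tok)
{
    switch (tok) {
    case TOK_TEXT:   return "text";
    case TOK_IDENT:  return "identifier";
    case TOK_COLL:   return "coll";
    case TOK_REPEAT: return "repeat";
    case TOK_UNTIL:  return "until";
    case TOK_END:    return "end";
    default:         return NULL;
    }
}

/* Format a diagnostic into buf and return buf.
   tok is the lookahead token (yychar inside a Yacc action); 0 means the
   input ended. context names what was being parsed and may be NULL. */
char *bad_token_message(char *buf, size_t size, int tok, const char *context)
{
    int have_context = context != NULL && strlen(context) > 0;
    const char *name;
    char literal[2] = { '\0', '\0' };

    assert(buf != NULL && size > 0);

    if (tok == 0) {
        /* Premature end of input: there is no bad token to name. */
        if (have_context)
            snprintf(buf, size, "unterminated %s", context);
        else
            snprintf(buf, size, "unexpected end of input");
        return buf;
    }

    name = token_name(tok);
    if (name == NULL && tok > 0 && tok < 256) {
        /* Single-character tokens such as '{' stand for themselves. */
        literal[0] = (char) tok;
        name = literal;
    }
    if (name == NULL)
        name = "unknown token";

    if (have_context)
        snprintf(buf, size, "misplaced \"%s\" in %s", name, context);
    else
        snprintf(buf, size, "unexpected \"%s\"", name);
    return buf;
}
```

The point of keeping the context string as a parameter is that one function serves every error production; each production only supplies the name of the clause it belongs to.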


Now obviously these are not messages of the form
``expected <token list> but found <token>''.


There indeed doesn't appear to be a way in Yacc to access the state and
transition info to be able to produce the list of tokens representing
valid shifts.


In cases where the token list is large, it's a terrible idea to even
generate the entire list as part of an error message. For instance, in the
above situation, we can look at y.output to see what the tokens are:


state 10

   51 elem: COLL . elems END
   52     | COLL . elems UNTIL elems END
   53     | COLL . error

    error  shift, and go to state 52
    TEXT   shift, and go to state 2
    IDENT  shift, and go to state 3
    COLL   shift, and go to state 10
    REP    shift, and go to state 13
    '{'    shift, and go to state 16
    '('    shift, and go to state 17
    '/'    shift, and go to state 18
    '*'    shift, and go to state 19

Yes, so an elem within a @(coll) can be a piece of literal text, an
identifier (like @(foo)), a nested @(coll), @(rep), the start of a
brace-enclosed variable @{ represented by a '{' token, etc.


But the user does not need to be hit in the face with this laundry list of
everything which is valid at the error point; in this situation, it would be
a bad user interface. Presumably, the user more or less knows the language
and knows what kind of stuff goes into this clause; we don't need the error
message to be a mini-lecture on that topic. So in a case like this
where the possibilities are numerous, we are not missing anything by not
having support for listing the expected tokens.


Nevertheless, in some other error situations the list of possible tokens is
small. An error handler could look at the list of possible valid tokens and
decide that, say, if the list has fewer than four elements, they could be
listed in a nice error message.


``Here, a W occurs where either an X, Y or Z should occur''.
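
A sketch of such a handler follows, in C. It assumes the parser generator could hand the handler an array of names of the tokens valid at the error point; as noted above, Yacc does not appear to expose this, so the expected array here is hypothetical input. The function name expected_message, the MAX_LISTED threshold and the message wording are likewise my own choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* List the expected tokens only when there are fewer than four. */
#define MAX_LISTED 3

/* Append text to the string in buf without overrunning size bytes. */
void append_text(char *buf, size_t size, const char *text)
{
    size_t used = strlen(buf);
    if (used < size - 1)
        snprintf(buf + used, size - used, "%s", text);
}

/* Format a diagnostic into buf and return buf.
   found is the name of the offending token; expected holds the names of
   the count tokens that would have been valid at this point. */
char *expected_message(char *buf, size_t size, const char *found,
                       const char *const *expected, size_t count)
{
    size_t i;

    assert(buf != NULL && size > 0 && found != NULL);

    if (count == 0 || count > MAX_LISTED) {
        /* Too many possibilities (or none known): do not list them. */
        snprintf(buf, size, "unexpected %s", found);
        return buf;
    }

    snprintf(buf, size, "unexpected %s, expected ", found);
    for (i = 0; i < count; i++) {
        if (i > 0)
            append_text(buf, size, i + 1 == count ? " or " : ", ");
        append_text(buf, size, expected[i]);
    }
    return buf;
}
```

With three expected tokens this yields a message of the "X, Y or Z" shape; with four or more it falls back to naming only the offending token, which matches the argument that long lists make a bad user interface.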


It would indeed be somewhat nice not to have to hard-code this kind of
behavior into the grammar productions in the ``low branching factor''
parts of the grammar.


