Re: Relaxed pattern recognition

From: Chris Dodd <cdodd@acm.org>
Newsgroups: comp.compilers
Date: 13 Dec 2003 20:59:44 -0500
Organization: Compilers Central
References: 03-12-072
Keywords: parse
Posted-Date: 13 Dec 2003 20:59:44 EST

mars@realsoftware.com (Mars Saxman) wrote in news:03-12-072@comp.compilers:
> Using the For statement from Basic, here's an example of what I have
> in mind in pseudo-Bison-syntax:
>
> line: token* ENDL;
> for_line: FOR token* ENDL;
> for_stmt: FOR ident '=' exp TO exp ENDL;
> next_line: NEXT token* ENDL;
> next_stmt: NEXT [ident] ENDL;
> forloop: for_line line* next_line;


Some of what you want to do can be done using bison's built-in error
recovery with error rules. The best way to think of the error rule
in bison is as a token* like the one in your rules. So you can write
bison code like:


lines: lines line | line ;
line: for_line | next_line | error ENDL ;    /* skip a bad line up to its ENDL */
for_line: for_stmt | FOR error ENDL ;        /* skip a malformed FOR header */
for_stmt: FOR ident '=' exp TO exp ENDL ;
next_line: next_stmt | NEXT error ENDL ;     /* skip a malformed NEXT line */
next_stmt: NEXT ENDL | NEXT ident ENDL ;
forloop: for_line lines next_line ;


Basically, the special error token takes the place of the token* you
have in your original grammar and absorbs some number of tokens so
that the parser can resynchronize after a syntax error.
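

As an aside, you can hang actions on those error rules to report what
got thrown away. Here's a minimal sketch (the fprintf messages are my
own invention and assume <stdio.h> is pulled into the %{ %} prologue;
yyerrok is the standard bison macro that declares recovery complete,
so later errors on nearby lines are reported rather than suppressed):


line: for_line
    | next_line
    | error ENDL        { yyerrok;
                          fprintf(stderr, "discarded a malformed line\n"); }
    ;
for_line: for_stmt
    | FOR error ENDL    { yyerrok;
                          fprintf(stderr, "discarded a malformed FOR header\n"); }
    ;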


The main problem with "error" is that it sometimes absorbs too much.
The error rules will never be triggered unless there is a syntax
error, but the parser can sometimes pick the wrong one and discard too
many tokens. In this specific case you will be all right, because
every error token is followed by an ENDL, so recovery will never
absorb an ENDL. But if you had another error rule, say for exp, that
wasn't followed by an ENDL, an error in an exp might absorb an ENDL
and still fail to recover, triggering the next error token up the tree
(perhaps the one in for_line), which would then absorb tokens up to
the next ENDL -- causing you to miss an entire line that might
otherwise be parseable.
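

To make that concrete, here is a hypothetical recovery rule inside
expressions (not part of the grammar above) that resynchronizes on a
closing parenthesis; NUMBER is an assumed token:


exp: ident
   | NUMBER
   | '(' exp ')'
   | '(' error ')'      /* hypothetical: discard tokens until a ')' turns up */
   ;


If the matching ')' never shows up, this rule keeps discarding tokens
right past the ENDL and into the following line; when the parse then
goes wrong again, recovery falls back to the error token in for_line
or line, which throws away everything up to the next ENDL, so a good
line is lost along with the bad one.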


When you combine this with a parser that maintains some kind of symbol
table (particularly when the symbol table feeds back to the lexer so
it can decide which token to return), the parser can miss important
symbol declarations because of nearby syntax errors, which can then
produce a long cascade of spurious errors, such as you often see
coming out of some C compilers.
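

For a completely made-up illustration in the same style: suppose
declarations are one kind of line and the action records the name in a
symbol table that the lexer consults to classify later identifiers
(DIM, symtab_add and the lexer feedback are all my assumptions, not
anything from the original grammar):


decl_line: DIM ident ENDL   { symtab_add($2);   /* the lexer consults the symbol
                                                   table to classify this name
                                                   on later lines */ }
    | DIM error ENDL        { /* the whole line is discarded, declaration and
                                 all, so later uses of the name are mis-lexed
                                 and each reports its own spurious error */ }
    ;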


-chris

