Newsgroups: comp.compilers
From: Terence John Parr <parrt@parr-research.com>
Keywords: C, parse, errors
Organization: Compilers Central
References: 95-02-114
Date: Mon, 20 Feb 1995 06:36:01 GMT
keller@trurl.informatik.uni-dortmund.de (Robert E. Keller) writes:
> I'm looking for a parser which understands C (preferably ANSI-C)
> (or some "reasonable" subset) and which, if it encounters a syntax
> error, delivers a set of all those tokens/symbols which are correct
> in place of the wrong symbol (not caring for what comes after the wrong
> symbol, of course);
>
> example:
> given
>
> extern short c;
> int fnc(int a, int b){return a+;}
> ..
>
> the described parser comes forward with s.th. like
>
> error: line 2, symbol #14 (";");
> correct symbols: a b c ( ' ! <.. some more unary operators etc. ..>
> correct tokens: NUMBER
I've got two things for you: (i) an exact solution and (ii)
a new way to handle parser errors called PARSER EXCEPTION HANDLING.
I just ran your input through the ANSI C front end I built
for ANTLR (the predicated-LL(k) parser generator of PCCTS) and this
is what I got:
line 2: syntax error at ";" missing { LPARENTHESIS AMPERSAND MINUS STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT NOT SIZEOF OCTALINT DECIMALINT HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER STRING CHARACTER }
This is what happens without my telling ANTLR anything about error
correction; it is the default mode. The parser also recovered
automatically and continued with the parse.
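To illustrate how a message of this shape can be built from a set of expected tokens, here is a minimal C sketch. It is not ANTLR's generated code: the token codes, the `TokSet` bitset, `operand_start`, and `format_syntax_error` are all hypothetical names made up for this example, and the token list is a small subset of what the real grammar reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real ANSI C grammar defines many more. */
enum { TOK_LPARENTHESIS, TOK_MINUS, TOK_STAR, TOK_IDENTIFIER,
       TOK_DECIMALINT, TOK_SEMI, TOK_COUNT };

static const char *const tok_name[TOK_COUNT] = {
    "LPARENTHESIS", "MINUS", "STAR", "IDENTIFIER", "DECIMALINT", "SEMI"
};

/* One bit per token code. */
typedef unsigned int TokSet;
#define BIT(t) (1u << (t))

/* Assumed set of tokens that may start an operand after "a+". */
const TokSet operand_start =
    BIT(TOK_LPARENTHESIS) | BIT(TOK_MINUS) | BIT(TOK_STAR) |
    BIT(TOK_IDENTIFIER) | BIT(TOK_DECIMALINT);

/* Append " word" to the string in buf, truncating if buf is full. */
static void append_word(char *buf, size_t size, const char *word)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, " %s", word);
}

/* Write: line N: syntax error at "text" missing { A B C }  into buf. */
char *format_syntax_error(char *buf, size_t size, int line,
                          const char *text, TokSet expected)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "line %d: syntax error at \"%s\" missing {", line, text);
    for (int t = 0; t < TOK_COUNT; t++) {
        if (expected & BIT(t))
            append_word(buf, size, tok_name[t]);
    }
    append_word(buf, size, "}");
    return buf;
}
```

A parser would call `format_syntax_error(buf, sizeof buf, 2, ";", operand_start)` at the point where the lookahead token is not in the expected set.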
On the other hand, a clearer error message would be
line 2: syntax error at ";" missing { MOP AOP "expression atom" ... }
and could easily be generated by giving ANTLR an error class, which
groups tokens under a single name that is printed in error messages; e.g.,
#errclass "expression atom" { STRING CHARACTER ... }
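The effect of an error class on the message can be pictured with a small C sketch. This is only an illustration of the grouping idea, not how ANTLR implements #errclass. The token codes, the `ErrClass` struct, `describe_expected`, and the members chosen for "expression atom" here are assumptions for the example; the real class contents are whatever the grammar lists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes for the example. */
enum { T_STRING, T_CHARACTER, T_IDENTIFIER, T_DECIMALINT,
       T_LPARENTHESIS, T_MINUS, T_COUNT };

static const char *const t_name[T_COUNT] = {
    "STRING", "CHARACTER", "IDENTIFIER", "DECIMALINT", "LPARENTHESIS", "MINUS"
};

typedef unsigned int TokenSet;
#define TBIT(t) (1u << (t))

/* An error class: a printable name standing for a group of tokens. */
typedef struct {
    const char *name;
    TokenSet    members;
} ErrClass;

/* Assumed membership; only STRING and CHARACTER come from the post. */
static const ErrClass err_classes[] = {
    { "\"expression atom\"",
      TBIT(T_STRING) | TBIT(T_CHARACTER) | TBIT(T_IDENTIFIER) | TBIT(T_DECIMALINT) }
};

/* Append " name" to the string in buf, truncating if buf is full. */
static void append_name(char *buf, size_t size, const char *name)
{
    size_t used = strlen(buf);
    if (used + 1 < size)
        snprintf(buf + used, size - used, " %s", name);
}

/* Describe an expected-token set, printing a class name in place of its
 * members whenever every member of the class is expected. */
char *describe_expected(char *buf, size_t size, TokenSet expected)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "{");
    for (size_t i = 0; i < sizeof err_classes / sizeof err_classes[0]; i++) {
        TokenSet m = err_classes[i].members;
        if ((expected & m) == m) {
            append_name(buf, size, err_classes[i].name);
            expected &= ~m;          /* members are now covered by the class name */
        }
    }
    for (int t = 0; t < T_COUNT; t++) {
        if (expected & TBIT(t))
            append_name(buf, size, t_name[t]);
    }
    append_name(buf, size, "}");
    return buf;
}
```

In this sketch a class name is printed only when the whole class is expected; a partially expected class falls back to the individual token names.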
However, if you want something even more powerful, you should use the really
slick PARSER EXCEPTION HANDLING. You essentially provide handlers for
different error signals such as NoViableAlt or MismatchedToken.
    return_stat
        : "return" e:expression
        ;
        exception[e]
            catch MismatchedToken :
            catch NoViableAlt :
                <<
                /* do error reporting and recovery here */
                >>
which means "any time the parser gets a mismatched token or can't
find any alternative of a rule to match the current input, specifically
during the parse of expression, jump all the way back up and run this error
handler."
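For readers who think in hand-written parsers, the "jump all the way back up" behavior can be approximated in plain C with setjmp/longjmp. The sketch below is an analogy only and makes no claim about the code ANTLR actually generates. The `Parser` struct, the token codes, `NoSignal`, and the toy expression grammar (identifiers, "+", and parentheses) are assumptions for the example; only the signal names MismatchedToken and NoViableAlt come from the text above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

enum Tok { RETURN, IDENT, PLUS, LPAREN, RPAREN, SEMI, END };
enum Signal { NoSignal = 0, MismatchedToken, NoViableAlt };

typedef struct {
    const enum Tok *input;   /* END-terminated token stream */
    size_t          pos;     /* index of the lookahead token */
    jmp_buf         handler; /* where a raised signal lands */
} Parser;

static void expression(Parser *p);

/* Unwind the C call stack to the handler installed in return_stat. */
static void raise_signal(Parser *p, enum Signal s) { longjmp(p->handler, s); }

static void match(Parser *p, enum Tok t)
{
    if (p->input[p->pos] != t)
        raise_signal(p, MismatchedToken);
    p->pos++;
}

/* atom : IDENT | "(" expression ")" */
static void atom(Parser *p)
{
    if (p->input[p->pos] == IDENT) {
        p->pos++;
    } else if (p->input[p->pos] == LPAREN) {
        p->pos++;
        expression(p);
        match(p, RPAREN);
    } else {
        raise_signal(p, NoViableAlt);
    }
}

/* expression : atom ( "+" atom )* */
static void expression(Parser *p)
{
    atom(p);
    while (p->input[p->pos] == PLUS) {
        p->pos++;
        atom(p);
    }
}

/* return_stat : "return" expression, with a handler that covers only the
 * parse of expression. Returns the signal caught, or NoSignal. */
enum Signal return_stat(Parser *p)
{
    enum Signal caught;

    assert(p != NULL);
    if (p->input[p->pos] != RETURN)
        return MismatchedToken;      /* outside the handler: reported directly */
    p->pos++;

    switch (setjmp(p->handler)) {
    case 0:
        expression(p);
        return NoSignal;
    case MismatchedToken:
        caught = MismatchedToken;
        break;
    default:
        caught = NoViableAlt;
        break;
    }
    /* do error reporting and recovery here: this sketch skips to ";" */
    while (p->input[p->pos] != SEMI && p->input[p->pos] != END)
        p->pos++;
    return caught;
}
```

Feeding it the tokens for "return a + ;" raises NoViableAlt inside atom(), two calls below return_stat, and control lands back in the handler without each intermediate function having to check and propagate an error code.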
There are lots of details that make much of this happen automatically,
and there are many more features. These critters are available
using either the C or the C++ interface.
> any reference to parser generators capable of generating such a parser
> is welcome, too;
My shameless plug would be for PCCTS which you can check out at
ftp.parr-research.com in pub/pccts
The ANSI C grammar is in pub/pccts/contrib/ansi.tar and the
latest version is 1.31, but 1.32b3 is up on the net already.
(There has been a shuffling of IP addresses at the site where
ftp.parr-research.com points; your name server may not
have been updated yet, but it should be ok.)
The documentation is complete, but spread over the set of
release notes. I'm working steadily on the book and will
have it out this summer if it kills me (which it might).
Regards,
Terence
parrt@parr-research.com
http://www.parr-research.com/~parrt
PS If people are interested, I can outline more of the
parser exception handling. It's proven very effective
thus far--even better than hand-built stuff because ANTLR
handles lots of details for you (such as managing the
call stack).