Re: Unusual error recovery in Yacc parser

James Kanze US/ESC 60/3/141 #40763 <kanze@lts.sel.alcatel.de>
27 Feb 1996 16:27:51 -0500

          From comp.compilers

Related articles
Unusual error recovery in Yacc parser smueller@MICROSOFT.com (Stephan Mueller) (1996-02-23)
Re: Unusual error recovery in Yacc parser kanze@lts.sel.alcatel.de (James Kanze US/ESC 60/3/141 #40763) (1996-02-27)
| List of all articles for this month |
From: James Kanze US/ESC 60/3/141 #40763 <kanze@lts.sel.alcatel.de>
Newsgroups: comp.compilers
Date: 27 Feb 1996 16:27:51 -0500
Organization: Compilers Central
References: 96-02-259
Keywords: yacc, parse

Stephan Mueller <smueller@MICROSOFT.com> writes:


|> Suppose we have a Yacc (actually Bison) parser for a language
|> consisting of, in general, arbitrary keywords in arbitrary order. We
|> have one rule for handling syntax errors, which is to completely
|> ignore any keyword found at an inappropriate time. How do I do this
|> best?


|> Getting more concrete: I have a Bison-based parser for RTF. RTF (Rich
|> Text Format) text consists of plain text, easily identifiable
|> keywords, and groups delimited by '{' and '}' containing RTF. Some
|> groups are 'special' in that they may contain only a subset of the
|> roughly 800 keywords currently defined in Rtf. There are numerous
|> problems in parsing RTF, but I've overcome most of these with a rather
|> smart lexer. The lexer returns a separate token for each possible
|> keyword. Things that look like keywords but are unknown are discarded
|> by the lexer, so every keyword seen by the grammar is legal in some
|> context.


I had an article in SigPLAN some years ago (about 5?) on handling
ambiguous tokens. I don't have the exact reference handy, but the
basic ideas was to modify the generated parser (using an sed script)
to call a special routine (yylex2) for an alternative token when it
couldn't shift, and only go into the error handling if this routine
returned 0. I've not tried this with bison, but I imagine that the
same basic idea would work.


In your case, you would probably want to create a special token for
yylex2 to return, which could appear anywhere, and would be ignored by
the grammar. (I'm not sure how difficult this would be, but I have a
sneeking suspicion that it is not as easy as it sounds.)
Alternatively, it shouldn't be too difficult to modify the generated
parser to simply go on and read the next token whenever it cannot
shift.


Another solution I've used since the article is to use awk to extract
lists of legal tokens for each state from the yy.output file, and have
the lexer look up whether the token would be legal in the current
state. Given the number of keywords, this could result in an awful
lot of tables, however.


--
James Kanze Tel.: (+33) 88 14 49 00 email: kanze@gabi-soft.fr
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.