Re: Q: Error detection/recovery in LEX/YACC (Help)



Newsgroups: comp.compilers
From: neitzel@ips.cs.tu-bs.de (Martin Neitzel)
Keywords: lex, yacc, errors, bibliography
Organization: Inst. f. Informatik, TU Braunschweig, FRG
References: 93-12-081
Date: Mon, 20 Dec 1993 23:34:23 GMT

Oops, this got awfully long. Here is a short table of contents:


* classic literature on lex&yacc
* a very helpful visualization of yacc's workings
* scanners: lex vs hand-written
* some notes on error recovery


Some suggestions if you are just starting with yacc (& lex):


* Read the existing documentation carefully (and parts of it more than
    once). The classic papers by the tools' authors come with every unix
    documentation set. They are not part of the online man pages, but
    can be found among the "supplementary documents" (sometimes as copies
    of the originals (e.g. Ultrix), sometimes somewhat revamped (e.g.
    SunOS)). You should read them not only because they are the
    "official references", but also because they are nice tutorials with
    many practical hints. (Many of these become apparent only after
    some practice.)


%r 39
%K CSTR
%R Comp. Sci. Tech. Rep. No. 39
%I Bell Laboratories
%C Murray Hill, New Jersey
%A M. E. Lesk
%T Lex \(em A Lexical Analyzer Generator
%D October 1975


%r 32
%K CSTR
%R Comp. Sci. Tech. Rep. No. 32
%I Bell Laboratories
%C Murray Hill, New Jersey
%A S. C. Johnson
%T Yacc \(em Yet Another Compiler-Compiler
%D July 1975


    J. R. Levine et al., "Lex & Yacc" (O'Reilly & Associates), is a
    marvellous book, not only for beginners. Schreiner/Friedman is very
    comparable; I prefer Levine et al. by a slight margin.


    The following item is, again, quite old, but it is nevertheless the
    most readable introduction to LR parsing and yacc theory I have ever
    come across. If you want to know how "default reductions" affect
    you, it is well worth reading:


%T LR Parsing
%A A. V. Aho
%A S. C. Johnson
%J Comp. Surveys
%V 6
%N 2
%P 99-124
%D June 1974


* Get Bob Corbett's "byacc" and the small but effective modifications
    by J. Roskind. The latter will give you a tree-ish trace which was,
    at least for me, the key to understanding the initially intricate
    mechanism of error recovery ("Ah, *now* yyerror() gets called, but
    we haven't started resyncing yet... and *now* we are synced
    again... and *here* comes the {action}..."). If you have trouble
    locating Roskind's "skeleton.c" replacement, I can mail it to you.
    [It's in the compilers archive as part of Roskind's C++ grammar -John]


* Unlike with yacc, it is often hard to make a decision for or against
    the use of lex. Scanners are simple and quick to formulate with lex.
    However, if the generated scanner can't be taken as is, and you
    can't get what you want by fiddling with the macros, you have to
    start post-processing the generated source code (which is not
    uncommon). You then end up with a somewhat complicated setup of sed
    scripts and Makefiles which I wouldn't inflict on absolute
    beginners. If the tokens are simple, a hand-written scanner will be
    just a 20-line C routine and easier to maintain, debug, and adapt to
    the environment. (It certainly pays off to know what a "%*c%[a-z]"
    format means to sscanf!)
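    As a taste of what sscanf scansets buy you in a hand-written
    scanner, here is a minimal sketch (the function name is mine, purely
    for illustration): "%*c" skips one character without assigning it,
    and "%[a-z]" matches a maximal run of lowercase letters.

```c
#include <stdio.h>

/* Skip one leading character (e.g. a sigil), then collect the run of
 * lowercase letters that follows.  Returns 1 if a word was matched,
 * since the suppressed "%*c" conversion does not count. */
int scan_lower_after_sigil(const char *s, char *out)
{
    return sscanf(s, "%*c%[a-z]", out);
}
```

    One such format string can stand in for a whole lex rule when the
    token shapes are this simple.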


* "eyacc" was a yacc variant developed at Berkeley for the
    implementation of their pc/pi/pix Pascal compilers. On most
    systems, the man page just said "eyacc is a yacc variant providing
    better error recovery" and left you alone. It has since been
    dropped from most unix distributions. A better man page would have
    told you:


"Eyacc is an old version of yacc(1). Eyacc fully enumerates
test actions in its parser when an error token is in the
look-ahead set. This prevents the parser from making
undesirable reductions when an error occurs before the error
is detected. The table format is different in eyacc than it
was in the old yacc, as minor changes had been made for
efficiency reasons."


    before leaving you alone, too. The whole story can be pieced
    together from three sources:


          ``Practical LR Error Recovery'' by Susan L. Graham, Charles
          B. Haley and W. N. Joy; SIGPLAN Conference on Compiler
          Construction, August 1979.


          ``Practical Syntactic Error Recovery'' by Susan L. Graham and
          Steven P. Rhodes; Communications of the ACM, Vol.18 No.11,
          November 1975


          and of course the eyacc source itself, if you are lucky enough
          to have a BSD source license. I take the liberty of revealing
          an excerpt from its READ_ME:


"[note also that] these changes are really useful only with a
fairly large set of error recovery routines which are part of
both "pi" and "pxp". The routines are language independent,
but the method will only work on languages which have some
redundancy in them... it is probably ill suited for C, but
would work fine in ALGOL-60, ALGOL-W, EUCLID, LIS, SP/K, PL/1,
ALPHARD, CLU, ..."


    So, how does all this relate to you? In a nutshell: the default
    reductions in yacc prevent many useful diagnostics like "which
    production is currently active?" or "which tokens are legal here?".
    Many programmers have tried, unsuccessfully, to reconstruct such
    information from yacc-generated parsers. I have found code for
    computing the valid lookahead symbols here and there, but it was
    never correct. The eyacc history shows that doing a truly proper
    job requires considerable work and expertise.


This isn't to say that you can't generate parsers with yacc which can deal
with errors. Just reporting the current yyline and yytext and barfing
"syntax error" already goes a long way. The "error" mechanism is not as
complicated as it seems. In my experience, beginners are often
over-zealous with productions like


assignment : lhs '=' rhs ';'
           | lhs '=' error { puts ("rhs kaput"); } ';'


where yacc can provide much better resyncing if you restrict yourself to a
catch-all "statement: ... | assignment | error". Make sure you understand
what happens (with the Corbett/Roskind combination mentioned above) before
sinking time into guessing and sprinkling productions which might never
get reduced anyway. Look at yacc grammars written by old hands. You'll
find very good examples not only of effective error handling but also of
the nature of actions. In my early yacc days, I was too keen on
delegating action stuff to external procedures; nowadays I just go ahead
and plug the tree structures together inline:


idlist : idlist ',' id  { FOO *l;
                          for (l = $1; l->next; l = l->next)
                              ;                 /* find the tail */
                          l->next = $3;
                          $$ = $1;
                          /* wham bam thank you mam */ }
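The catch-all style of recovery described above might be sketched like
this (a sketch only: the nonterminal names are hypothetical, and
yylineno is provided by many, but not all, lex versions):

```yacc
%%
stmt_list : /* empty */
          | stmt_list stmt
          ;

stmt      : assignment ';'
          | error ';'       { yyerrok;  /* resynced at the ';' */ }
          ;
%%
/* Barebones diagnostic: lex keeps the current token in yytext. */
extern char yytext[];
extern int  yylineno;       /* not available in every lex */

void yyerror (char *msg)
{
    fprintf (stderr, "%d: %s near \"%s\"\n", yylineno, msg, yytext);
}
```

On "error", the parser discards tokens until it can shift the ';', then
resumes; yyerrok re-arms error reporting so the next genuine error is
not suppressed.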


The hardest part of error recovery is often preventing partly undefined
data structures and memory leaks from data structures which have already
been allocated and built at lower levels but were never connected to the
main parse tree.
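One defensive pattern (a sketch of my own, reusing the hypothetical FOO
node type from the idlist example) is to keep fragments freeable as
plain lists, so an error action can release anything that will never be
linked into the main tree:

```c
#include <stdlib.h>

/* Hypothetical parse-tree node, as in the idlist example above. */
typedef struct foo {
    struct foo *next;
    int id;
} FOO;

static int live_nodes;      /* crude leak counter for this sketch */

FOO *new_node(int id)
{
    FOO *n = malloc(sizeof *n);
    n->next = NULL;
    n->id = id;
    live_nodes++;
    return n;
}

/* Called from an error action: walk and release a list fragment that
 * was built at a lower level but never attached to the parse tree. */
void free_list(FOO *l)
{
    while (l) {
        FOO *next = l->next;
        free(l);
        live_nodes--;
        l = next;
    }
}
```

A counter like live_nodes (or a debugging allocator) makes it easy to
check after a run with syntax errors that nothing was left dangling.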


Martin Neitzel


(PS: Digging through the eyacc sources reminded me of one of the few
comments in Kernighan's "pic" yacc source. It just said
/* WITCHCRAFT */ and, lo and behold, it was! :-)
--

