Related articles |
---|
Q: Error detection/recovery in LEX/YACC (Help) friesend@herald.usask.ca (1993-12-17) |
Re: Q: Error detection/recovery in LEX/YACC (Help) collison@osf.org (1993-12-20) |
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1993-12-20) |
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1993-12-23) |
Re: Q: Error detection/recovery in LEX/YACC (Help) hage@netcom.com (1994-01-18) |
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1994-01-27) |
Newsgroups: | comp.compilers |
From: | neitzel@ips.cs.tu-bs.de (Martin Neitzel) |
Keywords: | lex, yacc, errors, bibliography |
Organization: | Inst. f. Informatik, TU Braunschweig, FRG |
References: | 93-12-081 |
Date: | Mon, 20 Dec 1993 23:34:23 GMT |
Oops, this got awfully long. Here is a short table of contents:
* classic literature on lex&yacc
* a very helpful visualization of yacc's workings
* scanners: lex vs hand-written
* some notes on error recovery
Some suggestions if you are just starting with yacc (& lex):
* Read the existing documentation carefully (and parts of it more than
once). The classical papers by their authors come with every unix
documentation set. They are not part of the online man pages, but
can be found as "supplementary docments" (sometimes as copies of the
original (eg. Ultrix), sometimes somewhat revamped (eg. SunOS)).
You should read them not only because they are the "official
references", but also because they are nice tutorials with many
practical hints. (Many of them become apparent only after some
practice.)
%r 39
%K CSTR
%R Comp. Sci. Tech. Rep. No. 39
%I Bell Laboratories
%C Murray Hill, New Jersey
%A M. E. Lesk
%T Lex \(em A Lexical Analyzer Generator
%D October 1975
%r 32
%K CSTR
%R Comp. Sci. Tech. Rep. No. 32
%I Bell Laboratories
%C Murray Hill, New Jersey
%A S. C. Johnson
%T Yacc \(em Yet Another Compiler-Compiler
%D July 1975
J. R. Levine et.al. "Lex & Yacc" (O'Reilly&assoc.) is a marvellous
book, not only for beginners. Schreiner/Friedmann is very
comparable. I prefer "Levine et.al." on a slight margin.
The following item is -again- quite old but nevertheless the most
readable introduction to LR parsing and YACC theory I ever came
across. If you want to know how "default reductions" affect you,
this is worthy to be read:
%T LR Parsing
%A A. V. Aho
%A S. C. Johnson
%J Comp. Surveys
%V 6
%N 2
%P 99-124
%D June 1974
* Get Bob Corbett's "byacc" and the small but effective modifications
by J. Roskind. The latter will give you a tree-ish trace which was,
at least for me, the key for understanding the initially intricate
mechanism for error recovery ("Ah, *now* yyerror() gets called, but
we havn't started resyncing yet...and *now* we are synced
again...and *here* comes the {action}..."). If you have trouble
locating Roskind's "skeleton.c" replacement, I can mail it to you.
[It's in the compilers archive as part of Roskind's C++ grammar -John]
* Unlike the case for yacc, it is often somewhat hard to make a
decision for or against the use of lex. Scanners are simple and
quick to formulate with lex. However, if the generated scanner
can't be taken as is and you can't get what you want by fiddling
around with the macros and start post-processing the generated
source code, which is not uncommon, you'll get a somewhat
complicated setup involving sed-Skripts and Makefiles which I
wouldn't inflict on absolute beginners. If the tokens are simple, a
hand written scanner will be just a 20-line C routine and easier to
maintain, debug, and adapt to the environment. (It certainly pays
off to know what a "%*c%[a-z]" format means to sscanf!)
* "eyacc" was a yacc variant developed at Berkeley for the
implementation of their pc/pi/pix Pascal compiler. On most systems,
the man page just said: "eyacc is a yacc variant providing better
error recovery" and left you alone. It has nowadays been dropped
from most unix distributions. A better man page would have told
you:
"Eyacc is an old version of yacc(1). Eyacc fully enumerates
test actions in its parser when an error token is in the
look-ahead set. This prevents the parser from making
undesirable reductions when an error occurs before the error
is detected. The table format is different in eyacc than it
was in the old yacc, as minor changes had been made for
efficiency reasons."
before leaving you alone, too. Now, you can read the whole story
if you read in three sources:
``Practical LR Error Recovery'' by Susan L. Graham, Charles
B. Haley and W. N. Joy; SIGPLAN Conference on Compiler Con-
struction, August 1979.
``Practical Syntactic Error Recovery'' by Susan L. Graham and
Steven P. Rhodes; Communications of the ACM, Vol.18 No.11,
November 1975
and of course the eyacc source itself, if you are lucky to have a
BSD source license. I take the freedom to reveil an excerpt from
the READ_ME:
"[note also that] these changes are really useful only with a
fairly large set of error recovery routines which are part of
both "pi" and "pxp". The routines are language independent,
but the method will only work on languages which have some
redundancy in them... it is probably ill suited for C, but
would work fine in ALGOL-60, ALGOL-W, EUCLID, LIS, SP/K, PL/1,
ALPHARD, CLU, ..."
So, how does all this relate to you? To put it in a nutshell: the
default reductions in yacc prevent many useful diagnostics like
"what production is currently active?" or "which tokens are legal
here?". Many programmers have unsuccessfully tried to reconstruct
such information from yacc-generated parsers. I found some code for
computing valid lookahead symbols here and there, but it was never
correct. The eyacc history shows that it requires quite some work
and expertise to do a truly proper job.
This isn't to say that you can't generate parsers with yacc which can deal
with errors. Just indicating the current yyline and yytext and barfing
"syntax error" goes already a long way. The "error" mechanism is not as
complicated as it seems. In my experience, beginners are often
over-zealous with productions like
assignment : lhs '=' rhs ';'
| lhs '=' error { puts ("rhs kaput"); } ';'
where yacc can provide much better resyncing if you restrict yourself to a
catch-all "statement: ... | assignment | error". Make yourself clear what
happens (with the Corbett/Roskind combi mentioned above) before sinking
time into guessing and sprinkling productions which might never get
reduced anyway. Look into yacc grammars written by old-hands. You'll
find very good examples not only on effective error handling but also on
the nature of actions. In my early yacc days, I was too keen on
delegating action stuff to external procedures, nowadays I just go ahead
and plug the tree structures together inline:
idlist : idlist ',' id { FOO *l; for (l=$1; l->next; l=l->next);
l->next = $3; $$=$1;
/* wham bam thank you mam */ }
The hardest thing to do at error recovery is often the prevention of
partly undefined data structures and memory leaks due to lost data
structures which have already been allocated and built at lower levels,
but not connected to a main parse tree.
Martin Neitzel
(PS: Digging through the eyacc sources reminded me on one of few
comments in Kernighan's "pic" yacc source. It just said:
/* WITCHCRAFT */ and lo' and behold, it was! :-)
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.