Related articles
[4 earlier articles]
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers rusty@garnet.Berkeley.EDU (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
Re: Error reporting in LR parsers markg@well.sf.ca.us (1989-08-15)
Re: Error reporting in LR parsers eachus@mbunix.mitre.org (1989-08-14)
Date: Mon, 14 Aug 89 21:24:23 EDT
From: eachus@mbunix.mitre.org (Robert Eachus)
Posted-From: The MITRE Corp., Bedford, MA
Newsgroups: comp.compilers
In-Reply-To: <1989Aug12.201931.4857@esegue.uucp>
Organization: The MITRE Corporation, Bedford, Mass.
In article <1989Aug12.201931.4857@esegue.uucp> Mark Grand writes:
>Generating a list of acceptable tokens before allowing YACC to perform a
>default reduction is expensive. A cheaper way (assuming a fast
>implementation of memcpy) is to take a snapshot of YACC's state stack every
>time it gets a new token. That way you can generate a list of the expected
>tokens from the snapshot and only have to do it when actually needed.
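For concreteness, here is a rough sketch of that snapshot idea (not
Mark's actual code; the table accessor yy_action() and the constants
are hypothetical, and a real version would also have to simulate
default reductions on the copied stack before deciding which tokens
are acceptable):

    #include <stdio.h>
    #include <string.h>

    #define STACK_DEPTH 200
    #define N_TOKENS    128
    #define ACT_ERROR   0                      /* assumed "no action" code */

    static int stack[STACK_DEPTH], depth;      /* live LR state stack      */
    static int snap[STACK_DEPTH], snap_depth;  /* copy taken per token     */

    extern int yy_action(int state, int token);  /* hypothetical lookup    */

    void save_snapshot(void)                   /* call before each yylex() */
    {
        memcpy(snap, stack, depth * sizeof stack[0]);
        snap_depth = depth;
    }

    void report_expected(void)                 /* call only on a syntax error */
    {
        int top = snap[snap_depth - 1];
        for (int tok = 0; tok < N_TOKENS; tok++)
            if (yy_action(top, tok) != ACT_ERROR)
                printf("expected token %d\n", tok);
    }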
There is an even faster way. Pat Prange and I used it in a
parser generator and driver {LALR} which is available on Multics. If
the next action to be taken by the compiler is a reduction (other than
into an accepting state), record it in a list of pending states but do
nothing yet. If it is the first read (shift) following an accepting
state, hold it to one side as well. When a succeeding legal token has
been read and is ready to shift onto the stack (or an accepting state
is reached), do the pending read (if any) and then all pending
reductions up to the current read.
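Here is a rough sketch of what such a driver loop might look like; it
is not the Multics LALR driver itself, and the table accessors
yy_action(), yy_goto(), rule_lhs(), rule_len() and the try_repairs()
hook are hypothetical. Actions taken since the last confirmed-legal
token live in a small overlay on top of the committed stack, and only
when the next token also proves legal is the overlay folded in, so an
error can always restart from just before the last good token:

    enum { A_SHIFT, A_REDUCE, A_ACCEPT, A_ERROR };
    typedef struct { int kind, arg; } Action;        /* arg: state or rule  */

    extern Action yy_action(int state, int token);   /* hypothetical tables */
    extern int    yy_goto(int state, int lhs);
    extern int    rule_lhs(int r), rule_len(int r);
    extern int    yylex(void);
    extern void   try_repairs(const int *stk, int n, int bad_tok);

    static int committed[400], cdepth = 1;  /* stack as of the last good token */
    static int overlay[64],    odepth = 0;  /* states pushed since then        */
    static int opops = 0;                   /* committed states popped since   */

    static int spec_at(int back)            /* state `back' entries below top  */
    {
        return back < odepth ? overlay[odepth - 1 - back]
                             : committed[cdepth - opops - 1 - (back - odepth)];
    }

    static void spec_pop(int n)             /* pop n states, lazily            */
    {
        while (n > 0 && odepth > 0) { odepth--; n--; }
        opops += n;
    }

    static void commit(void)                /* the next token proved legal     */
    {
        cdepth -= opops;
        for (int i = 0; i < odepth; i++) committed[cdepth++] = overlay[i];
        opops = odepth = 0;
    }

    int parse(void)
    {
        int tok = yylex();
        for (;;) {
            Action a = yy_action(spec_at(0), tok);
            switch (a.kind) {
            case A_SHIFT:
                commit();                   /* token is legal: fold in pending */
                overlay[odepth++] = a.arg;  /* the shift itself stays pending  */
                tok = yylex();
                break;
            case A_REDUCE: {
                int len = rule_len(a.arg);
                int under = spec_at(len);   /* state left on top after popping */
                spec_pop(len);
                overlay[odepth++] = yy_goto(under, rule_lhs(a.arg));
                break;
            }
            case A_ACCEPT:
                commit();
                return 0;
            default:                        /* committed[] still describes the */
                try_repairs(committed, cdepth, tok);  /* pre-last-token state  */
                return 1;
            }
        }
    }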
With this scheme, error correction can start from the point just
before the last legal token was read. This allows all sorts of
possible error recovery strategies to be tried. We tried twelve
combinations, including deleting the previously read token and
swapping the previous and current tokens. All this was "free" in that
it required a few bytes of storage in the driver's stack and about 3
or 4 extra instructions per successful shift or reduction. (Failures
are another story; we allowed ourselves about a tenth of a CPU second
per correction, but it was worth it, since the compilers could often
continue through several difficult syntax errors and do a meaningful
error scan through an entire program.)
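One way such candidates might be scored (again just a sketch, not the
actual twelve-strategy logic; trial_parse() is a hypothetical routine
that re-runs the parser from the checkpointed stack over a candidate
token sequence and reports how far it got before failing):

    #include <string.h>

    #define MAX_LOOKAHEAD 16

    extern int trial_parse(const int *stack, int depth,
                           const int *toks, int ntoks);  /* hypothetical */

    /* toks[0] is the last legal token, toks[1] the offending one, and
       toks[2..] is whatever lookahead the lexer has already buffered. */
    int choose_repair(const int *stack, int depth, const int toks[], int ntoks)
    {
        int cand[MAX_LOOKAHEAD];
        int best = -1, best_score = -1, score;

        if (ntoks > MAX_LOOKAHEAD)
            ntoks = MAX_LOOKAHEAD;

        /* candidate 0: delete the previously read token */
        memcpy(cand, toks + 1, (ntoks - 1) * sizeof *cand);
        score = trial_parse(stack, depth, cand, ntoks - 1);
        if (score > best_score) { best_score = score; best = 0; }

        /* candidate 1: swap the previous and current tokens */
        memcpy(cand, toks, ntoks * sizeof *cand);
        cand[0] = toks[1]; cand[1] = toks[0];
        score = trial_parse(stack, depth, cand, ntoks);
        if (score > best_score) { best_score = score; best = 1; }

        /* ...the remaining candidates (insert a token, merge the two
           tokens, panic-mode deletion, etc.) are scored the same way... */
        return best;
    }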
Two of my favorite corrections came from the Ada Compiler
Validation Suite:
legal-statement;
IFB THEN ...
was corrected to:
legal-statement;
if IFB then ...
and
X: INTEGER ran ge 1..10;
was corrected to:
X: INTEGER range 1..10;
All without special error tables, and without looking at the
spellings of non-terminals! In fact, the parser tables for the Ada
grammar above were about 3500 36-bit words, including 36 words for
error recovery (closures for constructs to be deleted if panic mode
was used to throw away a sequence of invalid tokens).
As far as I know this tool is still used to maintain the parsers
for several Honeywell compilers (especially for the DPS6 and DPS6+,
but not the Ada compiler from DDC), and the source (highly Multics
specific PL/1) is on every existing machine running Multics. (There
just aren't too many of them any more...sigh!)
Robert I. Eachus
eachus@mbunix.mitre.org