Re: Is there an error correcting parser generator out there?

clark@quarry.zk3.dec.com (Chris Clark USG)
Wed, 5 Oct 1994 20:29:49 GMT

          From comp.compilers

Related articles
[8 earlier articles]
Re: Is there an error correcting parser generator out there? johnm@po.EECS.Berkeley.EDU (1994-10-01)
Re: Is there an error correcting parser generator out there? adrian@platon.cs.rhbnc.ac.uk (1994-10-04)
Re: Is there an error correcting parser generator out there? andrew@bugalugs (1994-10-05)
Re: Is there an error correcting parser generator out there? rockwell@nova.umd.edu (1994-10-05)
Re: Is there an error correcting parser generator out there? rfg@netcom.com (1994-10-05)
Re: Is there an error correcting parser generator out there? thoni@softlab.se (1994-10-05)
Re: Is there an error correcting parser generator out there? clark@quarry.zk3.dec.com (1994-10-05)
Re: Is there an error correcting parser generator out there? johnm@po.EECS.Berkeley.EDU (1994-10-06)
| List of all articles for this month |

Newsgroups: comp.compilers
From: clark@quarry.zk3.dec.com (Chris Clark USG)
Keywords: parse, errors, LL(1), LR(1)
Organization: Digital Equipment Corporation
References: 94-09-142 94-10-023
Date: Wed, 5 Oct 1994 20:29:49 GMT

There have been a lot of good points in this debate about error recovery.


One point which has been glossed over, but is important, is the difference
between error reporting and error recovery. Error reporting is trying to
give the user enough information (without any extraneous details) so that
the user can see what they have done wrong. Error recovery is attempting
to fix the parsers internal state (including editting the token stream)
so that the parser can continue with the fewest possible future spurious
errors.


Now, I'd like to respond to some specific points, as a co-author of an LR
parser generator.


In article 94-09-187, jon@mauney.com (Jon Mauney) writes:
|> A) LL parsers have a very simple method of operation, which makes it
|> easy to extract the information needed for high-quality error
|> recovery. LL parsers predict what symbols are upcoming in
|> the left context, and push the prediction onto a stack,
|> from which it is quite easy to extract.
|> LR parsers have parsing information cleverly encoded
|> in CFSM states; error recovery needs to cleverly extract that
|> information again. So an error recovery algorithm for an LR parser
|> needs more code or more auxiliary tables than an equivalent algorithm
|> for an LL parser.


A valid point (actually an LL parser could also be encoded as a machine,
but that does not detract from the general point). It is true that in
some states an LR parser may have more than one possible context. This is
what makes LR languages a superset of LL languages. However, it is
possible to mentally model LR parsing as if it were just a set of parallel
LL parsers, each for the different context. If one does that, then any
places where the grammar is LL the parser has the same context information
as an LL parser. With that point of view, one can make LR error recovery
which works exactly like LL error recovery. (Of course, in practise one
does have to root the information out of the states of the machine (or
keep it in auxillary tables), but the information is there.)


For example, Yacc++ generates an auxillary table which contains the
context information on productions the programmer has indicated as
interesting. This allows the programmer to generate help messages based
upon only those productions. In a grammar, the expression production
might be considered interesting but not the factor or term productions.
In contrast, Yacc++ generates the list of expected tokens directly from
the state machine.


In article 94-10-005, Terence Parr <parrt@everest.ee.umn.edu> writes:
|> I've been promising something slick for ANTLR called "parser exception
|> handling" (PEH) for awhile now; it's in a testing stage as we speak. For
|> years, I've been trying to dream up ways to give the programmer good control
|> of error recovery, without lots of hassle. After long discussions with many
|> folks, the simple (and now obvious) leap from C++ exception handling to
|> parser exception handling has been made.


I think PEH is going to be a significant advancement in error reporting
and recovery. Exceptions are a very natural way to handle the problem of
needing higher level context in a lower level situation.


In article 94-10-023, adrian@platon.cs.rhbnc.ac.uk (A Johnstone) writes:
|> I just wanted to comment on the implied suggestion that RD parsers
|> militate against formality. Just because RD parsers are simple enough
|> to write by hand doesn't mean that ll(1) and ll(k) parser generators
|> don't exist (cf RDP and especially PCCTS). Personally I have always
|> been an RD convert for the very simple reason that debugging output
|> from LALR generators is a nightmare. When you are developing a new
|> language, as opposed to hacking in someone else's C grammar this is a
|> real issue.


I think that debugging the output of LALR generators has been difficult
only because most generator authors did not spend much time on presenting
the state machine. The problem is worse if you pack the states like yacc
does. [As one might suspect, Yacc++ has addressed this problem.]


The other side of it, is just because the generator is LR does not mean it
cannot generate RD code. It takes a little thought, and I've not seen
anyone do it, but I know it can be done.


|> IMHO LALR was originally justified because LALR grammars are `easier'
|> to write, not requiring left factoring. However, the advantages
|> disappear PDQ when you need to start adding semantic actions, which
|> often ends up requiring factoring... In addition, error recovery
|> certainly is a whole lot easier in RD parsers. On top of all that,
|> YACC's basic BNF syntax makes grammars much larger than they need to
|> be anyway.


There is only a problem if you want to put your semantic actions in the
middle of productions, which is not necessary, except for semantic actions
which match PCCTS's semantic predicates. If your LR parser generator
supports semantic predicates, then there is no reason to left factor at
all.


I will agree that EBNF is better than BNF, which is why Yacc++ supports
"direct translation" of regular expressions, an implementation method with
carries the EBNF advantage down to the resulting state machine.


Just my opinions,
-Chris
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.