Re: Backtracking yacc

Gary Merrill <sasghm@unx.sas.com>
Mon, 14 Sep 1992 13:02:00 GMT

          From comp.compilers

Related articles
Backtracking yacc jarmo@ksvltd.FI (Jarmo Raiha) (1992-09-10)
Re: Backtracking yacc ipser@solomon.technet.sg (1992-09-11)
Re: Backtracking yacc sasghm@unx.sas.com (Gary Merrill) (1992-09-11)
Re: Backtracking yacc sasghm@unx.sas.com (Gary Merrill) (1992-09-14)
Re: Backtracking yacc ipser@solomon.technet.sg (1992-09-16)
Re: Backtracking yacc bromage@mullauna.cs.mu.OZ.AU (1992-09-17)
Re: Backtracking yacc Jasper.Kamperman@cwi.nl (1992-09-17)
Re: Backtracking yacc sasghm@unx.sas.com (1992-09-17)
Re: Backtracking yacc diamond@jit081.enet.dec.com (18-Sep-1992 1420) (1992-09-18)
Re: Backtracking yacc harwood@progress.com (Tom Harwood) (1992-09-18)
[8 later articles]
Newsgroups: comp.compilers
From: Gary Merrill <sasghm@unx.sas.com>
Organization: Compilers Central
Date: Mon, 14 Sep 1992 13:02:00 GMT
Keywords: yacc, parse, LL(1)
References: 92-09-062 92-09-059



> Such theorems are rarely of practical value; in practice you will find
> that rewriting an LR(k) grammar to be LR(1) (or whatever) will involve
> contorting the original syntax in ways that are perverse. While such a
> contortion might be acceptable for parsing alone, attempting to add simple
> semantic actions or, worse, trying to maintain the grammar for an evolving
> language can lead to premature greying. The Roskind C++ grammar is a
> notorious example.


I can second this with some enthusiasm. In fact, I have written an LL(1)
parser generator that I transformed into a "grammar munger", and
ultimately into a tool that does the following:


            1. Sucks in a grammar (in more or less yacc-like format).
            2. Applies a load of transformations in an attempt to get the
                  grammar as close to LL(1) as possible.
            3. Generates an (LL(1)-like) parse table that contains conflicts.
            4. Emits C code for a recursive descent parser using that table,
                  where this code contains built-in backtracking code.


(Don't get the idea that this is a well-cooked product. I haven't tested
it that much, but it does seem to work. And at least it produces a table
and parser when run on our yacc grammar for C++ -- about 750 productions.)
I had hopes that the resulting RD parser would be maintainable in the
absence of the generating tool, but I suppose I should have known better.
The resulting grammar was quite counter-intuitive in having odd
"artificial" semantic categories. I expect things might be a bit better
with an LR(1)-ish grammar as the result, but not enough that it would
make much difference.
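A hypothetical illustration of where those artificial categories come from: left-factoring two productions that share a prefix removes the LL(1) conflict, but at the cost of inventing a nonterminal that corresponds to nothing in the original language.

```
/* Before: both alternatives of stmt begin with ID -- an LL(1) conflict. */
stmt : ID '=' expr
     | ID '(' args ')'
     ;

/* After left factoring: the conflict is gone, but "stmt_rest" is an
 * artificial category with no counterpart in the language definition. */
stmt      : ID stmt_rest ;
stmt_rest : '=' expr
          | '(' args ')'
          ;
```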


In cases such as these I think the primary theorem is something like, "You
can have a grammar that is pretty natural, or you can have a grammar that
really works: take your pick."


> As for backtracking, there are additional problems: again, what about
> semantic processing? If you are merely parsing or constructing a parse
> tree, you can get away with throwing away an incorrect guess. But as soon
> as you introduce side effects of even the simplest nature, you are left
> with the task of figuring out how to undo actions associated with the
> wrong guess. Even for a language as simple as ANSI C, you must perform
> certain semantic actions just to parse, and C++ is even worse.


Yes, exactly. This is why my changes to Berkeley yacc include treating
the "usual" ( { ... } ) actions as *conditional* and implementing
*unconditional* actions ( [ ... ] ) that will be executed even during
trial parsing.


> A far better approach is to have available a set of tools which allow the
> disambiguation at the point where it is first encountered. LADE, for
> example, allows context-sensitive tests to be performed. These can be used
> to either a) test arbitrary semantic information to decide on a path to
> take, or b) parse ahead in parse-only mode to find some disambiguating
> symbol, then return and resume normal, action-associated, parsing. While b
> approaches a back-tracking capability, it is not, in fact, backtracking
> since it returns to the originating point regardless of what symbols were
> found in the scan-ahead.


Such tests can, of course, be performed in the lexical analyzer and you
can use the results of these tests to drive the invocations of your
parser. However, it would be *much* nicer to have a parser tool that
would take care of all this. I have not looked at LADE, but I have
literature pertaining to it on my desk. I think that if we decide to
replace our parser at some time in the future, however, we likely will use
RD.


Note that there are *still* some constructs in some languages (even in
C++) that require *some* semantic processing even during trial parsing in
order to parse correctly. (Think of defining a type in an argument
declaration list and then using that type in a later declaration in the
list -- this is an error in C++, but in order to perform a truly correct
parse you need to record the type definition so you can diagnose it and
not issue a spurious diagnostic on its subsequent use.)
---
Gary H. Merrill [Principal Systems Developer, C Compiler Development]
SAS Institute Inc. / SAS Campus Dr. / Cary, NC 27513 / (919) 677-8000
sasghm@theseus.unx.sas.com ... !mcnc!sas!sasghm