From: | Chris F Clark <cfc@shell01.TheWorld.com> |
Newsgroups: | comp.compilers |
Date: | 21 Jul 2006 17:34:02 -0400 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 06-07-024 06-07-027 06-07-035 06-07-046 06-07-050 |
Keywords: | parse, LL(1) |
Posted-Date: | 21 Jul 2006 17:34:02 EDT |
Preamble: excuse this note for being excessively harsh. Dodi normally
posts correct information, and as such deserves to be held to a high
standard, cognizant of the respect he has earned. However, he pushed
a hot button of mine by suggesting that left-recursion was some
hueristic thing. At the same time I made a mistake when originally
writing this assuming that Quinn's grammar was unambiguous, which it
is not. I read Quinn's grammar and assumed it was Andru's. It is a
mistake I commonly make because there are hueristics that change
Quinn's grammar into Andru's, and these hueristics are commonly
applied by LR parser generators and camoflague the fact that Quinn's
grammar is actually ambiguous.
Quinn wrote (q>):
q> expr ::= expr "+" expr;
q> expr ::= '[0-9]+';
Andru corrected (a>);
a> term ::= '[0-9]+';
a> expr ::= expr '+' term |
a> term;
I wrote (> >):
> > An LL parser generator cannot handle the case correctly, as the FIRST
> > of both alternatives is 0-9.
Dodi replied:
> What's "correct" handling of such a case? LR parser generators also use
> heuristics in ambiguous situations.
Now, I mistook Quinn's ambiguous grammar for Andru's correct grammar.
Thus, for Quinn's gramar you are correct. For Andru's grammar, the
situation is different.
Left recursion is not an inherently ambiguous situation. It is
perfectly valid in LR parsing to use left recursion and it does not
require heuristics to do so. The correct handling of non-ambiguous
left recursive cases is perfectly well defined. Ambiguous grammars do
require heuristics and considering them invalid is perfectly rational.
However, even for unambiguous left-recursive grammars the LL algorithm
fails to construct a parser, while the LR algorithm will properly
construct a parser.
> > Thus, while it would be nice if an LL parser would wait to consume a
> > token before making a recursive call, the LL algorithm requires one
> > to make the recursive call before consuming the token, because
> > making the call is necessary to allow the called procedure to
> > consume the token.
> But the procedure already *has* been called, so it IMO is allowed to
> consume the token.
Yes, but now it must make a recursive call. It must make the
recursive call because, if it were the name of a different
non-terminal it would have to call that non-terminal (and that might
lead indirectly back to a recursive call on this non-terminal).
> > It also isn't a solution simply not to make the recursive call, for
> > if you don't make the recursive call, you simply have eliminated the
> > recursive rule from the grammar. If you simply wait to make the
> > recursive call until after the token has been consumed, you have
> > changed the grammar in another way.
> I agree that parser generators tend to change the grammar, in order to
> resolve conflicts. The sample grammar contains no indication about the
> grouping of the operation, but what does that *mean*? Does it mean
> that the grammar is useless, invalid, or simply ambiguous? Provided
> that the grammar is considered to be ambiguous, a parser generator
> either can reject it, or do whatever it likes.
The problem here is there is no conflict in Andru's LR case, because
the LR algorithm correctly handles left-recursion. In Quinn's case,
you are right, there is no indication of the grouping of the operation
and the grammar is ambiguous. (So, if you are chastising me for
seeing Andru's grammar while reading Quinn's, I stand perfectly
chastised.)
By the way, the definition of correct parse is also not technology
specific. Every grammar is either ambiguous or has exactly one
correct parse. Now, some parsing technologies cannot find the correct
parse for some grammars--that includes some gramars which there are no
LL or LR parsers for as well as ones which have LR parsers and not LL
ones and grammars with LL(k) ones and not LR(1) ones, etc. However,
that doesn't change the definition of correct parse.--it's a rigorous,
precise, mathematical concept.
> > The point is that in an LL parser, you need to determine by looking at
> > the lookahead (FIRST sets in this case) how many calls must be made,
> > because you must make exactly the right series of calls before
> > consuming the token.
> >
> > However, Dodi's assertion that LR parsers should (will) properly
> > handle this case is correct.
>
> It's not a matter of the parser, but instead of the parser generator.
Ok, let me make that more clear. No parser generator executing the LL
algorithm for constructing a parser, can construct a parser from a
grammar with left-recursion. A parser generator running any LR
(e.g. LR(0), SLR, LALR, canonical LR, minimal state LR, etc.)
algorithm can construct parsers for left-recursive grammars, presuming
that grammar has no conflicts (and Andru's grammar is conflict-free).
There is a fundamental difference in the LL and LR parser generator
algorithms which allow LR parsers to be constructed for left-recursive
cases, while LL parsers are fundamentally limited from handling such
cases.
And that's the nit which is a hot button for me. It's fundamentally
easier to write an LL parser generator because the algorithm is
simpler, but handles fewer cases, not that writing an LR parser
generator is fundamentally that hard. Moreover, recursive descent
parsers are popular because their output is so "readable". Being able
to use left-recursion (and the oft maligned precedence and
associativity declarations) can make the grammar itself simpler and
more readable. That makes them some of the reasons to prefer LR
parsing. Therefore, I get overly defensive when those advantages are
misunderstood or belittled.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.