21 Jul 2006 17:34:02 -0400

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 21 Jul 2006 17:34:02 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 06-07-024 06-07-027 06-07-035 06-07-046 06-07-050 |

Keywords: | parse, LL(1) |

Preamble: excuse this note for being excessively harsh. Dodi normally

posts correct information, and as such deserves to be held to a high

standard, cognizant of the respect he has earned. However, he pushed

a hot button of mine by suggesting that left-recursion was some

hueristic thing. At the same time I made a mistake when originally

writing this assuming that Quinn's grammar was unambiguous, which it

is not. I read Quinn's grammar and assumed it was Andru's. It is a

mistake I commonly make because there are hueristics that change

Quinn's grammar into Andru's, and these hueristics are commonly

applied by LR parser generators and camoflague the fact that Quinn's

grammar is actually ambiguous.

Quinn wrote (q>):

q> expr ::= expr "+" expr;

q> expr ::= '[0-9]+';

Andru corrected (a>);

a> term ::= '[0-9]+';

a> expr ::= expr '+' term |

a> term;

I wrote (> >):

*> > An LL parser generator cannot handle the case correctly, as the FIRST*

*> > of both alternatives is 0-9.*

Dodi replied:

*> What's "correct" handling of such a case? LR parser generators also use*

*> heuristics in ambiguous situations.*

Now, I mistook Quinn's ambiguous grammar for Andru's correct grammar.

Thus, for Quinn's gramar you are correct. For Andru's grammar, the

situation is different.

Left recursion is not an inherently ambiguous situation. It is

perfectly valid in LR parsing to use left recursion and it does not

require heuristics to do so. The correct handling of non-ambiguous

left recursive cases is perfectly well defined. Ambiguous grammars do

require heuristics and considering them invalid is perfectly rational.

However, even for unambiguous left-recursive grammars the LL algorithm

fails to construct a parser, while the LR algorithm will properly

construct a parser.

*> > Thus, while it would be nice if an LL parser would wait to consume a*

*> > token before making a recursive call, the LL algorithm requires one*

*> > to make the recursive call before consuming the token, because*

*> > making the call is necessary to allow the called procedure to*

*> > consume the token.*

*> But the procedure already *has* been called, so it IMO is allowed to*

*> consume the token.*

Yes, but now it must make a recursive call. It must make the

recursive call because, if it were the name of a different

non-terminal it would have to call that non-terminal (and that might

lead indirectly back to a recursive call on this non-terminal).

*> > It also isn't a solution simply not to make the recursive call, for*

*> > if you don't make the recursive call, you simply have eliminated the*

*> > recursive rule from the grammar. If you simply wait to make the*

*> > recursive call until after the token has been consumed, you have*

*> > changed the grammar in another way.*

*> I agree that parser generators tend to change the grammar, in order to*

*> resolve conflicts. The sample grammar contains no indication about the*

*> grouping of the operation, but what does that *mean*? Does it mean*

*> that the grammar is useless, invalid, or simply ambiguous? Provided*

*> that the grammar is considered to be ambiguous, a parser generator*

*> either can reject it, or do whatever it likes.*

The problem here is there is no conflict in Andru's LR case, because

the LR algorithm correctly handles left-recursion. In Quinn's case,

you are right, there is no indication of the grouping of the operation

and the grammar is ambiguous. (So, if you are chastising me for

seeing Andru's grammar while reading Quinn's, I stand perfectly

chastised.)

By the way, the definition of correct parse is also not technology

specific. Every grammar is either ambiguous or has exactly one

correct parse. Now, some parsing technologies cannot find the correct

parse for some grammars--that includes some gramars which there are no

LL or LR parsers for as well as ones which have LR parsers and not LL

ones and grammars with LL(k) ones and not LR(1) ones, etc. However,

that doesn't change the definition of correct parse.--it's a rigorous,

precise, mathematical concept.

*> > The point is that in an LL parser, you need to determine by looking at*

*> > the lookahead (FIRST sets in this case) how many calls must be made,*

*> > because you must make exactly the right series of calls before*

*> > consuming the token.*

*> >*

*> > However, Dodi's assertion that LR parsers should (will) properly*

*> > handle this case is correct.*

*>*

*> It's not a matter of the parser, but instead of the parser generator.*

Ok, let me make that more clear. No parser generator executing the LL

algorithm for constructing a parser, can construct a parser from a

grammar with left-recursion. A parser generator running any LR

(e.g. LR(0), SLR, LALR, canonical LR, minimal state LR, etc.)

algorithm can construct parsers for left-recursive grammars, presuming

that grammar has no conflicts (and Andru's grammar is conflict-free).

There is a fundamental difference in the LL and LR parser generator

algorithms which allow LR parsers to be constructed for left-recursive

cases, while LL parsers are fundamentally limited from handling such

cases.

And that's the nit which is a hot button for me. It's fundamentally

easier to write an LL parser generator because the algorithm is

simpler, but handles fewer cases, not that writing an LR parser

generator is fundamentally that hard. Moreover, recursive descent

parsers are popular because their output is so "readable". Being able

to use left-recursion (and the oft maligned precedence and

associativity declarations) can make the grammar itself simpler and

more readable. That makes them some of the reasons to prefer LR

parsing. Therefore, I get overly defensive when those advantages are

misunderstood or belittled.

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

23 Bailey Rd voice : (508) 435-5016

Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.