25 Jul 2006 00:40:16 -0400

From: | Hans-Peter Diettrich <DrDiettrich1@aol.com> |

Newsgroups: | comp.compilers |

Date: | 25 Jul 2006 00:40:16 -0400 |

Organization: | Compilers Central |

References: | 06-07-059 06-07-065 |

Keywords: | parse |

SM Ryan schrieb:

*> # 1. LL parsers cannot handle left recursion, whereas LR parsers cannot*

*> # handle right recursion.*

*>*

*> A left recursive grammar is not an LL(k) grammar, but the grammar can*

*> be mechanically transformed to rid it of left recursion. The resulting*

*> grammar might still not be LL(k).*

*>*

*> LR(k) can handle any deterministic grammar. Left recursive productions*

*> only need one production on the stack; right recursion piles up the*

*> entire recursive nest on the stack and then reduces all of them at*

*> once: right recursion requires a deeper stack.*

Thanks for your explanations, but I'm still not fully convinced ;-)

So far I only used CoCo/R to create recursive descent parsers, so that

I had to handle all non-LL(1) cases manually. Perhaps hereby I have

applied some modifications to the grammar, when I made work e.g. my C

parser without problems with left recursive rules.

That's why I thought that LR parsers have similar problems with right

recursion (epsilon moves?), where a parser generator also would have

to apply some built-in rules, in order to resolve these problems. But

this only is a feeling, I'm not very familiar with LR parsers, because

I couldn't yet find a working parser generator with Pascal output.

At least I understand now, that right recursion should be *avoided* in

LR grammars, in order to keep the stack size low.

*> # 2. Most languages are ambiguous at the syntax level, so that implicit*

*> # disambiguation (longest match...) or explicit semantical constraints*

*> # must be introduced. (see: dangling else...).*

*>*

*> Only poorly designed programming languages are ambiguous. (Natural*

*> languages are ambiguous and don't use deterministic language theory.)*

Counterexample:

The argument list of a subroutine is nothing but a list, which can be

expressed using either left or right recursion in a grammar.

Perhaps I misused the term "ambiguous" here, when I meant that different

parse trees can be constructed for a sentence of a language?

*> Many programming language grammars are ambiguous because the people*

*> writing the grammars are lazy and/or uneducated in these matters. The*

*> dangling else nonproblem was solved some 40 years ago. Anyone who*

*> thinks this is still a problem is just too lazy to write a well known*

*> solution.*

The dangling else problem can be solved by adding implicit general rules

to the interpretation of a language (or grammar?). Of course there exist

ways to prevent the occurence of such problems, just in the language

specification. But AFAIR it's impossible to prove, in the general case,

that a language is inambigous.

*>*

*> # 3. Only LL(1) recursive descent parsers are readable, that's why no*

*> # LL(k) parser generators exist, in contrast to LR(k) parser generators.*

*>*

*> Recursive descent is not LL(k). Recursive descent is an implementation*

*> technique not a language class.*

Okay, but what's the relationship between leftmost/rightmost derivation

and a language?

*> There are recursive ascent parsers for*

*> LR(k) grammars. LR(k) parsers can be written as recursive finite state*

*> transducers, with right recursion and embedding requiring recursion*

*> and left recursion merely looping; if a language uses left recursion*

*> only (type iii), the LR(k) state diagram is easily convertible to a*

*> finite transducer for the language.*

I'm not sure what you want to tell me. AFAIR LR(k) (languages?

grammars?) can be transformed into LR(1), for which a finite state

machine can be constructed easily. I assume that a transformation from

LL(k) into LL(1) would be possible as well, using essentially the same

transformation algorithms.

My point is that table driven parsers are unreadable, due to the lost

relationship between the parser code and the implemented language or

grammar. Do there exist LR parsers or parser generators at all, which

preserve the relationship between the parser code and the grammar?

*>*

*> # 4. When at least one terminal must be consumed, before a recursive*

*> # invocation of a rule, no infinite recursion can occur. (every loop will*

*> # terminate after all terminals in the input stream have been consumed)*

*>*

*> Confusing implementation with language class. Any grammar that*

*> includes a rule such as A -> A | empty is ambiguous therefore*

*> nondeterministic therefore not LR(k) therefore not LL(k).*

I wanted to present an proof, whether a given grammar is (non-?)

deterministic, regardless of the reason for that property.

*> There are many other things that keep a grammar or language from*

*> being LL(k); LL(k) does not include all deterministic languages.*

*> # Ad (1+2): We should keep grammars apart from languages. Most languages*

*> # require recursive grammars, but allow for both left or right recursive*

*> # grammars.*

*>*

*> A language definition that does not depend on vague handwaving (one or*

*> two such definitions actually exist) bases the semantics on the parse*

*> tree. Since right and left recursion build different parse trees, this*

*> issue is very important in definitions with riguous semantics.*

With regards to programming languages, there exist many constructs that

do not impose or require the construction of an specific (unique) parse

tree. As long as the parser must not know about the definition of an

identifier, the placement of the definitions in an parse tree is

irrelevant, when it doesn't change the semantics of the language

(visibility constraints...).

*>*

*> # Languages with "expr '+' expr" or "list ',' list" can be parsed in*

*> # multiple ways. Unless there exist additional (semantical) restrictions,*

*> # correct and equivalent left or right recursive grammars can be*

*> # constructed for such languages.*

*>*

*> Or just write an unambiguous production. It's not any harder to do it*

*> right than to do it lazy and wrong.*

*>*

*> If the semantics of a subtract production are the value of the right*

*> subtree is subtracted from the value of left subtree, then*

*> 3 - 2 - 1*

*> with left recursion is*

*> = (3 - 2) - 1 = 1 - 1 = 0*

*> with right recursion is*

*> = 3 - (2 - 1) = 3 - 1 = 2*

This is a property of the asymmetric subtraction operation, which

doesn't apply to the symmetric addition or multiplication operations. Of

course it's a good idea to enforce a unique sequence of *numerical*

operations in program code, whereas in mathematical formulas such

additional restrictions should *not* be built into a grammar.

*>*

*> Rather different answers but unless your software is controlling a*

*> satellite to Venus, I guess sloppiness can be repaired in the next*

*> patch release.*

No doubt, but IMO you want to introduce more restrictions than required.

A compiler is allowed to apply certain *valid* transformations on an

parse tree, in so far I cannot see a reason why any grammar or parser

for that language must enforce the construction of one-and-only-one

valid parse tree.

*> Algol-60 used left recursion except exponentiation so that the value*

*> could be determined from the parse tree without a lot misreadable*

*> prose.*

*>*

*> # And when a human is allowed to disambiguate a grammar for such a*

*> # language himself, a parser generator should be allowed to do the same ;-)*

*>*

*> hy bother inserting ambiguity and then remove it again with obscure*

*> rules? Eschew ambiguity from the onset.*

Here you're talking about the construction of languages, not about the

construction of parsers ;-)

DoDi

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.