Re: LL(1) grammar for dangling else?

leichter@zodiac.rutgers.edu
Fri, 23 Jun 1995 03:58:37 GMT

          From comp.compilers

Related articles
LL(1) grammar for dangling else? MLF@VM.CPD.UA.ES (Mikel L. Forcada) (1995-05-28)
Re: LL(1) grammar for dangling else? lumpe@aragorn.unibe.ch (Markus Lumpe) (1995-05-30)
Re: LL(1) grammar for dangling else? leichter@zodiac.rutgers.edu (1995-06-23)
| List of all articles for this month |

Newsgroups: comp.compilers
From: leichter@zodiac.rutgers.edu
Keywords: LL(1), parse
Organization: Rutgers University Department of Computer Science
References: 95-05-133
Date: Fri, 23 Jun 1995 03:58:37 GMT
Status: RO

| Mikel L. Forcada" <MLF@VM.CPD.UA.ES> writes:
| the dangling-else problem is a classical one. In every book,
| a non-ambiguous grammar based on "matched" and "unmatched"
| statements is proposed, but this grammar is not LL(1).
| Is it true that there doesn't exist any LL(k) grammar for
| this language?


Yes, it's true. The dangling-else problem can be modeled by the following
language:


{a^n b^m | n >= m}


(Think of "a" as "if" and "b" as "else". The rest of the stuff in the state-
ment is irrelevant to this question.)


The following is a very informal argument: Consider parsing a string in this
language. Initially, we are looking for the start symbol S. If n is large
enough (n > k) the lookahead is always just a^k. There can, of course, be
only one production for S chosen with lookahead a^k. What symbol does it
leave us looking for? We might as well assume S: If it instead leaves us
with S1, add another a to the input string and repeat. There are only
finitely many non-terminals, so eventually we loop. (The loop might involve
going from S to S1 and back to S, or through any finite string, but the
argument is the same.) Eventually, we "chew off" enough a's that a b enters
the lookahead set. But by then we "don't know how many a's" we saw, because
we don't know how many times we looped. This makes it impossible to check
the number of b's.


This is essentially the same argument as one uses to show that a^n b^n isn't
regular. Of course, it's not quite equivalent; you have to look at the
context-free alternative of generating a "b" to match the "a" - i.e., the
first production could have been:


S <- a S b


But this can't help! If we have lookahead a^k, the string *could* in fact, be
a^n - so we can't possibly choose this production. We have to put the choice
off, with something like:


S <- a S B
B <- b
B <- \epsilon


But then "aab" has two parses, depending on which B generates the "b".


In essense, what the argument comes down to is this: With enough leading a's,
we have to commit ourselves to productions before seeing any b's. Since
there may be no b's at all, or as many as there are a's, the production we
commit ourselves to must eventually generate either zero b's, or at least
one. But this happens independently for each leading a. Once we have more
than one of these "eventual choices", the grammar becomes ambiguous.


-- Jerry
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.