Re: Can this be proven for LL(1) grammars?

Kaz Kylheku <>
Fri, 7 Nov 2014 19:43:44 +0000 (UTC)

          From comp.compilers

Related articles
Can this be proven for LL(1) grammars? (Lasse =?iso-8859-1?q?Hiller=F8e?= Petersen) (2014-11-06)
Re: Can this be proven for LL(1) grammars? (Kaz Kylheku) (2014-11-07)
Re: Can this be proven for LL(1) grammars? (SLK Mail) (2014-11-08)
| List of all articles for this month |

From: Kaz Kylheku <>
Newsgroups: comp.compilers
Date: Fri, 7 Nov 2014 19:43:44 +0000 (UTC)
Organization: NNTP Server
References: 14-11-006
Keywords: theory, parse
Posted-Date: 07 Nov 2014 15:55:25 EST

On 2014-11-06, Lasse HillerC8e Petersen <> wrote:
> Given a context free grammar G that is LL(1), is the answer to the
> following questions "yes"? I intuitively believe it is, except for (2),
> but I am not sufficiently confident of my skills in proving this in full
> generality, so I would much appreciate expert advice.
> 1. From G, for each nonterminal symbol R define a grammar GR, where the
> start rule is changed to R. Rules that are no longer reachable are of
> course useless and discarded. Is each GR also LL(1)?

This is true, and the informal proof is to work backwards: you cannot
take some grammar GR which is not LL(1) and embed it in a larger
context free grammar G which fixes the problem.

> 2. From all the GR grammars from (1), define G' with a new start rule S:
> GR1 | GR2 ... |GRn. Is G' also LL(1)? Given that there could be rules Gi
> and Gj such that their FIRST sets intersect, I suppose the answer is no.


    G1 -> blah blah blah 'x'
    G2 -> blah blah blah 'y'

Oops! Of course, the language is LL(1) parseable, but requires
a refactored grammar in which we merge the similar left parts:

    G1_G2 -> blah blah blah G1_G2'
    G1_G2' -> 'x' | 'y'

If we are building a syntax tree as a semantic action, we can handle this
language by encoding the difference between the 'x' and 'y' variant in the
tree. We don't have G1 and G2 phrase structures, but a G1_G2 that
generates multiple forms.

> What if instead, S is defined only from a selection of rules with
> disjoint FIRST sets?

This might not help because of the threat of FIRST/FOLLOW conflicts.

The problem is that G1 and G2 might share some constituent, call it C.
The G1 and G2 rules contribute different material to FOLLOW(C):

    G1 -> 'x' C Z
    G2 -> 'y' C W

G1 and G2 nicely derive a starting nonterminal, so their FIRST sets are clearly
disjoint (in compliance with your stated constraint on S). But the combintion
of G1 and G2 causes a merging of FOLLOW(C) from G1, and FOLLOW(C) from G2.
Moreover, if C can derive empty (C -> N5), FIRST(C) is also being affected. In
general, conflicts can arise from this. Even though there is no
FIRST(C)/FOLLOW(C) conflict in the G1 and G2 rules in isolation, possibly the
FIRST(C) from G1 could conflict with FOLLOW(C) in G2 or vice versa,
under the merge of these sets.

> 3. From G, define a new grammar G', such that for each nonterminal R, a
> rule R: "R" is added, where "R" is a new terminal symbol. Is G' also LL
> (1)?

This lets you replace any phrase structure in the language with
a "super token" which represents that phrase structure like

This seems to fit in with sentential form representations that you
would use in generation or derivation of sentences.

This introduction cannot introduce a new FIRST/FOLLOW conflict.
R -> "R" means that the terminal "R" is in FIRST(R). Can it also
appaer in FOLLOW(R)? Only if grammar derives R R somewhere, and R happens to
derive N5. But in that case, there is already an existing FIRST/FOLLOW
conflict, unless perhaps the *only* rule for R was previously R -> N5.

If we have:

  S -> R R
  R -> N5

Then we add:

  R -> 'R'

We have a FIRST/FOLLOW conflict. Given the input 'R' it is ambiguous
whether it is S -> N5 'R' or S -> 'R' N5.

What's not clear to me is that the original before the new rule doesn't already
have a conflict. I think not because the symbol N5 only appears in FIRST sets,
not FOLLOW sets. In the original unaugmented grammar, we have FIRST(R) = { N5 }
and FOLLOW(R) = { }. The grammar clearly matches the input in exactly
one way, by deriving R -> N5 twice.

Suppose that R actually derives something in addition to N5, and is embedded
in a correct grammar with no conflicts. Since there is no FIRST/FOLLOW
conflict that means that either R does not derive N5, or FIRST(R)
and FOLLOW(R) do not intersect. If they do not intersect, that means that
FOLLOW(R) was not derived from R: there is nothing in the grammar of
the form R R or R X R where symbols X potentially disappear by deriving N5.

Then if we add R -> 'R', it is harmless.

Informally, since R already derives *something* non-empty without conflicts,
then making R derive a new nonterminal, unique to R, will not create
a conflict. (R already succesfully derives tokens that are not necessarily
unique to the R rule, so why wouldn't it handle an additional unique one.)

> 4. Do (1) and (3) hold for other types of CFG?

I suspect (1) may pop out from what it means for a grammar to be "context
free", but since I already hand-waved through the first comment about it, I'm
going to leave it at that. :) I look forward to other responses.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.