Related articles |
---|
Can this be proven for LL(1) grammars? lhp+news@toft-hp.dk (Lasse =?iso-8859-1?q?Hiller=F8e?= Petersen) (2014-11-06) |
Re: Can this be proven for LL(1) grammars? kaz@kylheku.com (Kaz Kylheku) (2014-11-07) |
Re: Can this be proven for LL(1) grammars? slkpg4@gmail.com (SLK Mail) (2014-11-08) |
From: | Kaz Kylheku <kaz@kylheku.com> |
Newsgroups: | comp.compilers |
Date: | Fri, 7 Nov 2014 19:43:44 +0000 (UTC) |
Organization: | Aioe.org NNTP Server |
References: | 14-11-006 |
Keywords: | theory, parse |
Posted-Date: | 07 Nov 2014 15:55:25 EST |
On 2014-11-06, Lasse HillerC8e Petersen <lhp+news@toft-hp.dk> wrote:
> Given a context free grammar G that is LL(1), is the answer to the
> following questions "yes"? I intuitively believe it is, except for (2),
> but I am not sufficiently confident of my skills in proving this in full
> generality, so I would much appreciate expert advice.
>
> 1. From G, for each nonterminal symbol R define a grammar GR, where the
> start rule is changed to R. Rules that are no longer reachable are of
> course useless and discarded. Is each GR also LL(1)?
This is true, and the informal proof is to work backwards: you cannot
take some grammar GR which is not LL(1) and embed it in a larger
context free grammar G which fixes the problem.
> 2. From all the GR grammars from (1), define G' with a new start rule S:
> GR1 | GR2 ... |GRn. Is G' also LL(1)? Given that there could be rules Gi
> and Gj such that their FIRST sets intersect, I suppose the answer is no.
Indeed:
G1 -> blah blah blah 'x'
G2 -> blah blah blah 'y'
Oops! Of course, the language is LL(1) parseable, but requires
a refactored grammar in which we merge the similar left parts:
G1_G2 -> blah blah blah G1_G2'
G1_G2' -> 'x' | 'y'
If we are building a syntax tree as a semantic action, we can handle this
language by encoding the difference between the 'x' and 'y' variant in the
tree. We don't have G1 and G2 phrase structures, but a G1_G2 that
generates multiple forms.
> What if instead, S is defined only from a selection of rules with
> disjoint FIRST sets?
This might not help because of the threat of FIRST/FOLLOW conflicts.
The problem is that G1 and G2 might share some constituent, call it C.
The G1 and G2 rules contribute different material to FOLLOW(C):
G1 -> 'x' C Z
G2 -> 'y' C W
G1 and G2 nicely derive a starting nonterminal, so their FIRST sets are clearly
disjoint (in compliance with your stated constraint on S). But the combintion
of G1 and G2 causes a merging of FOLLOW(C) from G1, and FOLLOW(C) from G2.
Moreover, if C can derive empty (C -> N5), FIRST(C) is also being affected. In
general, conflicts can arise from this. Even though there is no
FIRST(C)/FOLLOW(C) conflict in the G1 and G2 rules in isolation, possibly the
FIRST(C) from G1 could conflict with FOLLOW(C) in G2 or vice versa,
under the merge of these sets.
> 3. From G, define a new grammar G', such that for each nonterminal R, a
> rule R: "R" is added, where "R" is a new terminal symbol. Is G' also LL
> (1)?
This lets you replace any phrase structure in the language with
a "super token" which represents that phrase structure like
3 + MULTIPLICATIVE_EXPR.
This seems to fit in with sentential form representations that you
would use in generation or derivation of sentences.
This introduction cannot introduce a new FIRST/FOLLOW conflict.
R -> "R" means that the terminal "R" is in FIRST(R). Can it also
appaer in FOLLOW(R)? Only if grammar derives R R somewhere, and R happens to
derive N5. But in that case, there is already an existing FIRST/FOLLOW
conflict, unless perhaps the *only* rule for R was previously R -> N5.
If we have:
S -> R R
R -> N5
Then we add:
R -> 'R'
We have a FIRST/FOLLOW conflict. Given the input 'R' it is ambiguous
whether it is S -> N5 'R' or S -> 'R' N5.
What's not clear to me is that the original before the new rule doesn't already
have a conflict. I think not because the symbol N5 only appears in FIRST sets,
not FOLLOW sets. In the original unaugmented grammar, we have FIRST(R) = { N5 }
and FOLLOW(R) = { }. The grammar clearly matches the input in exactly
one way, by deriving R -> N5 twice.
Suppose that R actually derives something in addition to N5, and is embedded
in a correct grammar with no conflicts. Since there is no FIRST/FOLLOW
conflict that means that either R does not derive N5, or FIRST(R)
and FOLLOW(R) do not intersect. If they do not intersect, that means that
FOLLOW(R) was not derived from R: there is nothing in the grammar of
the form R R or R X R where symbols X potentially disappear by deriving N5.
Then if we add R -> 'R', it is harmless.
Informally, since R already derives *something* non-empty without conflicts,
then making R derive a new nonterminal, unique to R, will not create
a conflict. (R already succesfully derives tokens that are not necessarily
unique to the R rule, so why wouldn't it handle an additional unique one.)
> 4. Do (1) and (3) hold for other types of CFG?
I suspect (1) may pop out from what it means for a grammar to be "context
free", but since I already hand-waved through the first comment about it, I'm
going to leave it at that. :) I look forward to other responses.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.