# Chomsky Normal Form and parsing

## ebcrmb@ebc.ericsson.se (BO/EBC/KX/ZG Michael Bergman 682 4958)Tue, 23 Feb 1993 17:17:35 GMT

From comp.compilers

Related articles
Chomsky Normal Form and parsing ebcrmb@ebc.ericsson.se (1993-02-23)
Re: Chomsky Normal Form and parsing hdev@dutiag.twi.tudelft.nl (1993-02-23)
Re: Chomsky Normal Form and parsing eifrig@beanworld.cs.jhu.edu (1993-02-24)
Re: Chomsky Normal Form and parsing hdev@dutiak.twi.tudelft.nl (1993-02-24)
| List of all articles for this month |

 Newsgroups: comp.compilers From: ebcrmb@ebc.ericsson.se (BO/EBC/KX/ZG Michael Bergman 682 4958) Keywords: parse, question Organization: Compilers Central Date: Tue, 23 Feb 1993 17:17:35 GMT

Hi everyone,
I have some more entangled(?) questions on CFG's.

This time it's Chomsky and Greibach normal forms I'm interested in.

"Languages and Machines" by Tomas A. Sudkamp [1988]
ISBN 0-201-15768-3

contains some interesting stuff. A friend of mine had this at hand, so I
cite him:

Definition Chomsky Normal Form:

Grammar G = (V,Sigma,P,S) (Variables, alphabet, productions, start) All
productions are in one of the forms:

1. A -> BC
2. A -> a
3. S -> epsilon

B and C != S, A is arbitrary (specifically, it can be S)
(Note that S is the only production that can have an epsilon.)

Theorem:

"Let G =(V,Sigma,P,S) be a CFG. There is an algorithm to construct a
grammar G' = (V,Sigma,P',S) in Chomsky NF that is equivalent to G."

Note that there are no other restrictions here, according to Sudkamp this
works on all CFG's.
These steps are used in the conversion:
1. Make S non-recursive
2. Eliminate all epsilon except the one in S (if there is one)
3. Eliminate all chain rules
4. Remove useless symbols (the ones not used in any production).

(After this, one can eliminate left-recursion to make it predictive
parseable.)

The question is: what's the practical use of this?
Can we parse this Chomsky Normal Form for sure using some parsing method?
Which one in that case? (I do suspect the answer is no... :-(
If not, are there some "well" defined restrictions one can put on the CFG
(e.g. unamiguity) to make the parsing possible or even more feasible?

There is also the Greibach NF which probably is better for LL/predictive
parsing, it is inherently non-left recursive for example. The problem is
that the conversion is *nasty*, having exponential behaviour in the worst
case. Is there a theorem stating that it is always possible to convert to
Greibach NF? Apparently, the theorem above says that for Chomsky NF.

I just found a comment on this in a comp.compilers article from May 1992:

Jonathan Eifrig (eifrig@blaze.cs.jhu.edu) writes:

> We want don't so much want to recognize a
> *language* so much as generate a parse tree of the input for a particular
> grammar. The *grammar* itself is important; if one has to contort the
> "natural" grammar (i.e., the one that matches the semantic grouping) of
> language in order to accomodate our parsing scheme, this is a serious dis-
> advantage. This is why, for example, we don't just convert our ambiguous
> grammars into Chomsky normal form and apply the obvious dynamic
> programming parsing scheme. We'll see in a minute that both LL(k) and
> LR(1) parsing schemes suffer from this problem.

I guess that says it all unless someone else has any comments.

However, since it is possible to do the conversion at least
semi-automatically, maybe one could try it and see if the result is
acceptable or not. I haven't tried this myself so I don't know what it
looks like (but I can imagine it will be pretty difficult to read).

So - what's the "dynamic programming parsing scheme"? I'll be happy with
an overview of the method + a pointer to any refs if there are any.

M
--
Michael Bergman Email: ebcrmb@ebc.ericsson.se
KX/ZGE Tfn: +46 8 6824958