Wed, 24 Feb 1993 00:29:38 GMT

Related articles |
---|

Chomsky Normal Form and parsing ebcrmb@ebc.ericsson.se (1993-02-23) |

Re: Chomsky Normal Form and parsing hdev@dutiag.twi.tudelft.nl (1993-02-23) |

Re: Chomsky Normal Form and parsing eifrig@beanworld.cs.jhu.edu (1993-02-24) |

Re: Chomsky Normal Form and parsing hdev@dutiak.twi.tudelft.nl (1993-02-24) |

Newsgroups: | comp.compilers |

From: | eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) |

Keywords: | parse, theory |

Organization: | The Johns Hopkins University CS Department |

References: | 93-02-127 |

Date: | Wed, 24 Feb 1993 00:29:38 GMT |

In article 93-02-127 ebcrmb@ebc.ericsson.se (BO/EBC/KX/ZG

*>I have some more entangled(?) questions on CFG's.*

*>*

*>This time it's Chomsky and Greibach normal forms I'm interested in.*

*>*

*> [definition of CNF grammars deleted]*

*>*

*>The question is: what's the practical use of this?*

*>Can we parse this Chomsky Normal Form for sure using some parsing method?*

*>[added later -John]*

*>I just found a comment on this in a comp.compilers article from May 1992:*

Jonathan Eifrig (eifrig@blaze.cs.jhu.edu) writes:

*> We want don't so much want to recognize a*

*> *language* so much as generate a parse tree of the input for a particular*

*> grammar. The *grammar* itself is important; if one has to contort the*

*> "natural" grammar (i.e., the one that matches the semantic grouping) of*

*> language in order to accomodate our parsing scheme, this is a serious dis-*

*> advantage. This is why, for example, we don't just convert our ambiguous*

*> grammars into Chomsky normal form and apply the obvious dynamic*

*> programming parsing scheme. We'll see in a minute that both LL(k) and*

*> LR(1) parsing schemes suffer from this problem.*

Wow! Let me tell you, seeing an article you posted almost a year

ago makes one sit up and take notice! Stop the compliments, before they

fool me into writing a book. :-)

As far as the parsing scheme goes, it's very simple. (Like most

algorithms, its obvious, once you see the trick.) Consider a CNF with k

non-terminal symbols, m terminal symbols, and n productions. To answer

"Is string a in the language generated by the grammar?", we do the

following:

Let a[i,j] be the substring of the input from position i to

position j in the input (0 < i,j <= n). We build a table telling us, for

every i and j, which (if any) of the grammar symbols could generate the

string a[i,j]. Once we have the table, we just check to see if the start

symbol could have generated a[1,n].

We build the table by induction on the length of the substrings

a[i,j]. For substrings of length 1, this is easy: A could generate

a[i,i+1] = a if and only if A -> a is a production of the grammar. For

longer strings, we just check exhaustively: For each production A ->BC, we

check to see if there exists a k (between i and j) such that B generates

a[i,k] and C generates a[k+1,j]; since these substrings are shorter that

a[i,j] these answers are already in the table.

A relatively straightforward counting argument shows that this

brute-force approach is even "efficient," at least as far a theorists are

concerned: it takes worst-case O(n^3) time. For a fixed-size grammar,

it's a little better: O(n^2). This is well-known automata theory stuff.

(In fact, it was one of the questions on my qualifier! :-)) For a more

detailed explanation, see my favorite baby automata book "An Introduction

to Formal Languages and Automata" by Linz.

Of course, these are the wrong sorts of parsing questions to ask,

in practice. As I argued above, there's very little use for a parser that

just answers "Yes" or "No"; what one _really_ needs is a parser that

generates a parse tree of the input.

--

Jack Eifrig (eifrig@cs.jhu.edu) The Johns Hopkins University, C.S. Dept.

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.