27 Feb 1996 16:40:41 -0500

Related articles |
---|

HELP : Algorithm to generate context free grammar lnustoc.cztmb9@eds.com (Nitin Shah) (1996-02-26) |

Re: HELP : Algorithm to generate context free grammar torbenm@diku.dk (1996-02-27) |

Re: HELP : Algorithm to generate context free grammar chrisbr@cogsci.ed.ac.uk (1996-02-27) |

From: | torbenm@diku.dk (Torben AEgidius Mogensen) |

Newsgroups: | comp.theory,comp.compilers |

Date: | 27 Feb 1996 16:40:41 -0500 |

Organization: | Department of Computer Science, U of Copenhagen |

Distribution: | inet |

References: | 96-02-306 |

Keywords: | parse |

Nitin Shah <lnustoc.cztmb9@eds.com> writes:

*>I am looking for an algorithm that will generate context-free grammar*

*>from given set of strings. For example,*

*>Given a language set L = {aaabbbbb,aab}.*

*>One of the grammar is*

*>G -> AB*

*>A -> aA|a*

*>B -> b|bB*

*>So, I am looking for an algorithm that takes set of strings of the*

*>language as input and gives its context-free grammar.*

It is not entirely clear what conditions the grammar must fulfil. In

your example, the grammar also generates the strings ab, aabbb

etc. which are not in your set. Also, as John said:

*>[It's easy enough to generage a CFG for any finite set of strings, by*

*>enumerating them, but I doubt that's what you want. I've seen some*

*>efforts to guess a grammar from examples of valid strings, but they*

*>don't seem to have been very successful. -John]*

In this solution the grammar is

G -> aaabbbbb | aab

which exactly matches the set you have.

You evidently want the grammar to generate all the strings in the set,

but if you allow it to generate strings not in the set, the question

is what criteria one should choose for the grammar. Simplicity of

grammar? Small number of extra strings? Both?

If you want a simple grammar, the following will do for any set of

strings in the alphabet {a,b}

G -> aG | bG | \epsilon

It obviously generates many strings not in the set (but so does your

grammar above). If you want to reduce the number of extra strings, a

simple count won't do, as it is infinite in both cases. Subset-

ordering might do, but if you put no bound on the size of the grammar,

you end up with the enumeration solution (which is a precise fit).

Hence, you must state some criteria, e.g.

The grammar of size at most K, which generates all strings in

the set M and such that all smaller grammars which do so will

generate more (subsetwise) extra strings.

A better idea may be to state a set N of strings that you want _not_

to be generated. That way you can use the following criteria:

The smallest grammar that generates all strings in M and none

in N.

If M and N are finite, this is computable (as context free parsing is

computable and there are only a finite number of grammars smaller than

the enumeration of M). However, I can see no effective way of

computing it (though it may be possible approximate it by using

methods from compression).

Hence, my posting is not really an answer to your question, but rather

an indication that the question wasn't well defined.

Torben Mogensen (torbenm@diku.dk)

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.