Re: HELP : Algorithm to generate context free grammar (Torben AEgidius Mogensen)
27 Feb 1996 16:40:41 -0500

          From comp.compilers

Related articles
HELP : Algorithm to generate context free grammar (Nitin Shah) (1996-02-26)
Re: HELP : Algorithm to generate context free grammar (1996-02-27)
Re: HELP : Algorithm to generate context free grammar (1996-02-27)
| List of all articles for this month |

From: (Torben AEgidius Mogensen)
Newsgroups: comp.theory,comp.compilers
Date: 27 Feb 1996 16:40:41 -0500
Organization: Department of Computer Science, U of Copenhagen
Distribution: inet
References: 96-02-306
Keywords: parse

Nitin Shah <> writes:

>I am looking for an algorithm that will generate context-free grammar
>from given set of strings. For example,
>Given a language set L = {aaabbbbb,aab}.

>One of the grammar is

>G -> AB
>A -> aA|a
>B -> b|bB

>So, I am looking for an algorithm that takes set of strings of the
>language as input and gives its context-free grammar.

It is not entirely clear what conditions the grammar must fulfil. In
your example, the grammar also generates the strings ab, aabbb
etc. which are not in your set. Also, as John said:

>[It's easy enough to generage a CFG for any finite set of strings, by
>enumerating them, but I doubt that's what you want. I've seen some
>efforts to guess a grammar from examples of valid strings, but they
>don't seem to have been very successful. -John]

In this solution the grammar is

G -> aaabbbbb | aab

which exactly matches the set you have.

You evidently want the grammar to generate all the strings in the set,
but if you allow it to generate strings not in the set, the question
is what criteria one should choose for the grammar. Simplicity of
grammar? Small number of extra strings? Both?

If you want a simple grammar, the following will do for any set of
strings in the alphabet {a,b}

G -> aG | bG | \epsilon

It obviously generates many strings not in the set (but so does your
grammar above). If you want to reduce the number of extra strings, a
simple count won't do, as it is infinite in both cases. Subset-
ordering might do, but if you put no bound on the size of the grammar,
you end up with the enumeration solution (which is a precise fit).

Hence, you must state some criteria, e.g.

The grammar of size at most K, which generates all strings in
the set M and such that all smaller grammars which do so will
generate more (subsetwise) extra strings.

A better idea may be to state a set N of strings that you want _not_
to be generated. That way you can use the following criteria:

The smallest grammar that generates all strings in M and none
in N.

If M and N are finite, this is computable (as context free parsing is
computable and there are only a finite number of grammars smaller than
the enumeration of M). However, I can see no effective way of
computing it (though it may be possible approximate it by using
methods from compression).

Hence, my posting is not really an answer to your question, but rather
an indication that the question wasn't well defined.

Torben Mogensen (

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.