Re: Calculating maximum number of grammar productions

"Oliver Wong" <owong@castortech.com>3 Jun 2006 18:53:08 -0400

From comp.compilers

Related articles
Calculating maximum number of grammar productions ali@hodroj.net (2006-05-15)
Re: Calculating maximum number of grammar productions owong@castortech.com (Oliver Wong) (2006-06-03)
| List of all articles for this month |

 From: "Oliver Wong" Newsgroups: comp.compilers Date: 3 Jun 2006 18:53:08 -0400 Organization: GlobeTrotter References: 06-05-051 Keywords: parse, theory Posted-Date: 03 Jun 2006 18:53:08 EDT

<ali@hodroj.net> wrote in message news:06-05-051@comp.compilers...
> Hello,
>
> Given a LALR grammar and a set of language tokens. Is there an
> algorithm to find out how many productions are possible for a specific
> token-wide length statement. For example:
>
> Consider the following grammar:
>
> Expr:
> Expr + Expr | Expr - Expr | 1 | 0
>
> for example: a 5 token-wide length production would be one that has 5
> tokens in it,
> like: 1 + 1 - 0
>
> For a simple grammar like above, you can come up with a mathematical
> formula using factorials or permutations to calculate how many possible
> productions you can have, However, what if you have language like C ?
> assuming--to make things simple-- for every token type there is only
> one associated string (ie: a variable name is always "var", a function
> name is always "function"), what would be a suitable algorithm to
> calculate the possible number of productions for a specific number of
> tokens in a statement?

It might help to first convert the grammar to chomsky normal form (see
http://en.wikipedia.org/wiki/Chomsky_normal_form)

<quote>
Every grammar in Chomsky normal form is context-free, and conversely, every
context-free grammar can be efficiently transformed into an equivalent one
which is in Chomsky normal form.
[...]
throughout the derivation of a string, each string of terminals and
nonterminals is always either the same length or one element longer than the
previous such string. The derivation of a string of length n is always
exactly 2n-1 steps long. Furthermore, since all rules deriving nonterminals
transform one nonterminal to exactly two nonterminals, a parse tree based on
a grammar in Chomsky normal form is a binary tree, and the height of this
tree is limited to at most the length of the string.
</quote>

So basically, you walk the grammar tree, counting all possible derivations,
always stopping when you reach height n, because you know if you go to a
deeper height, you will end up with a string exceeding length n.

- Oliver

Post a followup to this message