# Re: testing epsilon free CFGs for ambiguity

## Chris F Clark <cfc@world.std.com>6 Oct 2001 16:31:45 -0400

From comp.compilers

Related articles
Re: testing epsilon free CFGs for ambiguity cfc@world.std.com (Chris F Clark) (2001-10-06)
| List of all articles for this month |

 From: Chris F Clark Newsgroups: comp.theory,comp.compilers Date: 6 Oct 2001 16:31:45 -0400 Organization: Compilers Central References: <3BBE007B.3532637C@unb.ca> Keywords: parse, theory Posted-Date: 06 Oct 2001 16:31:45 EDT

Ralph Boland asked why CFG ambiguity is non-computable and whether the
following statement is true:
> 2) The reason that the ambiguity problem is not generally
> computable is because we cannot place a bound on the
> size of the smallest example sentence that is ambiguous.
> FURTHERMORE, THIS IS TRUE EVEN IF THE GRAMMAR
> IS EPSILON FREE!!!

While I am not an expert on grammar computability, I have worked on
the problem a small amount and I believe the above statement is true.
I will detail why in a minute.

Digression;
Note, the only proof I have read on the non-computability of
ambiguity was in Aho and Ullman's _The Theory of Parsing,
Translation, and Compiling_ Vol I. pp 203-204. ISBN
0-13-914556-7. The proof shows how to construct a grammar whose
ambiguity solves the Post Corespondence problem, which in turn can
be shown to solve the Turing Machine halting problem. Thus, if
there existed an algorithm, we would have an algorithm that solved
the halting problem.

Ok, now, why minimal length examples of ambiguity are aribitrarily
large:

Consider a(n epsilon free) grammar of the form:

(In "yacc" notation, with nonterminal in lower case, terminals in
upper, g is the goal non-terminal)

g: a | b;

a: a1 | a1 a;
a1: X a2;
a2: X a3;
...
an-1: X an;
an: X;

b: b1 | b1 b1 b;
b1: X b2;
b2: X b3;
...
bm-1: X bm;
bm: X;

This grammar matches an "a" non-terminal, if the sentence is of the
form X**(i*n), and a "b" non-terminal, if the sentence is of the form
X**m(2*j-1). The grammar is ambiguous if and only if there is a
solution to i*n=m*(2*j-1) in the positive integers. Thus, if n is
even and not a multiple of m, the grammar is unambiguous and
otherwise, it is ambiguous. (I hope I got my elementary algebra
right.) Note, by playing around with the "a" and "b" sets of rules,
one can form also sorts of arbitrary mathematical puzzles. For
example, what happens if the b rules are of the form:

b1: X b2 b2;
b2: X b3 b3;
...

Question to other comp.theory and comp.compilers readers is it
possible to make a series of "b" rules that exhibits faster than
exponential growth? If I understand the preceding example, it seems
trivial to produce rules that grow according to any exponential curve,
but I'm not sure how to grow faster than that. Moreover, to be
equivalent to the halting problem, I would think a much faster growth
would be required.

Note, this problem interests me a lot, because I have a class of
grammars that I call LR(infinite), which essentially use a clever form
of unbounded lookahead to determine grammar membership. All grammars
within the set are unambiguous, but generating a parser for a grammar
in the set involves solving the ambiguity problem in a similar manner
to the question posed above. Now, I believe that for all unambiguous
grammars the algorithm eventual terminates in a finite (but
arbitrarily large) number of steps. I believe the problem is that
there is no computable bound that can be put on the length of the
number of steps the algorithm will take given a grammar of a
particular size (e.g. number of terminals, non-terminals, length of
right-hand-sides, and number of rules). Thus, for certain ambiguous
grammars, the program simply fails to terminate, always thinking that
some more lookahead with resolve the conflict. (And it seems obvious
that if I could restrict the ambiguity problem to a fixed amount of
lookahead based on an exponential equation from the grammar size, the
problem is solvable, simply prcompute that number and run for only