Re: detecting ambiguous grammars

"Joachim Durchholz" <>
1 Mar 2001 14:16:19 -0500

          From comp.compilers

Related articles
detecting ambiguous grammars (Thant Tessman) (2001-02-15)
Re: detecting ambiguous grammars (David Pereira) (2001-02-17)
Re: detecting ambiguous grammars (Chris F Clark) (2001-02-23)
Re: detecting ambiguous grammars (Joachim Durchholz) (2001-03-01)
Re: detecting ambiguous grammars (2001-03-04)
Re: detecting ambiguous grammars (Thant Tessman) (2001-03-04)
Re: detecting ambiguous grammars (David Pereira) (2001-03-08)
Re: detecting ambiguous grammars (2001-03-10)
Re: detecting ambiguous grammars (Thant Tessman) (2001-03-12)
Re: detecting ambiguous grammars (2001-03-22)
[12 later articles]
| List of all articles for this month |

From: "Joachim Durchholz" <>
Newsgroups: comp.compilers
Date: 1 Mar 2001 14:16:19 -0500
Organization: Compilers Central
References: 01-02-080
Keywords: parse, theory, comment
Posted-Date: 01 Mar 2001 14:16:19 EST

Thant Tessman <> wrote:
> Is there a general algorithm for detecting whether a grammar is
> ambiguous? I've read posts that suggest the answer is no, but could
> someone point me at the reason why?

The question whether an arbitrary context-free grammar is ambiguous is
undecidable. Taking this from CS speak to everday wording, this means:
1) I'm talking about context-free grammars, i.e. they must be more
powerful than regular expressions (where the question is decidable). The
problem is that you need CFGs to parse embedded structures.
2) "undecidable" means: you can write an algorithm that checks whether a
given grammar is ambiguous. However, this algorithm may take an
arbitrarily long time to terminate, and for some grammars it will never
reach a conclusion and continue to run forever.

> A naive approach would be to systematically generate all possible
> strings less than a given length and just check to see if any of the
> strings are equivalent.

This is not feasible in practice. Say, the productions in your grammar
have three alternatives each; then generating all strings produced
from N substitutions will take 3^N steps. This will exceed the number
of atoms in the universe for roughly N >= 45.

> Of course this only works for statements less than the given string
> length. (Is there a way to prove that this is sufficient for some
> class of grammars?)

Yes. But there are better algorithms for some classes of grammars. For
example, LR(k) and LL(k) are never ambiguous, and to check that a
given grammar is LR or LL you just send it through a parser generator.
Various other grammar classes have been researched. Actually one of
the things that make grammar classes interesting is that they are
guaranteed to be unambiguous. There isn't much research in ambiguous
grammars, at least not for programming languages. (Natural language
research is different. I once saw some remarks that sounded as if the
Earley algorithm has been reinvented in the natural language
community. <g>)

Note that the Earley algorithm will parse any string against any
context-free grammar, but it will not tell you whether the grammar is
ambiguous, it will tell you whether that particular string can be
parsed in an ambiguous manner (and give you all the relevant parse
trees as well).

For example, one language (Eiffel) has a rule that says "Semicolons
are optional unless they are required to make the syntax unambigous";
two of the four well-known Eiffel compilers use an Earley parser and
have no difficulties implementing that rule. (The "official" Eiffel
grammar is also ambiguous, and in fact it's difficult to specify a
full language grammar if it has a semicolon rule like the above.)

> But how about generating all possible strings where each
> non-terminal is expanded at least three times. Why three times?
> Because that's what seems to work in the hand expansions I've done.

This isn't going to convince anybody. Particularly if there are good
alternatives. Besides, if you have found an ambiguity, this still
doesn't tell you how to change it to get rid of the ambiguity without
changing the language. (This is is a general problem: the same
question arises when you try to make a grammar LALR(1) or LL(k), to
make it palatable to yacc or Antlr.)

> The grammar has produced two instances of the string "id + id + id"
> (as well as some identical E3 strings) which shows that the grammar
> is ambiguous. Can folks suggest why this approach may or may not be
> a wild goose chase? (I'm going to be using a top-down parser and my
> goal is not efficiency, but the exploration of the implications of
> various grammar designs.)

You may want to look at Reiner Pesch's recent post dated
2000-02-12. The YCA tool that he announced will detect non-LALR(1)
constructs, which means you get not only ambiguities but also some
other problems, but it will catch all ambiguities. If you're doing
top-down parsing, you don't need a tool. The rules for top-down
parsing are rather simple. Read "Compiler Construction" by Niklas
Wirth, he explains it very clearly. (It was the very first compiler
text that I read, and I found it easily understandable.)

[It seems to me you could prune the 3^45 search pretty radically by
doing a depth first traversal and abandoning a branch when the parser
says it's not a possible prefix, since most random strings of symbols
fail quickly. Has anyone tried to see if this leads to anything
useful? -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.