Re: detecting ambiguous grammars

Chris F Clark <cfc@world.std.com>
27 Mar 2001 23:31:40 -0500

From comp.compilers

Related articles
[6 earlier articles]
Re: detecting ambiguous grammars davidpereira@home.com (David Pereira) (2001-03-08)
Re: detecting ambiguous grammars dlester@cs.man.ac.uk (2001-03-10)
Re: detecting ambiguous grammars thant@acm.org (Thant Tessman) (2001-03-12)
Re: detecting ambiguous grammars christian.bau@isltd.insignia.com (2001-03-22)
Re: detecting ambiguous grammars thant@acm.org (Thant Tessman) (2001-03-26)
Re: detecting ambiguous grammars joachim_d@gmx.de (Joachim Durchholz) (2001-03-26)
*Re: detecting ambiguous grammars cfc@world.std.com (Chris F Clark)* (2001-03-27)**
Re: detecting ambiguous grammars genew@shuswap.net (2001-03-27)
Re: detecting ambiguous grammars genew@shuswap.net (2001-03-27)
Re: detecting ambiguous grammars thant@acm.org (Thant Tessman) (2001-03-31)
Re: detecting ambiguous grammars cfc@world.std.com (Chris F Clark) (2001-03-31)
Re: detecting ambiguous grammars kenarose@earthlink.net (Ken Rose) (2001-03-31)
Re: detecting ambiguous grammars vbdis@aol.com (2001-03-31)
[4 later articles]

| List of all articles for this month |

From:	Chris F Clark <cfc@world.std.com>
Newsgroups:	comp.compilers
Date:	27 Mar 2001 23:31:40 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	01-02-080 01-03-020 01-03-032 01-03-078 01-03-084 01-03-102 01-03-119
Keywords:	parse
Posted-Date:	27 Mar 2001 23:31:40 EST

Christian Bau wrote:
> If you wrote a program in a language defined by an ambigous
> grammar,

Thant Thessman wrote:
> There is another option. In lex, if a string of letters matches more
> than one rule, the rule that occurs first in the rule section is
> chosen. A grammar specification could work the same way. Although
> ambiguities wouldn't be explicitly enumerated, it is always clear how
> to disambiguate.

Actually, that is exactly what yacc does. It declares it to be an
error, but applies the rule that the first (or perhaps last, I can
never remember) rule in the grammar at a given conflict is used.
That's quite workable for simple ambiguities. However, for nested
ambiguities it tends to do the wrong thing. Ziemwat Laski (sp?)
proposed a scheme that did the resolution from the top down. That
tends to work a little bit better. However, it cannot be applied
locally in the tables without considering context.

A similar approach was suggested in a TOPLAS article a few years ago.
In that article, the author (whose name currently escapes me)
suggested using preference controls that determined which of two
possible ambiguous parses was the desired one. (If I recall
correctly, the author did it by showing examples, i.e. parse this tree
like this.)

Thant Thessman again:
> To rephrase my question: Does the advantage of addressing
> ambiguities up front outweigh the advantage of potentially more
> expressive grammars?

Ullman certainly thought not in his paper "Deterministic Parsing of
Ambiguous Grammars". That paper is the basis of the precedence and
associativity directives in yacc. He thought expressivity and
conciseness were important atributes.

However, 20/20 hindsight suggests that in many cases the global
directives are "too big" of a hammer. You can't casually use them to
disambiguate a grammar and then modify the grammar to include more
cases, because if you do they may hide unintended ambiguities in the
new parts of the grammar.

As a result, personally, I've toyed with a variation on the theme
which would make the precedence and associativity directives more
localized. For example, it would allow the grammar author to give a
precedence to a specific token in a rule (which solves certain unary
minus problem cases). With the localized precedence directives, you
can even express some of the concepts of the preference system.

The modern solution (invented, as far as I can tell, by Terence Parr)
is to use syntactic predicates. With syntactic predicates, the
matching is done in two parts, first the "guard" rule is matched. The
guard rules are matched in order, so that there is always a precise
and well known disambiguation of conflicting guards. If the guard
rule matches, then the protected rule is applied. Note, that the
guard and protected rules don't have to match the exact same strings.
For examle, the guard rule will generally match only enough lookahead
to disambiguate the problem tokens. That may be fewer tokens than the
protected rule matches (or in some cases it will be more tokens,
effectively extending the lookahead).

My personal feeling is that most language designers have not given
enough consideration to how their language will parse, which is why
the issue happens in the first place. A grammar writer should not
have to worry whether the grammar is ambiguous or not. Almost the
entire language should be nearly trivially LL(1) and SLR(1). The one
place I change that rule is in expressions, where expressions should
be expressed as an ambiguous grammar with precedence and associativity
rules for the operators that could be easily turned into a pure LL(1)
or SLR(1) grammar. Disasters like the C declaration syntax or its
worse variant in C++ should be relegated to history books.

To show that I try to practice what I preach, I believe the entire
Yacc++ syntax is both LL(2) and SLR(1). The reason it is LL(2) and
not LL(1) is that we allow all the keywords to be used as identifiers
and got rid of the leading % characters before declarations. As a
result one has to look at the first two tokens of any declaration/
statement to see if it is keyword followed by a colon, in which case
the keyword is simply an identifier.

We made the Yacc++ grammar simple like that because ambiguous grammars
are not only hard on parsers, they are hard on human readers and
writers. So, with a tool that supports predicates, associativity and
precedence directives, full LR parsing, embedded regular expressions,
and other features specifically designed to help make parsing
difficult languages easier, we went to the effort to make our lanugage
easy to parse.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: detecting ambiguous grammars

Chris F Clark <cfc@world.std.com>27 Mar 2001 23:31:40 -0500

Chris F Clark <cfc@world.std.com>
27 Mar 2001 23:31:40 -0500