Re: Conflict resolution in EBNF

Chris F Clark <>
20 Oct 2001 21:48:15 -0400

          From comp.compilers

Related articles
Re: Conflict resolution in EBNF csanna@cs.Technion.AC.IL (Anna Nikitin) (2001-10-16)
Re: Conflict resolution in EBNF (Sönke Kannapinn) (2001-10-20)
Re: Conflict resolution in EBNF (2001-10-20)
Re: Conflict resolution in EBNF (2001-10-20)
Re: Conflict resolution in EBNF (Chris F Clark) (2001-10-20)
Re: Conflict resolution in EBNF (2001-10-20)
Re: Conflict resolution in EBNF (2001-10-27)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 20 Oct 2001 21:48:15 -0400
Organization: Compilers Central
References: <> 01-10-077
Keywords: parse
Posted-Date: 20 Oct 2001 21:48:15 EDT

Anna Nikitin wrote:
> I'm working on a new compiler-compiler .... The parser for the
> language works directly on ECFG, it doesn't transform ECFG into normal
> CFG.

While you don't explicitly say, because you talk about popping symbols
from the stack, I presume you are writing a tool that is essentially
an LR parser underneath, exscept that it can handle RE's directly.
Often such tools are called ELR parsers--although that includes the
class of parsers where the RE's are handled by conversion into
auxillary rules. In Yacc++, we call your (and our) technique "direct
translation of regular expressions". It does have some advantages
over converting the RE's to the equivalent BNF before generating the

> Therefore two new kinds of Reduce/Reduce conflict may appear....
> In the first case we can use precedence rules (as in Yacc), i.e. give
> priorities to the alternatives. If the conflict occurs we simply
> choose the alternative with the highest priority. But in the second
> case it's not so clear what to do. I don't think that I can use
> precedence here but I suppose some other techniques may help. Does
> somebody know such techniques? I'll appreciate any help in solving
> this problem.

There are several "general" rules that you can use to help your tool
disambiguate. Note, I am going to give my grammar examples in Yacc++
notation, not to be contrary, but because I want to talk about
semantic actions which are naturally enclosed in braces, which means I
can't use braces to indicate closure (0 or more repetitions).

1) The most important rule for RE's is that if two RE's generate the
      same set of strings (e.g. generate the same finite automata
      states), they are the same RE. You can generalize this to if there
      is a conflict between two RE's from one rule (i.e. both RE's are
      expansions of the same non-terminal) and both accept the same input
      text (in this conflict state), you can ignore conflicts between

      A: a* // never any conflicts even for aa, aaaa, ...
        | (aa)* // both rules reduce to non-terminal A

    There are two caveats to the above rule. If you attach semantic
    actions to individual alternatives, then the rule only applies it
    the semantic actions are the same.

      A: a* {action1} // conflicts must be resolved
        | (aa)* {action2} // semantic actions make rules differ

    Similarly, if you "clump" tokens by the operators (often indicated
    by using parenthesis in the grammar), so that the resulting stack
    contains 1 entry for each clump (rahter than one entry for each
    token the rhs matches), you need to resolve the conflicts.
    Moreover, if you do this, it is better to convert your operators
    into auxillary rules, because the auxilllary rules will auto-
    matically provide the correct "clumping".

    Clumped interpretation:
    A: a (a)* // two stack entries one for a and one for a*
      | a (a)* a // three stack entries a a* a

    Unclumped interpretation:
    A: a (a)* // number of stack entries is strictly count of
      | a (a)* a // tokens matched and operators make no difference

2) Prefer longest match, then apply precedence and associativity, and
      finally use ordering of rules in grammar to select conflicting

      A: a* // longest match use 2nd rule for aab, aaaab, ...
        | (aa)* b // rule ordering use 1st rule for a, aa (no b), aaa ...

    A variation on this rule we have found useful for Yacc++ is to
    prefer a RE over a nested rule. The idea is to have the resulting
    parser stack have as few non-terminals as possible (and thus the
    largest RE on the stack). Note, if you are concerned with stack
    overflow, the reverse rule might make sense.

    A: a* // prefer a* (one reduce of a long string)
      | a A // over aAAAAA (lots of reduces of short strings)
      | A a // or AAAAa

    We have also found that the precedence and associativity concepts
    can be extended and improved, making them both more powerful and
    less error prone. We have several ideas along this line. However,
    ther isn't space in this note to describe those ideas.

3) If all else fails, declare am ambiguity and make the grammar
      writer rework the grammar to express what they want. In 12 years
      of Yacc++ use (with hunders of users), we have probably had only a
      handfull of complaints about things Yacc++ decided were ambiguous.
      Most of them were truly bugs in the tool, where the tool had gotten
      confused and generated incorrect tables and decided that something
      was ambiguous only because it was going to misparse the input.

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

PS. Is there any particular reason, you are inventing your own
compiler-compiler technology? Perhaps there is a way that we could
cooperate and share technology. Yacc++ has done (direct translation)
ELR parsing for some time and is quite stable.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.