Related articles |
---|
Re: Conflict resolution in EBNF csanna@cs.Technion.AC.IL (Anna Nikitin) (2001-10-16) |
Re: Conflict resolution in EBNF Soenke.Kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2001-10-20) |
Re: Conflict resolution in EBNF alexc@world.std.com (2001-10-20) |
Re: Conflict resolution in EBNF knyblad@baan.com (2001-10-20) |
Re: Conflict resolution in EBNF cfc@world.std.com (Chris F Clark) (2001-10-20) |
Re: Conflict resolution in EBNF friwi@prosun.first.gmd.de (2001-10-20) |
Re: Conflict resolution in EBNF dr_feriozi@prodigy.net (2001-10-27) |
From: | Chris F Clark <cfc@world.std.com> |
Newsgroups: | comp.compilers |
Date: | 20 Oct 2001 21:48:15 -0400 |
Organization: | Compilers Central |
References: | <Pine.WNT.4.33.0110141148380.-834121@milton.iecc.com> 01-10-077 |
Keywords: | parse |
Posted-Date: | 20 Oct 2001 21:48:15 EDT |
Anna Nikitin wrote:
> I'm working on a new compiler-compiler .... The parser for the
> language works directly on ECFG, it doesn't transform ECFG into normal
> CFG.
While you don't explicitly say, because you talk about popping symbols
from the stack, I presume you are writing a tool that is essentially
an LR parser underneath, exscept that it can handle RE's directly.
Often such tools are called ELR parsers--although that includes the
class of parsers where the RE's are handled by conversion into
auxillary rules. In Yacc++, we call your (and our) technique "direct
translation of regular expressions". It does have some advantages
over converting the RE's to the equivalent BNF before generating the
tables.
> Therefore two new kinds of Reduce/Reduce conflict may appear....
> In the first case we can use precedence rules (as in Yacc), i.e. give
> priorities to the alternatives. If the conflict occurs we simply
> choose the alternative with the highest priority. But in the second
> case it's not so clear what to do. I don't think that I can use
> precedence here but I suppose some other techniques may help. Does
> somebody know such techniques? I'll appreciate any help in solving
> this problem.
There are several "general" rules that you can use to help your tool
disambiguate. Note, I am going to give my grammar examples in Yacc++
notation, not to be contrary, but because I want to talk about
semantic actions which are naturally enclosed in braces, which means I
can't use braces to indicate closure (0 or more repetitions).
1) The most important rule for RE's is that if two RE's generate the
same set of strings (e.g. generate the same finite automata
states), they are the same RE. You can generalize this to if there
is a conflict between two RE's from one rule (i.e. both RE's are
expansions of the same non-terminal) and both accept the same input
text (in this conflict state), you can ignore conflicts between
them.
A: a* // never any conflicts even for aa, aaaa, ...
| (aa)* // both rules reduce to non-terminal A
There are two caveats to the above rule. If you attach semantic
actions to individual alternatives, then the rule only applies it
the semantic actions are the same.
A: a* {action1} // conflicts must be resolved
| (aa)* {action2} // semantic actions make rules differ
Similarly, if you "clump" tokens by the operators (often indicated
by using parenthesis in the grammar), so that the resulting stack
contains 1 entry for each clump (rahter than one entry for each
token the rhs matches), you need to resolve the conflicts.
Moreover, if you do this, it is better to convert your operators
into auxillary rules, because the auxilllary rules will auto-
matically provide the correct "clumping".
Clumped interpretation:
A: a (a)* // two stack entries one for a and one for a*
| a (a)* a // three stack entries a a* a
Unclumped interpretation:
A: a (a)* // number of stack entries is strictly count of
| a (a)* a // tokens matched and operators make no difference
2) Prefer longest match, then apply precedence and associativity, and
finally use ordering of rules in grammar to select conflicting
alternatives.
A: a* // longest match use 2nd rule for aab, aaaab, ...
| (aa)* b // rule ordering use 1st rule for a, aa (no b), aaa ...
A variation on this rule we have found useful for Yacc++ is to
prefer a RE over a nested rule. The idea is to have the resulting
parser stack have as few non-terminals as possible (and thus the
largest RE on the stack). Note, if you are concerned with stack
overflow, the reverse rule might make sense.
A: a* // prefer a* (one reduce of a long string)
| a A // over aAAAAA (lots of reduces of short strings)
| A a // or AAAAa
We have also found that the precedence and associativity concepts
can be extended and improved, making them both more powerful and
less error prone. We have several ideas along this line. However,
ther isn't space in this note to describe those ideas.
3) If all else fails, declare am ambiguity and make the grammar
writer rework the grammar to express what they want. In 12 years
of Yacc++ use (with hunders of users), we have probably had only a
handfull of complaints about things Yacc++ decided were ambiguous.
Most of them were truly bugs in the tool, where the tool had gotten
confused and generated incorrect tables and decided that something
was ambiguous only because it was going to misparse the input.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------
PS. Is there any particular reason, you are inventing your own
compiler-compiler technology? Perhaps there is a way that we could
cooperate and share technology. Yacc++ has done (direct translation)
ELR parsing for some time and is quite stable.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.