Re: regular expression operators in CF grammars

"Chris F Clark" <cfc@shell01.TheWorld.com>
12 Sep 2002 00:21:54 -0400

          From comp.compilers

Related articles
[17 earlier articles]
Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-15)
Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23)
Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12)
Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14)
RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14)
| List of all articles for this month |

From: "Chris F Clark" <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 12 Sep 2002 00:21:54 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 02-08-035 02-08-078 02-09-018
Keywords: parse
Posted-Date: 12 Sep 2002 00:21:54 EDT

tj bandrowsky wrote:
> If so, I was able to parse it with diplodicus. The trace for the
> parse is below. Diplodicus tries to solve this sort of ambiguity
> problem by never reducing until there is one and only one unambiguous
> reduction to take. It keeps track of all the rules that are possibly
> ambiguous as each token is shifted. The trick I used to make this
> work is sensing the condition where you go from not reducing a
> non-ambiguous rule because another rule is ambiguous with to have no
> reductions to make at all, in which case you have to have backtrack at
> most one token, take the unambiguous reduction you previously blew
> off, and then proceed.


From, what I can tell, you have the correct interpretation, and it is
not surprising that you can find it by "parsing ahead" until there is
no ambiguity and then making the unresolved reductions to match the
now unambiguous context.


I am a little suprised that you can always backup at most one token.
My experience suggests that there are temporary/local ambiguities
(i.e. conflicts) that require an infinite amount of lookahead to
resolve, although the lookahead can always be parsed by a CFG.


Consider:


goal: a | b;


a: x y z;
b: r s t;


x: "0" x | "0";
y: "1" y | "1";
z: "2";


r: "0" r | "0";
s: "1" s | "1";
t: "3";


In this gramar, the non-terminal x is the same as r and y is the same
as s but z and t are distinct. Thus, the language is unambiguous.
However, to distinguish between x and r, one must look past y and s,
requiring unbounded lookahead, but with lookahead that is parse-able
with a CFG. Do you think you can parse this by backing up only one
token? What if the ambiguity in z versus t is k tokens in? My point
is not that you cannot resolve this conflict with diplodicus (I htink
you can), but that you may have to back up more than one token to do
so.


BTW, this is exactly the type of problem that we tried to solve with
our "LR(infinity)" parsing technology.


Note, this gramar is GLR, but not LR(k) for any k. Note, that the
language itself is LR(k), even though the grammar given isn't. To
make an LR(k) grammar, simply substitute x for r and y for s in the b
rule and viola an LR(k) grammar--thus the language is LR(k) despite
the conflicts in the original grammar.


The LR(infinity) model could be stated simply as:


if at any conflict in the LR machine the two conflicting rules are the
result of distinct items, the parser generator attempts to resolve the
conflict by parsing the distinct right context grammars implied by
those two items until they can be distinguished.


The right context grammar for the rule x: "0" x | "0" .;
is:


a: x .y z;
y: "1" y | "1";
z: "2";


The right context grammar for the rule r: "0" r | "0" .;
is:


b: r .s t;
s: "1" s | "1";
t: "3";


Resolving this conflict may introduce new conflicts in the resulting
LR machine and these are resolved the same way. One can think of this
as "taking the limit" of the conflict resolution process. If the
conflicts are eventually all resolved, the limit converges and the
grammar is LR(infinity). If the conflicts continue generating new
conflicts, the limit diverges and the grammar is not LR(infinity).


By appealing to the theorem which states that it is undecidable
whether a given cfg is unambiguous or not, we can tell that there is
no algorithmic way to decide whether the limit will converge or not
(for any arbitrary) grammar. (That is, if someone gives an ominpotent
being an algorithm that purports to solve the ambiguity problem, that
omnipotent being can give them a grammar their algorithm will
misclassify.) However, it is my belief that for a large class of
grammars, the limits do converge. I also believe that most common
ambiguities can also be detected.


(My apologies in being so tardy in replying. I've been on vacation
and then got caught up in another interesting discussion via email on
predicated grammars.)


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.