Have I discovered something new?

"Chris F Clark" <cfc@world.std.com>
21 Jul 2002 02:09:33 -0400

          From comp.compilers

Related articles
Have I discovered something new? steve.horne@iris.co.uk (Stephen Horne) (2002-07-15)
Re: Have I discovered something new? torbenm@diku.dk (Torben Ægidius Mogensen) (2002-07-21)
Re: Have I discovered something new? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21)
Have I discovered something new? cfc@world.std.com (Chris F Clark) (2002-07-21)
Re: Have I discovered something new? kgw-news@stiscan.com (2002-07-21)
Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-24)
Re: Have I discovered something new? steve@lurking.demon.co.uk (Steve Horne) (2002-07-25)
Re: Have I discovered something new? robert.corbett@sun.com (Robert Corbett) (2002-07-31)
Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-31)
Re: Have I discovered something new? cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-04)
[1 later articles]
| List of all articles for this month |
From: "Chris F Clark" <cfc@world.std.com>
Newsgroups: comp.compilers
Date: 21 Jul 2002 02:09:33 -0400
Organization: Compilers Central
References: 02-07-057
Keywords: parse
Posted-Date: 21 Jul 2002 02:09:32 EDT

Steve Horne wrote:
> This book contained a statement that the possibility
> of an linear-time parsing algorithm for arbitrary context-free
> grammars was still an open question - that there was no proof or
> disproof, that every known context-free grammar can be parsed in
> linear time using specially designed methods, but that no general
> algorithm currently exists which works for all context-free grammars.
>
> Is this still the state of play?


Yes. As far as I can tell there is no published algorithm the parses
every known context free grammar in linear time. Nor is there even
anyone who claims to have implemented an algorithm the does that (that
I am aware of).


> The reason I ask is that I believe I have found an algorithm to do
> precisely this.


Well, that would be something new and it is certainly worth pursuing
what you have done further, even if what you believe is not
*completely* true.


> It will certainly handle any grammar that a Tomita parser will
> handle, and do so without needing stack duplication.


At one level I believe your claim for this, because we have an
(unpublished) algorithm implemented in an experimental version of
Yacc++ that does roughly this. (We call our technique LR(infinity)
because it effectively extends the lookahead of the parser out in an
unbounded fashion.) The idea is that instead of duplicating the stack
at runtime as a Tomita parser does, one precomputes the stack
extensions at compile time (the same way that an LR parser does
relative to an LL one).


I believe for non-ambiguous grammar the process of doing so always
terminates. So, if that is what you have invented, I think the basic
concept is sound.


On the other hand, your ability to produce all possible pairings for
ambiguous grammars in linear time is not believable. I have seen a
proof that such pairings grow at an n**2 rate. Thus, writing them
will necessarily take n**2 time.


> The only area where I have a serious uncertainty is in the issue of
> parsing infinitely ambiguous grammars. I need to think about this some
> more. I'm pretty sure the technique can be adapted, and the final
> output from parsing a grammar and scentence with infinite ambiguity
> would be essentially a regular grammar (with branches instead of
> alternatives) describing the structure of the parse tree.


If your technique is similar to the one I described, this is a real
problem. For one thing, there is a proof that there is no algorithm
that can determine if an arbitrary cfg is unambiguous--one can use an
algorithm that determines if an arbitrary cfg is ambiguous to solve
the Post-correspondence problem, which is equivalent to solving the
halting problem. If your algorithm could detect these infinite
ambiguities (and halt), then it would have essentially solved the cfg
ambiguity problem.


Our technique founders on this very rock. Although there are certain
ambiguities it can detect, on other grammars the algorithm loops
forever attempting to resolve the lookahead.


That is not a completely hopeless situation. For every set of
grammars less than some finite length, there is a finite maximum
number of states that any unambiguous grammar has and all ambiguous
grammars are "detectable" by that point. The only problem is that we
cannot know (via an algorithm) what that number is in advance for some
unspecified finite length. That is, if I tell you in advance what the
length is, you can tell me a number, because it exists. However, if
you tell me that you can calculate the number for any length, I can
come up with a length for which your calculation fails.


In any case, we use that property to constrain our algorithm by
allowing the user to fix the maximum nuber of lookahead states.
Unambiguous grammars don't generally need very many lookahead states.
It is worth losing a few unambiguous grammars to have an algorithm
that always terminates.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.