Re: parsing, was .NET Compiler for Interactive Fiction

"Ralph P. Boland" <rpboland@math.uwaterloo.ca>
15 Apr 2003 00:14:25 -0400

          From comp.compilers

Related articles
.NET Compiler for Interactive Fiction david.cornelson@iflibrary.com (David A. Cornelson) (2003-02-21)
Re: .NET Compiler for Interactive Fiction neelk@alum.mit.edu (Neelakantan Krishnaswami) (2003-02-24)
Re: .NET Compiler for Interactive Fiction tbandrow@unitedsoftworks.com (2003-03-09)
Re: .NET Compiler for Interactive Fiction david.cornelson@iflibrary.com (David A. Cornelson) (2003-03-14)
Re: .NET Compiler for Interactive Fiction tbandrow@unitedsoftworks.com (2003-03-16)
Re: .NET Compiler for Interactive Fiction JeffKenton@attbi.com (Jeff Kenton) (2003-04-05)
Re: .NET Compiler for Interactive Fiction joachim_d@gmx.de (Joachim Durchholz) (2003-04-13)
Re: parsing, was .NET Compiler for Interactive Fiction rpboland@math.uwaterloo.ca (Ralph P. Boland) (2003-04-15)
Re: parsing, was .NET Compiler for Interactive Fiction cfc@TheWorld.com (Chris F Clark) (2003-04-15)
Re: parsing, was .NET Compiler for Interactive Fiction joachim_d@gmx.de (Joachim Durchholz) (2003-04-20)
Re: parsing, was .NET Compiler for Interactive Fiction bobduff@shell01.TheWorld.com (Robert A Duff) (2003-04-20)
Re: parsing, was .NET Compiler for Interactive Fiction bobduff@shell01.TheWorld.com (Robert A Duff) (2003-04-27)
Re: parsing, was .NET Compiler for Interactive Fiction rpboland@math.uwaterloo.ca (Ralph P. Boland) (2003-04-27)
Re: parsing, was .NET Compiler for Interactive Fiction zivca@netvision.net.il (2003-04-27)
[11 later articles]
| List of all articles for this month |

From: "Ralph P. Boland" <rpboland@math.uwaterloo.ca>
Newsgroups: comp.compilers
Date: 15 Apr 2003 00:14:25 -0400
Organization: University of Waterloo
References: 03-02-125 03-02-147 03-03-043 03-03-061 03-03-103 03-04-006 03-04-028
Keywords: parse
Posted-Date: 15 Apr 2003 00:14:25 EDT

Joachim Durchholz wrote:
> John wrote:
>
>>[Parsing has indeed been well studied. The main changes in recent
>> years is that algorithms that used to be considered too slow for
>>production use aren't any more. -John]
>
>
> Well, Kannapinn's recent work is still a noticeable improvement in LR
> parsing, so there *is* progress in this area. But I agree that things
> have come to a near-standstill, problems are either solved or
> known/assumed to be unsolvable. Research has moved on to other topics.


DO YOU HAVE A REFERENCE TO KANNIPINN'S WORK?
I WOULD LIKE TO HAVE IT.


Well, as to recent advances in parsing I have four new algorithms, the
first is an extension to LR parsing and the next three are a
succession of extensions.


I'll describe the first method which is probably powerful enough that
the remaining three are not needed for compiler construction and (I'm
guessing) only rarely needed for parsing natural languages. That is,
my algorithm should replace GLR parsing in all but exceptional
circumstances (parse table being too big being one of them)!




First let me describe LR(0) parsing as follows.


A string of symbols (non-terminals initially) is processed left to
right with symbols (and states) being pushed on a stack. Whenever the
top k symbols on the stack can (correctly based on the state
information) be reduced to a non-terminal symbol, say B, then this
reduction is done. THEN B IS PUSHED ONTO THE INPUT QUEUE. The next
operation is always a shift of B onto the stack (with an approapriate
state). Whenever the parser recognizes that a reduction can be done
but is not sure if it should be done it fails. (Actually if this is
possible the parser is not even built).


In standard LR(0) algorithms the push B onto the input queue and
subsequent shift are combined but I use the above description because
it means that there are never two consecutive reductions in LR(0) or
LR(1) parsing.




My algorithm is a generalization of LR(1) parsing but I'll describe it
as a generalization of LR(0) parsing since that is simpler.


While LR(0) parsing (as described above) never does consecutive
reductions (a parser built by) my algorithm can. Let P be a parser
build by my algorithm. Consider that P is parsing a sentence and
    the top j symbols on the stack contains a string s
such that s can (correctly based on the states on the stack)
be reduced to a non-terminal C such that the parse
tree T corresponding to C is not recursive; that is no subtree of
T contains a further subtree such that the root of both subtrees
correspond to the same non-terminal symbol.


P correctly recognizes when s can be reduced to C and does so whenever
this situation occurs. Whenever LR(1) conflicts occur or most other
conflicts occur my algorithm does a shift unless the grammar is
ambiguous, which it detects, or certain difficulties arise in which
case it quits. (Actually if these difficulties arise my algorithm
either reports the ambiguity or reports that there are parsing
conflicts it can't handle and quits without building a parser.) My
algorithm requires more states than LR parsing possibly much more but
I claim (without the benifit of an implementation) that this is not a
problem in practice unless the grammar is very large or complicated or
a contrived bad example.


My algorithm is clearly much more powerful than LR parsing and (I
claim) more powerful than the various extensions to LR parsing in the
literature. (Kannapinn's excepted since I am unfamiliar with it as
far as I know and GLR excepted which can handle all CFGs but doesn't
build a full parser in advance) The parsers my algorithms builds
should be only slightly slower that LR parsing though building them
may take a while.


I will be implementing my algorithm and writing the papers that go
with it as soon as I can. Alas, I must finish my current postdoc
first (for which my research must be computatinal geometry related,
the subject of my Ph.D. thesis) and find a faculty position or another
postdoc (which is giving me trouble unfortuntely)


I intend to release under GPL the implementation (in Java, not because
I like Java (I don't) but so that I can learn the language well enough
to teach it) as soon as it's done. I will eventually implement or
arrange to be implemented versions in Smalltalk and C.


I expect the algorithm will lead to future research in parsing. For
example, the algorithm sometimes allows ambiguous grammars to be
detected. And yes I know that this is undecideable problem; the
algorithm cannot build parsers for all CFGs and thus sometimes fails
to determine whether or not an input grammar is ambiguous. Thus there
is no contradiction with established fact. Once you know that a
grammar is ambiguous there is opportunity to do research on what to do
about it.


I admit that I have made a very bold claim, but I am confident enough
in it to make it.


Doubters may at least verify that I am a postdoc in the school of
computer science at the univerity of Waterloo, Canada. This doesn't
prove my claim of course, but it supports that I make it seriously.


Ralph Boland


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.