15 Apr 2003 00:14:25 -0400

Related articles |
---|

.NET Compiler for Interactive Fiction david.cornelson@iflibrary.com (David A. Cornelson) (2003-02-21) |

Re: .NET Compiler for Interactive Fiction neelk@alum.mit.edu (Neelakantan Krishnaswami) (2003-02-24) |

Re: .NET Compiler for Interactive Fiction tbandrow@unitedsoftworks.com (2003-03-09) |

Re: .NET Compiler for Interactive Fiction david.cornelson@iflibrary.com (David A. Cornelson) (2003-03-14) |

Re: .NET Compiler for Interactive Fiction tbandrow@unitedsoftworks.com (2003-03-16) |

Re: .NET Compiler for Interactive Fiction JeffKenton@attbi.com (Jeff Kenton) (2003-04-05) |

Re: .NET Compiler for Interactive Fiction joachim_d@gmx.de (Joachim Durchholz) (2003-04-13) |

Re: parsing, was .NET Compiler for Interactive Fiction rpboland@math.uwaterloo.ca (Ralph P. Boland) (2003-04-15) |

Re: parsing, was .NET Compiler for Interactive Fiction cfc@TheWorld.com (Chris F Clark) (2003-04-15) |

Re: parsing, was .NET Compiler for Interactive Fiction joachim_d@gmx.de (Joachim Durchholz) (2003-04-20) |

Re: parsing, was .NET Compiler for Interactive Fiction bobduff@shell01.TheWorld.com (Robert A Duff) (2003-04-20) |

Re: parsing, was .NET Compiler for Interactive Fiction bobduff@shell01.TheWorld.com (Robert A Duff) (2003-04-27) |

Re: parsing, was .NET Compiler for Interactive Fiction rpboland@math.uwaterloo.ca (Ralph P. Boland) (2003-04-27) |

Re: parsing, was .NET Compiler for Interactive Fiction zivca@netvision.net.il (2003-04-27) |

[11 later articles] |

From: | "Ralph P. Boland" <rpboland@math.uwaterloo.ca> |

Newsgroups: | comp.compilers |

Date: | 15 Apr 2003 00:14:25 -0400 |

Organization: | University of Waterloo |

References: | 03-02-125 03-02-147 03-03-043 03-03-061 03-03-103 03-04-006 03-04-028 |

Keywords: | parse |

Posted-Date: | 15 Apr 2003 00:14:25 EDT |

Joachim Durchholz wrote:

*> John wrote:*

*>*

*>>[Parsing has indeed been well studied. The main changes in recent*

*>> years is that algorithms that used to be considered too slow for*

*>>production use aren't any more. -John]*

*>*

*>*

*> Well, Kannapinn's recent work is still a noticeable improvement in LR*

*> parsing, so there *is* progress in this area. But I agree that things*

*> have come to a near-standstill, problems are either solved or*

*> known/assumed to be unsolvable. Research has moved on to other topics.*

DO YOU HAVE A REFERENCE TO KANNIPINN'S WORK?

I WOULD LIKE TO HAVE IT.

Well, as to recent advances in parsing I have four new algorithms, the

first is an extension to LR parsing and the next three are a

succession of extensions.

I'll describe the first method which is probably powerful enough that

the remaining three are not needed for compiler construction and (I'm

guessing) only rarely needed for parsing natural languages. That is,

my algorithm should replace GLR parsing in all but exceptional

circumstances (parse table being too big being one of them)!

First let me describe LR(0) parsing as follows.

A string of symbols (non-terminals initially) is processed left to

right with symbols (and states) being pushed on a stack. Whenever the

top k symbols on the stack can (correctly based on the state

information) be reduced to a non-terminal symbol, say B, then this

reduction is done. THEN B IS PUSHED ONTO THE INPUT QUEUE. The next

operation is always a shift of B onto the stack (with an approapriate

state). Whenever the parser recognizes that a reduction can be done

but is not sure if it should be done it fails. (Actually if this is

possible the parser is not even built).

In standard LR(0) algorithms the push B onto the input queue and

subsequent shift are combined but I use the above description because

it means that there are never two consecutive reductions in LR(0) or

LR(1) parsing.

My algorithm is a generalization of LR(1) parsing but I'll describe it

as a generalization of LR(0) parsing since that is simpler.

While LR(0) parsing (as described above) never does consecutive

reductions (a parser built by) my algorithm can. Let P be a parser

build by my algorithm. Consider that P is parsing a sentence and

the top j symbols on the stack contains a string s

such that s can (correctly based on the states on the stack)

be reduced to a non-terminal C such that the parse

tree T corresponding to C is not recursive; that is no subtree of

T contains a further subtree such that the root of both subtrees

correspond to the same non-terminal symbol.

P correctly recognizes when s can be reduced to C and does so whenever

this situation occurs. Whenever LR(1) conflicts occur or most other

conflicts occur my algorithm does a shift unless the grammar is

ambiguous, which it detects, or certain difficulties arise in which

case it quits. (Actually if these difficulties arise my algorithm

either reports the ambiguity or reports that there are parsing

conflicts it can't handle and quits without building a parser.) My

algorithm requires more states than LR parsing possibly much more but

I claim (without the benifit of an implementation) that this is not a

problem in practice unless the grammar is very large or complicated or

a contrived bad example.

My algorithm is clearly much more powerful than LR parsing and (I

claim) more powerful than the various extensions to LR parsing in the

literature. (Kannapinn's excepted since I am unfamiliar with it as

far as I know and GLR excepted which can handle all CFGs but doesn't

build a full parser in advance) The parsers my algorithms builds

should be only slightly slower that LR parsing though building them

may take a while.

I will be implementing my algorithm and writing the papers that go

with it as soon as I can. Alas, I must finish my current postdoc

first (for which my research must be computatinal geometry related,

the subject of my Ph.D. thesis) and find a faculty position or another

postdoc (which is giving me trouble unfortuntely)

I intend to release under GPL the implementation (in Java, not because

I like Java (I don't) but so that I can learn the language well enough

to teach it) as soon as it's done. I will eventually implement or

arrange to be implemented versions in Smalltalk and C.

I expect the algorithm will lead to future research in parsing. For

example, the algorithm sometimes allows ambiguous grammars to be

detected. And yes I know that this is undecideable problem; the

algorithm cannot build parsers for all CFGs and thus sometimes fails

to determine whether or not an input grammar is ambiguous. Thus there

is no contradiction with established fact. Once you know that a

grammar is ambiguous there is opportunity to do research on what to do

about it.

I admit that I have made a very bold claim, but I am confident enough

in it to make it.

Doubters may at least verify that I am a postdoc in the school of

computer science at the univerity of Waterloo, Canada. This doesn't

prove my claim of course, but it supports that I make it seriously.

Ralph Boland

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.