Earley parser

mefrill@yandex.ru (Vladimir)
3 May 2005 14:19:44 -0400

          From comp.compilers

Related articles
Earley parser aamshukov@cogeco.ca (arthur) (2005-05-01)
Re: Earley parser wyrmwif@tsoft.org (SM Ryan) (2005-05-01)
Re: Earley parser awwaiid@thelackthereof.org (Brock) (2005-05-02)
Re: Earley parser schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-05-02)
Earley parser mefrill@yandex.ru (2005-05-03)
Re: Earley parser angray@beeb.net (Aaron Gray) (2005-05-05)
Earley Parser gelhausen@ipd.uka.de (Tom Gelhausen) (2005-10-02)
Re: Earley Parser vmakarov@redhat.com (Vladimir N. Makarov) (2005-10-03)
Re: Earley Parser scavadini@ucse.edu.ar (2005-10-06)
Earley parser bugs? zingard@mcmaster.ca (Daniel Zingaro) (2007-07-25)
Re: Earley parser bugs? schmitz@i3s.unice.fr (Sylvain Schmitz) (2007-07-26)
| List of all articles for this month |

From: mefrill@yandex.ru (Vladimir)
Newsgroups: comp.compilers
Date: 3 May 2005 14:19:44 -0400
Organization: http://groups.google.com
Keywords: parse
Posted-Date: 03 May 2005 14:19:44 EDT

On 2005.05.01.21.43, arthur wrote:
| I have implemented a Earley's recognizer, now I am looking for any
| references to build a parse tree from it. I looked at the Aho/Ulman
| algorithm and the Grune/Jacob description. Does anybody have any
| others algorithms or references?


Hi Arthur,


Not so long ago I've introduced new algorithm, which was named as
Earley's Algorithm Adapted to build parse trees. The main problem with
Earley's algorithm is it does not correct build parse trees if the
input grammar is ambigous. That's the problem and answer why Earley
didn't provide parsing algorithm, only recognizing one. The approach
to build parse tree is following: we are extending Eraley's item,
which consists of three components [A-->alpha*beta, i, omega] where
A-->alpha*beta - input grammar's rule, i - number of Earley's state
(list) where this item was burn and omega - lookahead string. As it
was expained long before (in 70th) lookahead string is not needed here
because it only improves Predictor operation and only in few cases.
So, drop this and let Earley's item will consists of thow components
[A-->alpha*beta, i]. Now add two extended components to Earley's item:


1. The reference to item, which was the reason the current item was
added to this Earley's state. This is the item, which was the
parameter of Completer operation (I'll explain this later in details).
It is named as low-reference.


2. The reference to item within the same components but where the
point in the rule A-->alpha*beta is one simbol left (if we have item
[A-->ab*c,2], this is item [A-->a*bc,2]). It is clear the reference is
bull if the point is in the beginning of the rule. It is named as
left-reference.


So, here is the example. The grammar is
E-->E+n
E-->n
E - nonterminal, n, + - terminals.
The input string is "n".


The parsing is:
List 0:
1.[E-->*E+n,0,null,null]
2.[E-->*n,0,null,null]


List 1:
1.[E-->n*,0,null,0.2] (0 is the list's number, 2 is the nu,ber of
item).


So, we have parse tree:
                  E
                  |
                  n


But, such the references keeping may be incorrect if the grammar have
the rules with empty right part. I've intended keep the list of
low-references instead the single one. And, if there will be ambigous
parsing the list of low-references will reflect this in correct way.
So, the item now is [A-->alpha*beta,i,<low-ref>,left-ref]. Every time
Completer operation runs on item I, if it adds a new item I1 the
low-ref to I1 is added to the list of low-refs in item I.


This algorithm was a part of my PhD. It is in russian. If you know
this I can send you the text within the algorithm description and
correctness prove as well.


Vladimir.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.