# Re: Opinions about "epsilon" Symbols in Parse Trees

## "mefrill" <mefrill@yandex.ru>12 Jun 2005 21:30:26 -0400

From comp.compilers

Related articles
Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (Evangelos Drikos) (2005-06-09)
Re: Opinions about "epsilon" Symbols in Parse Trees schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-06-10)
Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-10)
Re: Opinions about "epsilon" Symbols in Parse Trees mefrill@yandex.ru (mefrill) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees news4e71@yahoo.com (0x4e71) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-06-12)
Re: Opinions about "epsilon" Symbols in Parse Trees drikosv@otenet.gr (eDrikos) (2005-06-13)
| List of all articles for this month |
 From: "mefrill" Newsgroups: comp.compilers Date: 12 Jun 2005 21:30:26 -0400 Organization: http://groups.google.com References: 05-06-052 Keywords: parse, LR(1) Posted-Date: 12 Jun 2005 21:30:26 EDT

The epsilon problem in LR(k) analysis is really wider problem of
ambiguity. Now I'll tey explain this. Firstly, you should remember LR
algorithm does not work with rules but works with states of
LR-automata. State contain signed rules - rules within point inside
right part of the rule. For example, rule id --> ids idp can produce
three signed rules: id --> * ids idp, id --> ids * idp, id --> ids idp
*. LR(k) generator firstly (befroe any parsing) builds LR(k)-automata.
The state of LR(k)-automata is LR(k)-item, which consists of two
components: signed rule and lookahead string. Lookahead string is just
the sequence of k terminals of the grammar. For example, LR(1)-item's
lookahead string contains only the single terminal symbol. We'll
discuss LR(0)-items with empty lookahead string to simplify our
discussion. Take you grammar and build LR(0)-automata for this as an
example and illistration of the problem. Lets start from grammar
description:
id --> ids idp
ids --> LL
idp --> LL
idp --> D
idp -->

And build LR(0)-automata for this:

State 0 is initial state. So, we put start rule there:
id --> * ids idp.
So far, the state contains only the single state and nonterminal symbol
ids is just after the point in signed rule. So, apply closure operation
i.e. add to the state all signed rules of the grammar where ids is on
the left part. It is only the single such the rule ids --> LL. So, the
state number 0 is now follows:
id --> * ids idp
ids --> * LL

Now, build moves from this state. Move means really point move one
symbol to right. There are two moves: on symbols ids and LL. So, there
may be two additinal states:
State 1 contains
ids --> LL *
Here closure operation cannot be applied as there is not any symbol
after the point. And state 2
id --> ids * idp
It is time to apply closure to the symbol idp. We get
id --> ids * idp
idp --> * LL
idp --> *D
idp --> *

Build three additional state on moves from state 2.
State 3 on LL move
idp --> LL *

State 4 on D move
idp --> D *

And state 5 on idp move
id --> ids idp *

For the moment, LR(0)-automata is build (there are not any move). But
the information we can get from the automata is not enough to do
LR-analysis. The problem is in move on nonterminal. We know how to do
moving on terminal, when we read it from input string. So, starting
from initial state 0 we can easy move in state 1 by reading symbol LL
from input string. But, how to move on ids??? The solution is taking a
look on state with LR-item where point is on the last position in right
part of the rule. In our automata we have all such the states except
state 0. What does it mean the point on the last position in the signe
rule? It means that some rule (and some nonterminal!) is parsed. So, in
state 1 we have parsed rule ids --> LL and it means we have parsed
symbol ids. So, we can move from state 0 to state 2 on symbol ids. Why
from state 0? Beacause it contains rule ids --> * LL where point is on
the start position and via the sequence of moves (really only one move)
we can get state 1. It is named reduce in LR-analysis. The move on some
symbol is named as shift. In each state we must know what to do,
wherther reduce or shift. If we have more then one possibility it is
named as ambiguity. We have ambuity in state 2. We can shift on LL,
shift on D and reduce on idp in the same state 2 and after go to state
5. And, if the problem with mmoves on LL and D can be avoided by using
LR(1)-automata instead of LR(0), the empty rule idp --> * cannot be
avoided for any k in LR(k). We will have shift/reduce conflict. How to
decide this? The approach was intended in 1974 by Bernard Lang and is
based really on very simple idea. Every time when we have ambiguity
comflict clone the parse stack and parse on them in parallel. So, we
may have possiblly many many stacks with many symbols and state numbers
in each of them and every time when we read symbol from the input
string we do parsing on each stack. Of course, it may be not so soon,
but it is the payment for ambiguity. In 80th Masari Tomita suggested do
not clone stacks of symbols, only clone references on them. So, we have
references and it means we have graph where vertexes are either state
numbers or grammar symbols and references between them are edges.
Tomita named such the graph as Graph Structured Stack - GSS and the
Tomita's algorithm worked on this. Tomita intended five algorithms amd,
it's interesting, not one of them worked correctly on any grammar. The
problem was still with empty rule. In firs part of 90th Tomita's
disciple Farshi (I am not sure about right name) intended new
algorithm, which worked correctly on any grammar. But, the
computational complexity of the algorithm was very high. And in 2000th
three guys from London, Scott, Johnstone and Hussain, decided the
problem with empty rule by modification LR-automata (adding additional
moves on empty rules). So, you really need to this algorithm. Below I
put some publications list where you can find needed information.

1. B. Lang. Deterministic techniques for efficient non-deterministic
parsers. Automata, Languages and Programming: 2nd Colloquium, pages
255-269, 1974.
2. M. Tomita. An Efficient Augmented-Context-Free Parsing Algorithm.
Pittsburgh, Carnegie-Mellon University, 1987.
3. M. Tomita. Efficient parsing for natural language. Kluwer Academic
Publishers, Boston, 1986.
4. R. Nozohoor-Farshi. GLR parsing for -grammars. In Masaru Tomita,
editor, Generalized LR parsing, pages 61-75. Kluwer Academic
Publishers, Netherlands, 1991,
5. E. Scott, A. Johnstone, S. S. Hussain. Tomita-Style Generalized LR
Parsers. Royal Holloway University of London, 2000.

Post a followup to this message