[LONG&BORING] ULR : Ultimate LR parser =)

"Nicolas Repiquet" <deadcow@free.fr>
27 Oct 2003 16:05:03 -0500

          From comp.compilers

Related articles
[LONG&BORING] ULR : Ultimate LR parser =) deadcow@free.fr (Nicolas Repiquet) (2003-10-27)
Re: [LONG&BORING] ULR : Ultimate LR parser =) oliver@zeigermann.de (Oliver Zeigermann) (2003-10-31)
| List of all articles for this month |
From: "Nicolas Repiquet" <deadcow@free.fr>
Newsgroups: comp.compilers
Date: 27 Oct 2003 16:05:03 -0500
Organization: Guest of ProXad - France
Keywords: parse, LR(1)
Posted-Date: 27 Oct 2003 16:05:03 EST

In an LR parser, when a conflict state is reached, the parser has the
choice between reducing the production rule or continue. To decide
what to do, several technics exist :


- rebuild grammar to avoid conflicts
- introducing a lookahead
- adding new characteristics to the production rule.


This is what I propose today.
(perhaps it has already been done, in which case I'd like some URL please)
Imagine we add 2 more informations to the production rule :
- a precedence level
- an associative indication


The syntax can be something like


axiom : expression eof;


expression :


    num_expression : number;
    star_expression (2>): expression '*' expression;
    plus_expression (1>): expression '+' expression;
}




Where :
- '>' indicates the associativity : here from left to right, but it can also
be < (from right to left)
- '3' indicates the level of precedence
- by default each rule has a (0>) precedence level/associative indication.


There is another notion introduced by this technic : Guaranteed states. A
state has a "guaranteed production" when this state lead to a sole
production.


Here are the states for the above grammar ( position & transition preceded
by @ are the root ones ):


State 0: ( Guarantee axiom )
@ axiom : . expression eof
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 1
number -> 2


State 1:
@ axiom : expression . eof
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ eof -> 3
@ '*' -> 4
@ '+' -> 5


State 2: ( Guarantee num_expression )
@ num_expression : number .
<- num_expression


State 3: ( Guarantee axiom )
@ axiom : expression eof .
<- axiom


State 4: ( Guarantee star_expression )
@ star_expression : expression '*' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 6
number -> 2


State 5: ( Guarantee plus_expression )
@ plus_expression : expression '+' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 7
number -> 2


State 6:
@ star_expression : expression '*' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- star_expression


State 7:
@ plus_expression : expression '+' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- plus_expression




Now, let's study the parsing of "3 + 3 * 3"


State | Stack | Input | Action
------+---------------+-----------+--------------
0 | 3 | +3*3eof | go to 2
2 | num_e | +3*3eof | reduce & go to 0
0 | num_e | +3*3eof | go to 1
1 | num_e+ | 3*3eof | go to 5
5 | num_e+3 | *3eof | go to 2
2 | num_e+num_e | *3eof | reduce & go to 5
5 | num_e+num_e | *3eof | go to 7


Here its special because 7 is a "conflict state". We gonna just push the
conflict informations on the top of a special "conflict stack", and continue
to go until we reach a "guaranteed state" or an error.


7 | num_e+num_e* | 3eof | go to 4


Hey ! 4 is a guaranteed state. Let's compare informations on running
conflict & actual state :


conflict state reduction : plus_expression (1>)
garanteed state : star_expression (2>)


star_expression has priority, so we can safely delete conflict and continue
to go.


4 | num_e+num_e*3 | eof | go to 2
2 | num_e+num_e*num_e | eof | reduce & go to 4
4 | num_e+num_e*num_e | eof | go to 7


Another conflict. Push informations and continue.


7 | num_e+num_e*num_e eof | | error, nowhere to go.


Error ! We can safely go back to our last conflict, and choose reduction.
Token shifted since our last conflict are pushed back in input.


4 | num_e+star_e | eof | reduce & go to 7


and so on ...


Something must be added : guaranteed states stop conflicts only road from
conflict states and guaranteed state use root transitions, otherwise
guaranteed state are ignored.


As far as i know, this technique can handle any context-free grammar, but
I'd like to know if you see limitations. (and I suppose there are)




Kind regards


-- Nicolas Repiquet


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.