[LONG&BORING] ULR : Ultimate LR parser =)

"Nicolas Repiquet" <deadcow@free.fr>
27 Oct 2003 16:05:03 -0500

From comp.compilers

Related articles
*[LONG&BORING] ULR : Ultimate LR parser =) deadcow@free.fr (Nicolas Repiquet)* (2003-10-27)**
Re: [LONG&BORING] ULR : Ultimate LR parser =) oliver@zeigermann.de (Oliver Zeigermann) (2003-10-31)

| List of all articles for this month |

From:	"Nicolas Repiquet" <deadcow@free.fr>
Newsgroups:	comp.compilers
Date:	27 Oct 2003 16:05:03 -0500
Organization:	Guest of ProXad - France
Keywords:	parse, LR(1)
Posted-Date:	27 Oct 2003 16:05:03 EST

In an LR parser, when a conflict state is reached, the parser has the
choice between reducing the production rule or continue. To decide
what to do, several technics exist :

- rebuild grammar to avoid conflicts
- introducing a lookahead
- adding new characteristics to the production rule.

This is what I propose today.
(perhaps it has already been done, in which case I'd like some URL please)
Imagine we add 2 more informations to the production rule :
- a precedence level
- an associative indication

The syntax can be something like

axiom : expression eof;

expression :

    num_expression : number;
    star_expression (2>): expression '*' expression;
    plus_expression (1>): expression '+' expression;
}

Where :
- '>' indicates the associativity : here from left to right, but it can also
be < (from right to left)
- '3' indicates the level of precedence
- by default each rule has a (0>) precedence level/associative indication.

There is another notion introduced by this technic : Guaranteed states. A
state has a "guaranteed production" when this state lead to a sole
production.

Here are the states for the above grammar ( position & transition preceded
by @ are the root ones ):

State 0: ( Guarantee axiom )
@ axiom : . expression eof
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 1
number -> 2

State 1:
@ axiom : expression . eof
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ eof -> 3
@ '*' -> 4
@ '+' -> 5

State 2: ( Guarantee num_expression )
@ num_expression : number .
<- num_expression

State 3: ( Guarantee axiom )
@ axiom : expression eof .
<- axiom

State 4: ( Guarantee star_expression )
@ star_expression : expression '*' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 6
number -> 2

State 5: ( Guarantee plus_expression )
@ plus_expression : expression '+' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 7
number -> 2

State 6:
@ star_expression : expression '*' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- star_expression

State 7:
@ plus_expression : expression '+' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- plus_expression

Now, let's study the parsing of "3 + 3 * 3"

State | Stack | Input | Action
------+---------------+-----------+--------------
0 | 3 | +3*3eof | go to 2
2 | num_e | +3*3eof | reduce & go to 0
0 | num_e | +3*3eof | go to 1
1 | num_e+ | 3*3eof | go to 5
5 | num_e+3 | *3eof | go to 2
2 | num_e+num_e | *3eof | reduce & go to 5
5 | num_e+num_e | *3eof | go to 7

Here its special because 7 is a "conflict state". We gonna just push the
conflict informations on the top of a special "conflict stack", and continue
to go until we reach a "guaranteed state" or an error.

7 | num_e+num_e* | 3eof | go to 4

Hey ! 4 is a guaranteed state. Let's compare informations on running
conflict & actual state :

conflict state reduction : plus_expression (1>)
garanteed state : star_expression (2>)

star_expression has priority, so we can safely delete conflict and continue
to go.

4 | num_e+num_e*3 | eof | go to 2
2 | num_e+num_e*num_e | eof | reduce & go to 4
4 | num_e+num_e*num_e | eof | go to 7

Another conflict. Push informations and continue.

7 | num_e+num_e*num_e eof | | error, nowhere to go.

Error ! We can safely go back to our last conflict, and choose reduction.
Token shifted since our last conflict are pushed back in input.

4 | num_e+star_e | eof | reduce & go to 7

and so on ...

Something must be added : guaranteed states stop conflicts only road from
conflict states and guaranteed state use root transitions, otherwise
guaranteed state are ignored.

As far as i know, this technique can handle any context-free grammar, but
I'd like to know if you see limitations. (and I suppose there are)

Kind regards

-- Nicolas Repiquet

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

[LONG&BORING] ULR : Ultimate LR parser =)

"Nicolas Repiquet" <deadcow@free.fr>27 Oct 2003 16:05:03 -0500

"Nicolas Repiquet" <deadcow@free.fr>
27 Oct 2003 16:05:03 -0500