Related articles |
---|
[LONG&BORING] ULR : Ultimate LR parser =) deadcow@free.fr (Nicolas Repiquet) (2003-10-27) |
Re: [LONG&BORING] ULR : Ultimate LR parser =) oliver@zeigermann.de (Oliver Zeigermann) (2003-10-31) |
From: | "Nicolas Repiquet" <deadcow@free.fr> |
Newsgroups: | comp.compilers |
Date: | 27 Oct 2003 16:05:03 -0500 |
Organization: | Guest of ProXad - France |
Keywords: | parse, LR(1) |
Posted-Date: | 27 Oct 2003 16:05:03 EST |
In an LR parser, when a conflict state is reached, the parser has the
choice between reducing the production rule or continue. To decide
what to do, several technics exist :
- rebuild grammar to avoid conflicts
- introducing a lookahead
- adding new characteristics to the production rule.
This is what I propose today.
(perhaps it has already been done, in which case I'd like some URL please)
Imagine we add 2 more informations to the production rule :
- a precedence level
- an associative indication
The syntax can be something like
axiom : expression eof;
expression :
num_expression : number;
star_expression (2>): expression '*' expression;
plus_expression (1>): expression '+' expression;
}
Where :
- '>' indicates the associativity : here from left to right, but it can also
be < (from right to left)
- '3' indicates the level of precedence
- by default each rule has a (0>) precedence level/associative indication.
There is another notion introduced by this technic : Guaranteed states. A
state has a "guaranteed production" when this state lead to a sole
production.
Here are the states for the above grammar ( position & transition preceded
by @ are the root ones ):
State 0: ( Guarantee axiom )
@ axiom : . expression eof
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 1
number -> 2
State 1:
@ axiom : expression . eof
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ eof -> 3
@ '*' -> 4
@ '+' -> 5
State 2: ( Guarantee num_expression )
@ num_expression : number .
<- num_expression
State 3: ( Guarantee axiom )
@ axiom : expression eof .
<- axiom
State 4: ( Guarantee star_expression )
@ star_expression : expression '*' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 6
number -> 2
State 5: ( Guarantee plus_expression )
@ plus_expression : expression '+' . expression
num_expression : . number
star_expression : . expression '*' expression
plus_expression : . expression '+' expression
@ expression -> 7
number -> 2
State 6:
@ star_expression : expression '*' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- star_expression
State 7:
@ plus_expression : expression '+' expression .
@ star_expression : expression . '*' expression
@ plus_expression : expression . '+' expression
@ '*' -> 4
@ '+' -> 5
<- plus_expression
Now, let's study the parsing of "3 + 3 * 3"
State | Stack | Input | Action
------+---------------+-----------+--------------
0 | 3 | +3*3eof | go to 2
2 | num_e | +3*3eof | reduce & go to 0
0 | num_e | +3*3eof | go to 1
1 | num_e+ | 3*3eof | go to 5
5 | num_e+3 | *3eof | go to 2
2 | num_e+num_e | *3eof | reduce & go to 5
5 | num_e+num_e | *3eof | go to 7
Here its special because 7 is a "conflict state". We gonna just push the
conflict informations on the top of a special "conflict stack", and continue
to go until we reach a "guaranteed state" or an error.
7 | num_e+num_e* | 3eof | go to 4
Hey ! 4 is a guaranteed state. Let's compare informations on running
conflict & actual state :
conflict state reduction : plus_expression (1>)
garanteed state : star_expression (2>)
star_expression has priority, so we can safely delete conflict and continue
to go.
4 | num_e+num_e*3 | eof | go to 2
2 | num_e+num_e*num_e | eof | reduce & go to 4
4 | num_e+num_e*num_e | eof | go to 7
Another conflict. Push informations and continue.
7 | num_e+num_e*num_e eof | | error, nowhere to go.
Error ! We can safely go back to our last conflict, and choose reduction.
Token shifted since our last conflict are pushed back in input.
4 | num_e+star_e | eof | reduce & go to 7
and so on ...
Something must be added : guaranteed states stop conflicts only road from
conflict states and guaranteed state use root transitions, otherwise
guaranteed state are ignored.
As far as i know, this technique can handle any context-free grammar, but
I'd like to know if you see limitations. (and I suppose there are)
Kind regards
-- Nicolas Repiquet
Return to the
comp.compilers page.
Search the
comp.compilers archives again.