Related articles |
---|
Regular LALR grammars simonh@swidev.demon.co.uk (1993-05-25) |
Re: Regular LALR grammars markh@csd4.csd.uwm.edu (1993-05-26) |
Newsgroups: | comp.compilers |
From: | markh@csd4.csd.uwm.edu (Mark) |
Keywords: | LALR, parse |
Organization: | Computing Services Division, University of Wisconsin - Milwaukee |
References: | 93-05-125 |
Date: | Wed, 26 May 1993 23:01:23 GMT |
>How are regular parser grammars handled by the parser generator?
The right hand side of an item is now a DFA.
Take the grammar (based on one presented on a prior post about the Tomita
parser):
S: NP VP.
NP: [D] N PP*.
VP: V NP PP*.
PP: P NP.
Rewriting these in equational DFA form, you get:
Q0 = S Q1
Q1 = 1.
S: Q2 = NP Q3.
Q3 = VP Q4.
Q4 = 1.
NP: Q5 = D Q6 | N Q7.
Q6 = N Q7.
Q7 = PP Q7 | 1.
VP: Q8 = V Q9.
Q9 = NP Q10.
Q10 = PP Q10 | 1.
PP: Q11 = P Q12.
Q12 = NP Q13.
Q13 = 1.
State 0 starts out with the item: [accept: Q0 ]. On closure, you get:
[accept: Q0 ].
[S: Q2].
[NP: Q5].
If any of the states (Q0, Q2, Q5) are final states in their respective
DFAs then the corresponding reduction is listed. For each transition on
each of the states (Q0, Q2, and Q5) a corresponding contribution to a
shift action is recorded:
shift symbol new state
S 1: [accept: Q1].
NP 2: [S: Q3].
D 3: [NP: Q6].
N 4: [NP: Q7].
State 1 has only one item with Q1 listed. Since Q1 is a final state in
the DFA corresponding to "accept", then only the reduction: [accept -> S]
is listed in state 1.
The final list of closures looks like this:
0: [accept: Q0] [S: Q2] [NP: Q5]
1: [accept: Q1]
2: [S: Q3] [VP: Q8]
3: [NP: Q6]
4: [NP: Q7] [PP: Q11]
5: [S: Q4]
6: [VP: Q9] [NP: Q5]
7: [PP: Q12] [NP: Q5]
8: [VP: Q10] [PP: Q11]
9: [PP: Q13]
and the final state table like this:
0: S => 1, NP => 2, D => 3, N => 4
1: <accept>
2: VP => 5, V => 6
3: N => 4
4: PP => 4, P => 7, <reduce NP: D N PP*>
5: <reduce S: NP VP>
6: NP => 8, D => 3, N => 4
7: NP => 9, D => 3, N => 4
8: PP => 8, P => 7, <reduce VP: V NP PP*>
9: <reduce PP: P NP>
with shift-reduce conflicts on states 4 and 8 on P.
Two things are true here that are not true of the BNF-based LR(k) parser:
(P0) The tables tend to be more compact
(P1) A state no longer necessarily has a unique predecessor
symbol (e.g. both N and PP have shifts/gotos into state 4).
A reduce action is carried out by following the corresponding DFA
backwards until you get back to the start state. For example, for the
reduction <VP = V NP PP*>, you first reverse the DFA:
VP: Q8 = V Q9.
Q9 = NP Q10.
Q10 = PP Q10 | 1.
by reversing all the arrows and reversing final and start states to get:
<reduce VP: V NP PP*>: Q10 = PP Q10 | NP Q9.
Q9 = V Q8.
Q8 = 1.
Since there may be more than one final state in the DFA, the reverse
finite state automaton won't necessarily be deterministic and might need
to be reduced to a DFA after the reversal. This automaton will reduce any
sequence of symobols on the stack of the form (reading backwards): PP* NP
V into a VP node.
Property P1 has the consequences that:
(a) a state in the reversed DFA
may have more than one transition
and (b) it's therefore necessary to actually look at the symbol on
the stack when making the reduction.
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.