Related articles |
---|
parsers for ambiguous grammars? dekker@dutiws.twi.tudelft.nl (1994-02-16) |
GLR(k) parsing using Quantum Grammars mark@freenet.uwm.edu (1994-02-18) |
Newsgroups: | comp.compilers |
From: | mark@freenet.uwm.edu (Mark Hopkins) |
Keywords: | parse |
Organization: | Compilers Central |
References: | 94-02-099 |
Date: | Fri, 18 Feb 1994 05:21:47 GMT |
Rene Dekker <dekker@dutiws.twi.tudelft.nl> asks:
>I am looking for parsing techniques that can handle a highly ambiguous
>grammar and are able to generate partial parses. I am aware that
>chart-parsers can do such a thing, but I am looking for other techniques
>also...
To follow up on the reference to the Tomita algorithm, I'm also working
on application of Quantum Parsing for handling ambiguous grammars.
Refer to the previous articles "bottom up parsing by hand..." for more
background information.
For illustration, this is my pet toy natural language grammar:
S -> NP VP
NP -> NP PP | c S | n
VP -> v NP | VP PP
PP -> p NP
Introducing a set of action symbols to capture the node formation process,
this can be rewritten in the form:
S = NP VP a
VP = v NP b + VP PP c
NP = n d + NP PP e
PP = p NP f
After operators are inserted:
S = <0| NP |0> <0| VP |0> a
VP = v <1| NP |1> b + VP <2| PP |2> c
NP = n d + NP <3| PP |3> e
PP = p NP f
The action symbols are a, b, c, d, e, and f. The input symbols are
v, n, and p. Recall that the sets of input symbols (X), actions symbols
(Y), and operator symbols (D) commute:
x y = y x, x d = d x, y d = d y for all x in X, y in Y, d in D
From the syntax above, we get the following Quantum Grammar:
S = <0| NP
VP = v <1| NP
VP_F = |0> a + <2| PP
NP = n d NP_F
NP_F = |0><0| VP + |1> b VP_F + <3| PP + f PP_F
PP = p NP
PP_F = |2> c VP_F + |3> e NP_F
This can be simplified to the following:
S = <0| n d NP_F
NP_F = |0><0|<1| v n d NP_F + <3| p n d NP_F
+ (|1> b + |2> f c) (a |0> + <2| p n d NP_F)
+ |3> f e NP_F
and converted to the form:
S = <0| n d NP_F
NP_F = (|3> f e)* |0><0|<1| v n d NP_F
+ (|3> f e)* (<3| + (|1> b + |2> f c) <2|) p n d NP_F
+ (|3> f e)* (|1> b + |2> f c) a |0>
using regular expression algebra, and with sharing of common terms, to:
S = <0| d n NP_F
NP_F = A B d v n NP_F
+ A D d p n NP_F
+ A C a |0>
where
A = (|3> f e)*
B = |0><0| <1|
C = (|1> b + |2> f c)
D = <3| + C<2|
This form is what I refer to as quasi-deterministic. Every term on
the right hand side is of the form:
RE(|m>'s, <m|'s, action sym's) (1 or more input syms) Q
RE(|m)'s, <m|'s, action sym's)
where RE(...) is a pure operator/action symbol regular expression.
As an example, taking the input n v n p n p n, we can perform the
following derivation:
S |- <0| d n NP_F
NP_F |- A B d v n NP_F
NP_F |- A D d p n NP_F
NP_F |- A D d p n NP_F
NP_F |- A C a |0>
>From this, the pure operator/action symbol expression can be extracted:
<0| d A B d A D d A D d A C a |0>
where
A = (|3> f e)*
B = |0><0| <1|
C = (|1> b + |2> f c)
D = <3| + C<2|
then encodes the set of all parse sequences of the input. It can be
finite (and will be if the grammar was originally non-cyclic), or infinite.
In fact, not only could it be infinite, but it could be non-regular: a
context free expression in its own right.
This expression can be successively reduced, by operator cancellation,
e.g.:
(1) <0| d A B = <0| d (|3> f e)* B
= d <0| B + d <0| |3> f e (|3> f e)* B
= d <0| B
= d <0| |0><0| <1|
= d <0|<1|
(2) <0| d A B d A D = d <0|<1| d (|3> f e)* D
= d d <0|<1| D
= d d <0|<1| (<3| + (|1> b + |2> f c)<2|)
= d d <0|<1|<3| + d d b <0|<2|
= d d <0| (<1|<3| + b <2|)
The first reduction, (1), in the sequence above is deterministic. The second,
(2), leads to a decision point where one can choose <1|<3| or d<2|. Places
like this are where ambiguity can be dealt with.
Though it may not be obvious to the unaided eye, ultimately this expression
will reduce to the following:
dd ((dfeb + bdfc) d + bddfe) fca + ddd (fed + dfe) feba
which is the 5 element set:
dddfebdfca + ddbdfcdfca + ddbddfefca + dddfedfeba + ddddfefeba
corresponding to the 5 possible parses of the input. Each one of these
is a compact representation of the reduction sequence for the parse.
For instance, corresponding to the first expression is:
d: n v n p n p n <- NP v n p n p n
d: NP v n p n p n <- NP v NP p n p n
d: NP v NP p n p n <- NP v NP p NP p n
f: NP v NP p NP p n <- NP v NP PP p n
e: NP v NP PP p n <- NP v NP p n
b: NP v NP p n <- NP VP p n
d: NP VP p n <- NP VP p NP
f: NP VP p NP <- NP VP PP
c: NP VP PP <- NP VP
a: NP VP <- S
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.