GLR(k) parsing using Quantum Grammars

mark@freenet.uwm.edu (Mark Hopkins)
Fri, 18 Feb 1994 05:21:47 GMT

          From comp.compilers

Related articles
parsers for ambiguous grammars? dekker@dutiws.twi.tudelft.nl (1994-02-16)
GLR(k) parsing using Quantum Grammars mark@freenet.uwm.edu (1994-02-18)
| List of all articles for this month |
Newsgroups: comp.compilers
From: mark@freenet.uwm.edu (Mark Hopkins)
Keywords: parse
Organization: Compilers Central
References: 94-02-099
Date: Fri, 18 Feb 1994 05:21:47 GMT

Rene Dekker <dekker@dutiws.twi.tudelft.nl> asks:
>I am looking for parsing techniques that can handle a highly ambiguous
>grammar and are able to generate partial parses. I am aware that
>chart-parsers can do such a thing, but I am looking for other techniques
>also...


To follow up on the reference to the Tomita algorithm, I'm also working
on application of Quantum Parsing for handling ambiguous grammars.


Refer to the previous articles "bottom up parsing by hand..." for more
background information.


For illustration, this is my pet toy natural language grammar:


S -> NP VP
NP -> NP PP | c S | n
VP -> v NP | VP PP
PP -> p NP


Introducing a set of action symbols to capture the node formation process,
this can be rewritten in the form:


                S = NP VP a
                VP = v NP b + VP PP c
                NP = n d + NP PP e
                PP = p NP f


After operators are inserted:


                S = <0| NP |0> <0| VP |0> a
                VP = v <1| NP |1> b + VP <2| PP |2> c
                NP = n d + NP <3| PP |3> e
                PP = p NP f


The action symbols are a, b, c, d, e, and f. The input symbols are
v, n, and p. Recall that the sets of input symbols (X), actions symbols
(Y), and operator symbols (D) commute:


                    x y = y x, x d = d x, y d = d y for all x in X, y in Y, d in D


      From the syntax above, we get the following Quantum Grammar:


                S = <0| NP
                VP = v <1| NP
                VP_F = |0> a + <2| PP
                NP = n d NP_F
                NP_F = |0><0| VP + |1> b VP_F + <3| PP + f PP_F
                PP = p NP
                PP_F = |2> c VP_F + |3> e NP_F


This can be simplified to the following:


                S = <0| n d NP_F
                NP_F = |0><0|<1| v n d NP_F + <3| p n d NP_F
                          + (|1> b + |2> f c) (a |0> + <2| p n d NP_F)
                          + |3> f e NP_F


and converted to the form:


                S = <0| n d NP_F
                NP_F = (|3> f e)* |0><0|<1| v n d NP_F
                          + (|3> f e)* (<3| + (|1> b + |2> f c) <2|) p n d NP_F
                          + (|3> f e)* (|1> b + |2> f c) a |0>


using regular expression algebra, and with sharing of common terms, to:


                S = <0| d n NP_F
                NP_F = A B d v n NP_F
                          + A D d p n NP_F
                          + A C a |0>
                where
                      A = (|3> f e)*
                      B = |0><0| <1|
                      C = (|1> b + |2> f c)
                      D = <3| + C<2|


This form is what I refer to as quasi-deterministic. Every term on
the right hand side is of the form:


                    RE(|m>'s, <m|'s, action sym's) (1 or more input syms) Q
                    RE(|m)'s, <m|'s, action sym's)


where RE(...) is a pure operator/action symbol regular expression.


      As an example, taking the input n v n p n p n, we can perform the
following derivation:


                    S |- <0| d n NP_F
              NP_F |- A B d v n NP_F
              NP_F |- A D d p n NP_F
              NP_F |- A D d p n NP_F
              NP_F |- A C a |0>


>From this, the pure operator/action symbol expression can be extracted:


                  <0| d A B d A D d A D d A C a |0>
                  where
                        A = (|3> f e)*
                        B = |0><0| <1|
                        C = (|1> b + |2> f c)
                        D = <3| + C<2|


then encodes the set of all parse sequences of the input. It can be
finite (and will be if the grammar was originally non-cyclic), or infinite.
In fact, not only could it be infinite, but it could be non-regular: a
context free expression in its own right.


      This expression can be successively reduced, by operator cancellation,
e.g.:


    (1) <0| d A B = <0| d (|3> f e)* B
                                      = d <0| B + d <0| |3> f e (|3> f e)* B
                                      = d <0| B
                                      = d <0| |0><0| <1|
                                      = d <0|<1|


    (2) <0| d A B d A D = d <0|<1| d (|3> f e)* D
                                                  = d d <0|<1| D
                                                  = d d <0|<1| (<3| + (|1> b + |2> f c)<2|)
                                                  = d d <0|<1|<3| + d d b <0|<2|
                                                  = d d <0| (<1|<3| + b <2|)


The first reduction, (1), in the sequence above is deterministic. The second,
(2), leads to a decision point where one can choose <1|<3| or d<2|. Places
like this are where ambiguity can be dealt with.


Though it may not be obvious to the unaided eye, ultimately this expression
will reduce to the following:


                dd ((dfeb + bdfc) d + bddfe) fca + ddd (fed + dfe) feba


which is the 5 element set:


            dddfebdfca + ddbdfcdfca + ddbddfefca + dddfedfeba + ddddfefeba


corresponding to the 5 possible parses of the input. Each one of these
is a compact representation of the reduction sequence for the parse.
For instance, corresponding to the first expression is:


                          d: n v n p n p n <- NP v n p n p n
                          d: NP v n p n p n <- NP v NP p n p n
                          d: NP v NP p n p n <- NP v NP p NP p n
                          f: NP v NP p NP p n <- NP v NP PP p n
                          e: NP v NP PP p n <- NP v NP p n
                          b: NP v NP p n <- NP VP p n
                          d: NP VP p n <- NP VP p NP
                          f: NP VP p NP <- NP VP PP
                          c: NP VP PP <- NP VP
                          a: NP VP <- S
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.