A new? table compression technique for LR parsers

Torben Mogensen <torbenm@diku.dk>
7 May 1998 16:50:43 -0400

          From comp.compilers

Related articles
A new? table compression technique for LR parsers torbenm@diku.dk (Torben Mogensen) (1998-05-07)
Re: A new? table compression technique for LR parsers vmakarov@cygnus.com (Vladimir N. Makarov) (1998-05-12)
Re: A new? table compression technique for LR parsers paulmann@thegrid.net (Paul Mann) (1998-05-15)
| List of all articles for this month |

From: Torben Mogensen <torbenm@diku.dk>
Newsgroups: comp.compilers
Date: 7 May 1998 16:50:43 -0400
Organization: Compilers Central
Keywords: parse, yacc

While toying with LR parsing, I stumbled on a simple idea for table
compression in LR parsers. A brief search of literature did not reveal
any previous mention of the methods, so it may be new. If you have
heard about it before, please tell me.

The idea is that two states in the parse table can be joined if for
each symbol (terminal or nonterminal) they either have the same action
or one of them has an error action. When joining the states, the first
case yields the common action and the second case yields the non-error
action. An example (taken from The Dragon Book) is

                    | id + * ( ) $ | E
                0 | s3 s2 | 1
                1 | s4 s5 acc|
                2 | s3 s2 | 6
                3 | r4 r4 r4 r4|
                4 | s3 s2 | 7
                5 | s3 s2 | 8
                6 | s4 s5 s9 |
                7 | r1 s5 r1 r1|
                8 | r2 r2 r2 r2|
                9 | r3 r3 r3 r3|

We can combine:

    0, 1 and 6 to a new state 0,
    2 and 3 to a new state 1,
    4 and 7 to a new state 2,
    5 and 8 to a new state 3,
    9 to a new state 4.

This gives

                    | id + * ( ) $ | E
                0 | s1 s2 s3 s1 s4 acc| 0
                1 | s1 r4 r4 s1 r4 r4| 0
                2 | s1 r1 r1 s1 r1 r1| 2
                3 | s1 r2 r2 s1 r2 r2| 3
                4 | r3 r3 r3 r3|

Obviously, a parser using the new table will take non-error actions
where it should have reported an error. This problem is solved by
letting reduce actions check that the stack actually contains the
correct symbols. Since all reductions done by the parser are correct
reductions, you can only get correct parses.

The problems with this methods are:

  1) You detect errors later than you would using a traditional LR

  2) You need to store the grammar symbols on the stack and check these
        on reductions.

On the other hand, you save a lot of space. Experiments (by hand) with
small parse tables (from chapter 4 in The Dragon Book) yield around
50% compression and I expect it will be better with large tables,
which tends to be sparser than small tables.

If (as is commonly done) precedence declarations are used to eliminate
conflicts, you must be careful about nonassoc declarations, as these
can replace one or more actions by an error action (left and right
associative declarations only eliminate comflicts, so they are no
problem). If, by joining states, we replace the error action
introduced by the nonassoc declaration by a non-error action, we might
accept associative uses of the operator in question. This can be
solved by letting nonassoc declarations introduce a special kind of
error action that can not be merged with a non-error action.

                Torben Mogensen (torbenm@diku.dk)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.