15 May 1998 22:38:24 -0400

Related articles |
---|

A new? table compression technique for LR parsers torbenm@diku.dk (Torben Mogensen) (1998-05-07) |

Re: A new? table compression technique for LR parsers vmakarov@cygnus.com (Vladimir N. Makarov) (1998-05-12) |

Re: A new? table compression technique for LR parsers paulmann@thegrid.net (Paul Mann) (1998-05-15) |

From: | "Paul Mann" <paulmann@thegrid.net> |

Newsgroups: | comp.compilers |

Date: | 15 May 1998 22:38:24 -0400 |

Organization: | Call America Internet Services +1 (800) 563-3271 |

References: | 98-05-031 |

Keywords: | parse |

Torben Mogensen wrote in message 98-05-031...

*>While toying with LR parsing, I stumbled on a simple idea for table*

*>compression in LR parsers. A brief search of literature did not reveal*

*>any previous mention of the methods, so it may be new. If you have*

*>heard about it before, please tell me.*

Very interesting idea. OK, you save space by compressing the parser

matrix, but ... you add space by storing all the productions in the

parser tables. You slow down the apply action for reductions and if

you have and error, and want to do error recovery, now you have a

mess, right?, because you have to backtrack.

The theoretical proof for the correctness of this technique would be

very challenging. I think it might be possible to end up in a state

that does a reduction and checks the symbols on the parse stack and

continues parsing WITHOUT realizing that the wrong reduction was made.

Sometimes the right sides of a grammar rule are IDENTICAL but the head

symbol (nonterminal) is not the same. The nonterminal would not be on

the parse stack and, uh oh, you've got a problem. Not you have to

figure out if your grammar is ambiguous or your compressed parser

matrix has introduced ambiguity into the parser.

I implemented LALR rather than LR(1) as follows.

I used the compression technique from the book, "Compiler

Construction" by Waite and Goos, Springer-Verlag, which does not give

enough informa- tion for complete implementation. So, I also

recommend a paper on Optimization of Parser Tables for Portable

Compilers, by Dencker and 3 others.

You make a bit matrix with a one for a transition and a zero for error

and a zero for a reduction. Most of the states in an LALR parser can

make a default reduction anyway, so I have a separate reduction

matrix, which is small and only necessary for those state that have

more than production being recognized.

So from this original parser matrix with the reductions removed:

| id + * ( ) $ | E

--+------------------------+---

0 | s3 s2 | 1

1 | s4 s5 |

2 | s3 s2 | 6

3 | |

4 | s3 s2 | 7

5 | s3 s2 | 8

6 | s4 s5 s9 |

7 | s5 |

8 | |

9 | |

--+------------------------+---

You get this bit matrix:

| id + * ( ) $ | E

--+------------------------+---

0 | 1 0 0 1 0 0 |

1 | 0 1 1 0 0 0 |

2 | 1 0 0 1 0 0 |

3 | 0 0 0 0 0 0 |

4 | 1 0 0 1 0 0 |

5 | 1 0 0 1 0 0 |

6 | 0 1 1 0 1 0 |

7 | 0 0 1 0 0 0 |

8 | 0 0 0 0 0 0 |

9 | 0 0 0 0 0 0 |

--+------------------------+---

Then all ther error cells in the parser matrix become "don't care".

Now you can go back to the parser matrix and merge all states with

non-conflicting transitions. You can also merge all columns. You can

also compress the bit matrix the same way, merging all rows that are

similar and then all column that are similar.

You still don't have to store the production symbols in the tables.

You still stop at the first error, and you still have your quick

reductions.

And ... if you want to see how small the tables can be ... you can use

the free demo version of an LALR parser generator from LALR Systems.

There are 18 BNF grammars available for download ranging from small to

huge. It's at: http://www.lalr.com/

Paul Mann

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.