Fri, 11 Dec 2009 09:43:37 -0800

Related articles |
---|

LALR parsing alinsoar@voila.fr (A. Soare) (2009-12-04) |

Re: LALR parsing torbenm@diku.dk (Torben AEgidius Mogensen) (2009-12-07) |

Re: LALR parsing alinsoar@voila.fr (A. Soare) (2009-12-09) |

Re: LALR parsing Danny.Dube@ift.ulaval.ca (2009-12-09) |

Re: LALR parsing torbenm@diku.dk (2009-12-10) |

Re: LALR parsing rsc@swtch.com (Russ Cox) (2009-12-11) |

Re: LALR parsing torbenm@diku.dk (2009-12-14) |

Re: LALR parsing ott@mirix.org (Matthias-Christian ott) (2009-12-14) |

Re: LALR parsing cfc@shell01.TheWorld.com (Chris F Clark) (2009-12-25) |

From: | Russ Cox <rsc@swtch.com> |

Newsgroups: | comp.compilers |

Date: | Fri, 11 Dec 2009 09:43:37 -0800 |

Organization: | Compilers Central |

References: | 09-12-007 09-12-009 09-12-018 09-12-019 |

Keywords: | LALR, parse, theory |

Posted-Date: | 12 Dec 2009 12:18:10 EST |

*>> B author = {Lee, Lillian},*

*>> B title = {Fast context-free grammar parsing requires fast*

*>> B B B B B boolean matrix multiplication},*

*>*

*> I read the abstract of that paper some time ago, and as far as I recall,*

*> the essense is that you can multiply two sqrt(n) x sqrt(n) binary*

*> matrices in the time it takes to parse a length-n string. B So if you can*

*> multiply binary matrices in O(n^2) time, you _might_ be able to parse in*

*> linear time.*

I think this came across weaker than intended: the _might_ here is

formally a negation. That is, since fast parsing implies fast matrix

multiplication, inability to do fast matrix multiplication implies

inability to do fast parsing. So unless you can multiply binary

matrices in something approaching quadratic time, you _cannot_ parse

in linear time.

The abstract fills in the details better than I can:

In 1975, Valiant showed that Boolean matrix multiplication can be

used for parsing context-free grammars (CFGs), yielding the

asymptotically fastest (although not practical) CFG parsing

algorithm known. We prove a dual result: any CFG parser with time

complexity O(g*n^(3-epsilon)), where g is the size of the grammar

and n is the length of the input string, can be efficiently

converted into an algorithm to multiply m x m Boolean matrices in

time O(m^(3-epsilon/3)). Given that practical, substantially

subcubic Boolean matrix multiplication algorithms have been quite

difficult to find, we thus explain why there has been little

progress in developing practical, substantially subcubic general

CFG parsers. In proving this result, we also develop a

formalization of the notion of parsing.

The more precise version of the statement above is therefore:

Unless you can do matrix multiplication in O(m^(2 1/3)) time,

you cannot do context free parsing in O(g n) time.

Getting back to the original post, the connection between matrix

multiplication and parsing applies to general CFGs, not LALR(1),

which are a significantly restricted subset.

There are many aspects to writing a fast LALR(1) parser, but none of

them derive from the matrix multiplication connection: in this

context, general context-free parsers solve a needlessly hard problem.

LALR(1) parsing runs in linear time in the size of the input already,

and since you have to process the input, you can't do better. Actual

implementations may have different constant factors, of course, but

most papers talking about efficient LALR implementations mean the

preprocessing that converts the grammar into the LALR tables used

during the parse. That step is superlinear in the size of the grammar

and can be significant for large grammars. On the other hand, if

you're writing a tool like yacc, it may not matter at all: in my own

builds, the compilation of the rest of the program always takes much

longer than yacc's table generation. (It does matter a lot if you're

processing new grammars on the fly at run time; the original post

doesn't say.)

Russ

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.