Related articles |
---|
LALR parsing alinsoar@voila.fr (A. Soare) (2009-12-04) |
Re: LALR parsing torbenm@diku.dk (Torben AEgidius Mogensen) (2009-12-07) |
Re: LALR parsing alinsoar@voila.fr (A. Soare) (2009-12-09) |
Re: LALR parsing Danny.Dube@ift.ulaval.ca (2009-12-09) |
Re: LALR parsing torbenm@diku.dk (2009-12-10) |
Re: LALR parsing rsc@swtch.com (Russ Cox) (2009-12-11) |
Re: LALR parsing torbenm@diku.dk (2009-12-14) |
Re: LALR parsing ott@mirix.org (Matthias-Christian ott) (2009-12-14) |
Re: LALR parsing cfc@shell01.TheWorld.com (Chris F Clark) (2009-12-25) |
From: | Russ Cox <rsc@swtch.com> |
Newsgroups: | comp.compilers |
Date: | Fri, 11 Dec 2009 09:43:37 -0800 |
Organization: | Compilers Central |
References: | 09-12-007 09-12-009 09-12-018 09-12-019 |
Keywords: | LALR, parse, theory |
Posted-Date: | 12 Dec 2009 12:18:10 EST |
>> B author = {Lee, Lillian},
>> B title = {Fast context-free grammar parsing requires fast
>> B B B B B boolean matrix multiplication},
>
> I read the abstract of that paper some time ago, and as far as I recall,
> the essense is that you can multiply two sqrt(n) x sqrt(n) binary
> matrices in the time it takes to parse a length-n string. B So if you can
> multiply binary matrices in O(n^2) time, you _might_ be able to parse in
> linear time.
I think this came across weaker than intended: the _might_ here is
formally a negation. That is, since fast parsing implies fast matrix
multiplication, inability to do fast matrix multiplication implies
inability to do fast parsing. So unless you can multiply binary
matrices in something approaching quadratic time, you _cannot_ parse
in linear time.
The abstract fills in the details better than I can:
In 1975, Valiant showed that Boolean matrix multiplication can be
used for parsing context-free grammars (CFGs), yielding the
asymptotically fastest (although not practical) CFG parsing
algorithm known. We prove a dual result: any CFG parser with time
complexity O(g*n^(3-epsilon)), where g is the size of the grammar
and n is the length of the input string, can be efficiently
converted into an algorithm to multiply m x m Boolean matrices in
time O(m^(3-epsilon/3)). Given that practical, substantially
subcubic Boolean matrix multiplication algorithms have been quite
difficult to find, we thus explain why there has been little
progress in developing practical, substantially subcubic general
CFG parsers. In proving this result, we also develop a
formalization of the notion of parsing.
The more precise version of the statement above is therefore:
Unless you can do matrix multiplication in O(m^(2 1/3)) time,
you cannot do context free parsing in O(g n) time.
Getting back to the original post, the connection between matrix
multiplication and parsing applies to general CFGs, not LALR(1),
which are a significantly restricted subset.
There are many aspects to writing a fast LALR(1) parser, but none of
them derive from the matrix multiplication connection: in this
context, general context-free parsers solve a needlessly hard problem.
LALR(1) parsing runs in linear time in the size of the input already,
and since you have to process the input, you can't do better. Actual
implementations may have different constant factors, of course, but
most papers talking about efficient LALR implementations mean the
preprocessing that converts the grammar into the LALR tables used
during the parse. That step is superlinear in the size of the grammar
and can be significant for large grammars. On the other hand, if
you're writing a tool like yacc, it may not matter at all: in my own
builds, the compilation of the rest of the program always takes much
longer than yacc's table generation. (It does matter a lot if you're
processing new grammars on the fly at run time; the original post
doesn't say.)
Russ
Return to the
comp.compilers page.
Search the
comp.compilers archives again.