Re: LALR parsing

Russ Cox <>
Fri, 11 Dec 2009 09:43:37 -0800

          From comp.compilers

Related articles
LALR parsing (A. Soare) (2009-12-04)
Re: LALR parsing (Torben AEgidius Mogensen) (2009-12-07)
Re: LALR parsing (A. Soare) (2009-12-09)
Re: LALR parsing (2009-12-09)
Re: LALR parsing (2009-12-10)
Re: LALR parsing (Russ Cox) (2009-12-11)
Re: LALR parsing (2009-12-14)
Re: LALR parsing (Matthias-Christian ott) (2009-12-14)
Re: LALR parsing (Chris F Clark) (2009-12-25)
| List of all articles for this month |

From: Russ Cox <>
Newsgroups: comp.compilers
Date: Fri, 11 Dec 2009 09:43:37 -0800
Organization: Compilers Central
References: 09-12-007 09-12-009 09-12-018 09-12-019
Keywords: LALR, parse, theory
Posted-Date: 12 Dec 2009 12:18:10 EST

>> B author = {Lee, Lillian},
>> B title = {Fast context-free grammar parsing requires fast
>> B B B B B boolean matrix multiplication},
> I read the abstract of that paper some time ago, and as far as I recall,
> the essense is that you can multiply two sqrt(n) x sqrt(n) binary
> matrices in the time it takes to parse a length-n string. B So if you can
> multiply binary matrices in O(n^2) time, you _might_ be able to parse in
> linear time.

I think this came across weaker than intended: the _might_ here is
formally a negation. That is, since fast parsing implies fast matrix
multiplication, inability to do fast matrix multiplication implies
inability to do fast parsing. So unless you can multiply binary
matrices in something approaching quadratic time, you _cannot_ parse
in linear time.

The abstract fills in the details better than I can:

        In 1975, Valiant showed that Boolean matrix multiplication can be
        used for parsing context-free grammars (CFGs), yielding the
        asymptotically fastest (although not practical) CFG parsing
        algorithm known. We prove a dual result: any CFG parser with time
        complexity O(g*n^(3-epsilon)), where g is the size of the grammar
        and n is the length of the input string, can be efficiently
        converted into an algorithm to multiply m x m Boolean matrices in
        time O(m^(3-epsilon/3)). Given that practical, substantially
        subcubic Boolean matrix multiplication algorithms have been quite
        difficult to find, we thus explain why there has been little
        progress in developing practical, substantially subcubic general
        CFG parsers. In proving this result, we also develop a
        formalization of the notion of parsing.

The more precise version of the statement above is therefore:
Unless you can do matrix multiplication in O(m^(2 1/3)) time,
you cannot do context free parsing in O(g n) time.

Getting back to the original post, the connection between matrix
multiplication and parsing applies to general CFGs, not LALR(1),
which are a significantly restricted subset.

There are many aspects to writing a fast LALR(1) parser, but none of
them derive from the matrix multiplication connection: in this
context, general context-free parsers solve a needlessly hard problem.

LALR(1) parsing runs in linear time in the size of the input already,
and since you have to process the input, you can't do better. Actual
implementations may have different constant factors, of course, but
most papers talking about efficient LALR implementations mean the
preprocessing that converts the grammar into the LALR tables used
during the parse. That step is superlinear in the size of the grammar
and can be significant for large grammars. On the other hand, if
you're writing a tool like yacc, it may not matter at all: in my own
builds, the compilation of the rest of the program always takes much
longer than yacc's table generation. (It does matter a lot if you're
processing new grammars on the fly at run time; the original post
doesn't say.)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.