
Related articles |
---|

[2 earlier articles] |

Re: LALR parsing alinsoar@voila.fr (A. Soare) (2009-12-09) |

Re: LALR parsing Danny.Dube@ift.ulaval.ca (2009-12-09) |

Re: LALR parsing torbenm@diku.dk (2009-12-10) |

Re: LALR parsing rsc@swtch.com (Russ Cox) (2009-12-11) |

Re: LALR parsing torbenm@diku.dk (2009-12-14) |

Re: LALR parsing ott@mirix.org (Matthias-Christian ott) (2009-12-14) |

Re: LALR parsing cfc@shell01.TheWorld.com (Chris F Clark) (2009-12-25) |

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | Fri, 25 Dec 2009 14:18:23 -0500 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 09-12-007 09-12-009 09-12-018 09-12-019 09-12-021 09-12-024 09-12-027 |

Keywords: | LALR, parse, theory |

Posted-Date: | 25 Dec 2009 15:36:45 EST |

First of all, let me apologise in advance for commenting on things
where I am "out of my depth". In addition, I have not read or studied
this particular aspect in a while, so my failing memory may introduce
additional errors.

You may want to look at the paper by Park, Choe, and Chang, which
discusses efficiently computing the lookahead sets needed in
constructing LALR tables. Googling for them, I found a related paper
by Fred Ives that may also be of interest.

In any case, computing the lookahead sets requires computing
transitive closures. I believe there are asymptotically efficient
algorithms for this that are based upon modifications of matrix
multiplication. (They remind me of how multiplication in certain
Galois fields, GF(2**k), can be mapped to carry-less multiplication
using shifts and xors.)
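As an aside, that carry-less multiplication can be sketched in a few
lines of Python. This is just the textbook shift-and-xor loop
(polynomial multiplication over GF(2)), offered as a concrete
illustration, not as anything specific to LALR generation:

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiply of two bit strings,
    using only shifts and xors: each set bit of b contributes a
    shifted copy of a, with xor standing in for carry-free addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. clmul(0b11, 0b11) == 0b101
```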

Thus, I think if you read the literature, you will find an algorithm
for efficiently computing LALR tables that uses matrix multiplication
over a Boolean field to compute certain required sets.
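To make the connection concrete, here is a minimal Python sketch of
transitive closure by repeated squaring of a Boolean matrix. This
shows only the general technique; the actual LALR lookahead
algorithms apply closures like this to specific relations on parser
states, which this sketch does not attempt to model:

```python
def bool_matmul(a, b):
    """Multiply two square Boolean matrices: 'or' plays the role of
    addition and 'and' the role of multiplication."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

def transitive_closure(adj):
    """Reflexive-transitive closure of a relation given as an
    adjacency matrix, via repeated squaring of (I or A).
    (I or A)^n covers all paths of length up to n."""
    n = len(adj)
    m = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    length = 1
    while length < n:
        m = bool_matmul(m, m)   # doubles the path length covered
        length *= 2
    return m
```

Repeated squaring does O(log n) Boolean matrix multiplications, which
is where a fast matrix-multiplication routine pays off.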

However, as others have said, this has no impact on the efficiency of
the generated parser, which is why I haven't studied it in more depth.
Even naive algorithms for computing the lookahead sets (and thus
performing conflict resolution) yield the same LALR tables. In fact,
there is a whole family of tables that are essentially the same (all
based on the LR(0) states) and the same size, the only distinction
being which reduce actions are inserted (how conflicts are resolved).

The differences in these tables lie not in the speed at which they
parse, but in the ability to distinguish more subtle cases: there are
grammars that are LALR but not SLR, and the only distinction is that
the LALR construction makes finer conflict resolutions than the SLR
algorithm does, i.e. where SLR declares a conflict, LALR inserts
reduce actions that allow it to parse the language. (The classic
example is the grammar S -> L = R | R; L -> * R | id; R -> L, which
is LALR(1) but not SLR(1).)

If you are interested in making a more efficient parser (rather than a
more efficient parser-table generator), it is hard to beat the
asymptotic efficiency of LR-style tables. The resulting parser is
linear in the input size, and you can't do better than that, because
one has to look at every input symbol at least once to correctly
distinguish all cases.

That being said, one can affect the constant multiplier: how many
operations are done to process each input symbol. There isn't a lot
of research into this, since it isn't generally a theoretical issue.
Moreover, low-level issues often overwhelm the theoretical ones. The
best performance I've seen mentioned in a paper is from Tom Pennello's
work on very fast LR parsing--essentially one generates an assembly
language program that implements the tables. That's not
theoretically interesting, but it is a very pragmatic solution.

On a more theoretical bent, one can eliminate many of the push actions
in an LR parser. You don't have to keep all the symbols on the stack.
You still have to read them, but your shift action can simply drop
them on the floor. There was a paper written on this, and the
resulting tool was called either Mule or Ox--I can't recall which for
certain. The other tool adds attribute-grammar processing to yacc.

We use a similar technique in Yacc++ to help us solve the ELR problem.
In ELR parsing, one has variable-length right-hand sides (RHSs) and
one needs to pop a variable number of symbols when one reduces. There
are a variety of solutions to this problem, but the one we use[d] is
to mark the starts of productions with pushes on a special stack.
This stack has the same flavor as the one mentioned in the preceding
paragraph.
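As an illustration (and only an illustration: this is a hypothetical
toy, not how Yacc++ is actually implemented), here is a Python sketch
of the marker-stack idea. A second stack records the depth of the
value stack at the point each production starts, so a reduce can pop
a variable-length RHS without counting symbols:

```python
def parse_nested(tokens):
    """Toy shift-reduce recognizer for nested lists like '(a(bb))'.
    values is the main stack; starts records the depth of values at
    which each open production began, so the reduce at ')' can pop
    however many symbols the production happened to match."""
    values = []   # shifted/reduced items
    starts = []   # value-stack depth at each production start
    for t in tokens:
        if t == '(':
            starts.append(len(values))      # mark production start
        elif t == ')':
            base = starts.pop()             # variable-length reduce:
            children = tuple(values[base:]) # pop everything since mark
            del values[base:]
            values.append(children)
        else:
            values.append(t)                # plain shift
    if starts:
        raise ValueError("unclosed '('")
    return values
```

For example, `parse_nested("(a(bb))")` reduces the inner production
over two symbols and the outer one over two as well, without either
reduce needing to know its RHS length in advance.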

At the practical level, the speed of the resulting parser isn't a
particularly large issue. It is almost always dwarfed by the speed of
the lexer, for an obvious reason: the parser does its operations at
the token level, while the lexer does its operations at the character
level. Unless many of your tokens are only one character long, the
lexer's speed is magnified in importance, relative to the parser's,
by a factor of the average token length.
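A back-of-envelope calculation makes the multiplier argument
concrete. The per-character and per-token instruction counts below
are assumptions picked for illustration, not measurements:

```python
# Assumed (illustrative) costs: c_lex instructions per *character* in
# the lexer, c_par instructions per *token* in the parser.
avg_token_len = 5        # assumed average characters per token
c_lex, c_par = 4, 8      # assumed instruction counts, not measured
tokens = 1_000_000

lexer_cost = tokens * avg_token_len * c_lex   # character-level work
parser_cost = tokens * c_par                  # token-level work

# Even though the parser here does twice the work per unit, the lexer
# still dominates because it touches avg_token_len units per token.
ratio = lexer_cost / parser_cost              # 2.5 with these numbers
```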

In either case, lexer or parser, you can get the cost down to a few
instructions per symbol, and in most cases fit your entire set of
tables into a modern x86 processor's cache, such that you are
effectively bound only by how fast you can read the input into the
computer.

There are places where automaton performance at the low level matters,
such as network security (e.g. virus scanning). However, that isn't
garden-variety parsing, and we already know how to get the cost down
to a few machine instructions per character.

Moreover, don't be surprised when there are hardware implementations
of various FA algorithms that run at 1 byte per cycle as part of
"standard PCs". Such machines are needed for the above-mentioned
security applications. Eventually, you will need this security on
every device that can talk to any network, so this hardware will be
everywhere. When that happens, the hardware will be there to do
lexing and parsing "for free"--i.e. as you read the text off the disk
(or network or whatever), it will be lexed and parsed as fast as the
bytes are streamed in.

Hope this helps,

-Chris

******************************************************************************

Chris Clark email: christopher.f.clark@compiler-resources.com

Compiler Resources, Inc. Web Site: http://world.std.com/~compres

23 Bailey Rd voice: (508) 435-5016

Berlin, MA 01503 USA twitter: @intel_chris
