Re: Compiler Compiler Compiler

"Ira D. Baxter" <>
10 Apr 2001 01:18:09 -0400

          From comp.compilers


Newsgroups: comp.compilers
Organization: Posted via Supernews,
References: 01-03-095 01-03-122 01-04-026
Keywords: tools, yacc
Posted-Date: 10 Apr 2001 01:18:09 EDT

"Joachim Durchholz" <> wrote in message
> Mike Dimmick <> wrote:
> > An LALR(1) grammar tool produces its syntax trees from the
> > bottom up; it therefore processes its input keeping as many
> > possibilities in mind (the set of possible reductions) as it can, then
> > selecting one when the input becomes unambiguous.

Tomita-style parsers are based on LALR grammars. They keep *all*
possible parses, ambiguous or not, without backtracking. If your
grammar is ambiguous, you get multiple parses. If your grammar is
"locally ambiguous" (i.e., requires an arbitrary amount of lookahead
to resolve), eventually it chooses the right reductions and discards
the rest.
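The "keep all possible parses" behavior can be illustrated without a real GLR engine. The sketch below (hypothetical code, not anything from DMS) enumerates every parse of a deliberately ambiguous grammar, the way a Tomita-style parser keeps all alternatives alive instead of committing early:

```python
# Minimal sketch (not a real GLR engine): enumerate every parse of the
# ambiguous grammar  E -> E "+" E | "a"  over a token list.
from functools import lru_cache

def parses(tokens):
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        """All parse trees for toks[i:j] derived from E."""
        trees = []
        if j - i == 1 and toks[i] == "a":
            trees.append("a")
        # try every split  E "+" E
        for k in range(i + 1, j - 1):
            if toks[k] == "+":
                for left in expr(i, k):
                    for right in expr(k + 1, j):
                        trees.append(("+", left, right))
        return tuple(trees)

    return list(expr(0, len(toks)))

# "a + a + a" is ambiguous: both groupings survive to the end.
forest = parses(["a", "+", "a", "+", "a"])
print(len(forest))   # 2
print(forest)        # left-nested and right-nested trees
```

With an unambiguous input the forest collapses to a single tree, which is the "locally ambiguous" case the parser resolves on its own.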

We used this in our DMS Reengineering Toolkit to make it "easy" to
define parsers for difficult languages. It means you can parse
languages like C++ *without* having to do all the type resolution
stuff up front. (There are other complications due to macros, but
that's a different topic.)

> > The introduction of
> > these semantic actions causes problems - the generator cannot know
> > which rule is to be reduced, so should it execute the action or not?

You don't execute side-effecting semantic actions in Tomita parsers,
because the particular parse-path might not survive. You delay such
actions for a post-parse tree walk, which is still pretty cheap.
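The delay is simple to sketch: during the parse you only build tree nodes, and side effects run in a separate post-order walk once the surviving parse is known. The node and action names below are illustrative, not DMS APIs:

```python
# Sketch of delayed semantic actions: parsing builds only tree nodes;
# side effects run in a post-parse tree walk over the surviving parse.

class Node:
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

# Side-effecting actions keyed by node kind, run only after parsing.
ACTIONS = {
    "decl": lambda n, out: out.append(f"declare {n.children[0].kind}"),
    "use":  lambda n, out: out.append(f"use {n.children[0].kind}"),
}

def run_actions(root, out):
    """Post-order walk: children first, then the node's own action."""
    for child in root.children:
        run_actions(child, out)
    action = ACTIONS.get(root.kind)
    if action:
        action(root, out)

tree = Node("program", [Node("decl", [Node("x")]),
                        Node("use",  [Node("x")])])
log = []
run_actions(tree, log)
print(log)   # ['declare x', 'use x']
```

The walk visits only the one tree that survived, so actions never fire on a parse path that was later discarded.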

However, you can use semantic actions on proposed reductions to rid
yourself of parses that won't go anywhere if you have additional
semantic information.

We use this to recover the nesting structure of FORTRAN loops during
parsing:
                                DO m I=...
                                  DO n J=...
                                k CONTINUE
                                z CONTINUE

If n=k and m=z, you get one loop nesting. If m=n=k, you get another
loop nesting. The grammar proposes both, because it is clueless about
the possible label values. During parsing, the integer values of the
labels are directly available, and so the competing proposals (the two
loops sharing, or not sharing, the terminal CONTINUE) are easily
resolved by a semantic check.
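The kind of label check involved can be sketched in a few lines. This is an illustration of the idea, not DMS's implementation: a stack of open DO targets, where several DOs may legally close on the same terminal statement:

```python
# Illustrative sketch of the label-based semantic check: match each DO
# to its terminal label with a stack, allowing several DOs to share one
# terminal statement (the m=n=k case).

def resolve_nesting(statements):
    """statements: ("DO", target_label) or ("LABELED", label) entries.
    Returns the labels at which loops close, innermost first."""
    stack, closed = [], []
    for stmt in statements:
        if stmt[0] == "DO":
            stack.append(stmt[1])          # remember the target label
        elif stmt[0] == "LABELED":
            label = stmt[1]
            # every open DO targeting this label closes here
            while stack and stack[-1] == label:
                stack.pop()
                closed.append(label)
    return closed

# n=k, m=z: two distinct terminals -> properly nested loops
print(resolve_nesting([("DO", 20), ("DO", 10),
                       ("LABELED", 10), ("LABELED", 20)]))  # [10, 20]
# m=n=k: both loops share one terminal CONTINUE
print(resolve_nesting([("DO", 10), ("DO", 10),
                       ("LABELED", 10)]))                   # [10, 10]
```

In a GLR setting the same predicate would run as each loop reduction is proposed, killing the proposal whose labels do not match.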

> > It seems to me, after some study, that although LR(k) grammars are
> > stronger, LL(k) parsers tend to be faster. [...]
> > Users of your language will thank you too, because the compiler
> > should go one heck of a lot faster if it doesn't have to backtrack.

> Here's a question: does anybody know whether LL parsers are faster than
> LR parsers? I wouldn't expect any differences, but I never had the
> opportunity to run a direct comparison.

The speed of the parser is significantly affected by the speed of the
lexer. LL parsers "can be faster", if you insist on comparing apples
to oranges, by comparing implemented-as-recursive-calls LL engines vs.
implemented-as-table-lookup LR engines. But you can compile the
"state table lookup" mechanisms for LALR (and therefore Tomita
parsers) to machine code if you really want, and get lightning fast
parsers that way too. There's an old SIGPLAN paper on exactly how to
do that, by Pennello, I think, back in the 80s.
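For readers who have not seen one, the entire "state table lookup" engine is a small loop over ACTION/GOTO tables. The sketch below hand-builds tables for the toy grammar S -> ( S ) | x purely for illustration; a generator (or Pennello-style compilation of the tables to native code) would produce them for a real grammar:

```python
# A minimal table-driven LR engine for the toy grammar
#   S -> ( S )  |  x
# ACTION[(state, token)] is ("s", next_state), ("r", lhs, rhs_len),
# or ("acc",); tables were built by hand for this tiny grammar.
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, ")"): ("r", "S", 1), (3, "$"): ("r", "S", 1),   # S -> x
    (4, ")"): ("s", 5),
    (5, ")"): ("r", "S", 3), (5, "$"): ("r", "S", 3),   # S -> ( S )
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """The whole engine: a loop of table lookups on (state, token)."""
    tokens = list(tokens) + ["$"]
    stack, pos = [0], 0
    while True:
        act = ACTION.get((stack[-1], tokens[pos]))
        if act is None:
            return False                      # syntax error
        if act[0] == "s":                     # shift
            stack.append(act[1]); pos += 1
        elif act[0] == "r":                   # reduce: pop RHS, push GOTO
            _, lhs, rhs_len = act
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                                 # accept
            return True

print(lr_parse(["(", "x", ")"]))  # True
print(lr_parse(["(", "x"]))       # False
```

Compiling the tables to code essentially specializes this loop per state, replacing the dictionary lookups with direct branches.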

Tomita parsers do significantly more management of the parse stack; in
fact, they have to manage a parse "DAG". For grammars having lots of
local ambiguity (consider the FORTRAN grammar applied to a quadruply
nested loop... lots of possibilities), you can run into noticeable
overhead.

OTOH, in the overall scheme of things, the parsing time really doesn't
seem to be a big problem. We parse a million lines of Java in about
15 minutes, including building ASTs for the whole thing. Not
particularly fast, but fine for something on that scale.

> There are other differences that might be more relevant.
> For example, LL parsers have been consistently reported to produce
> better error reporting.

I think better error reporting is a matter of how much energy you put
into the error reporting mechanism. A trick I've heard about is to run
two parsers, one N tokens behind the other. When the lead parser hits
a problem, you then have N tokens of left context with which you can do
anything you want to guess at a decent repair.

We simply walk the LALR parse tables, looking for a token
that will allow us to continue parsing with the current lookahead,
or, failing to find one, toss out the lookahead.
This catches a lot of problems, and is pretty easy to code.
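That recovery scheme can be sketched against a hypothetical LR ACTION table (the table and names below are illustrative, not our actual tables): on an error, look for a single token the current state could accept and pretend it was there; failing that, throw the offending lookahead away:

```python
# Sketch of the recovery idea: probe the ACTION table for a token that
# lets the parse continue from the current state; otherwise discard
# the current lookahead token and try again.

def recover(state, lookahead, action_table, terminals):
    """Return ('insert', tok) if some token lets us continue,
    ('discard', lookahead) otherwise."""
    for tok in terminals:
        if (state, tok) in action_table:
            return ("insert", tok)
    return ("discard", lookahead)

# Toy table: state 4 only accepts ')'.
ACTION = {(4, ")"): ("s", 5)}
print(recover(4, "x", ACTION, ["(", ")", "x"]))  # ('insert', ')')
print(recover(9, "x", ACTION, ["(", ")", "x"]))  # ('discard', 'x')
```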

Ira D. Baxter, Ph.D., CTO, Semantic Designs, Inc.
