Re: LL(1) vs LALR(1) parsers

ok@cs.rmit.edu.au (Richard A. O'Keefe)
Wed, 29 Nov 1995 07:10:12 GMT

From comp.compilers

Related articles
[7 earlier articles]
Re: LL(1) vs LALR(1) parsers bill@amber.ssd.hcsc.com (1995-11-22)
Re: LL(1) vs LALR(1) parsers elliottc@logica.com (1995-11-24)
Re: LL(1) vs LALR(1) parsers jgj@ssd.hcsc.com (1995-11-28)
Re: LL(1) vs LALR(1) parsers will@ccs.neu.edu (1995-11-28)
Re: LL(1) vs LALR(1) parsers ddean@dynastar.cs.princeton.edu (1995-11-28)
Re: LL(1) vs LALR(1) parsers napi@ms.mimos.my (1995-11-28)
*Re: LL(1) vs LALR(1) parsers ok@cs.rmit.edu.au* (1995-11-29)**
Re: LL(1) vs LALR(1) parsers mparks@oz.net (1995-11-29)
Re: LL(1) vs LALR(1) parsers jmccarty@spdmail.spd.dsccc.com (1995-11-29)
Re: LL(1) vs LALR(1) parsers pardo@cs.washington.edu (1995-11-29)
Re: LL(1) vs LALR(1) parsers CSPT@giraffe.ru.ac.za (Pat Terry) (1995-11-30)
Re: LL(1) vs LALR(1) parsers gvcormac@plg.uwaterloo.ca (Gord Cormack) (1995-12-01)
Re: LL(1) vs LALR(1) parsers bridges@cs.arizona.edu (1995-12-01)
[9 later articles]

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	ok@cs.rmit.edu.au (Richard A. O'Keefe)
Keywords:	parse, LALR, LL(1)
Organization:	Comp Sci, RMIT, Melbourne, Australia
References:	95-11-051 95-11-138 95-11-195
Date:	Wed, 29 Nov 1995 07:10:12 GMT

bill@amber.ssd.hcsc.com (Bill Leonard) writes:
>I don't think it is intellectual bigotry (to use automated parsers vs.
>hand-crafted ones), I think it's just common sense. We have automated
>means of generating parsers, why not use them?

Because what "we" typically have is yacc.
Yacc was wonderful for a PDP-11. I have encouraged several bright students
to use it in projects, and every time it has taken them a long time to
learn how to use it. Yacc sees subtle connections between parts of your
grammar that the programmer finds hard to detect and which are not relevant
to a recursive descent parser. In three student one semester projects, a
masters thesis, and some work of my own, it has taken FAR longer to hack a
grammar around to the point where Yacc would accept it than to write a
recursive descent parser. In a couple of cases where I've tried it, the
recursive descent parser has been smaller in terms of number of lines of
source code, there have been no tables, and the parser has been faster.

>If I were in charge of a compiler project (and I have been), I'd much
>rather devote engineering resources to making the generated code better or
>the compiler faster or any of a number of other improvements than in coding
>a parser.

So would I. That's why I've reverted to writing recursive descent parsers.

(Having several parsers in one program is also much easier when you don't
use Yacc.)

>For certain compilers, there may be good reasons not to use an automated
>parser, but in general I'd recommend using a parser generator. Why
>reinvent the wheel?

Because the most widely available wheel is pretty dreadful.

>Mr. Simmons says recursive descent parsers are a "small matter of
>programming". How small? If it takes more than a couple of hours to
>write, it's too much, because it will also take up time later on for others
>to learn and fix.

In practice, this counts as an argument _against_ using a parser generator.
I find that writing a recursive descent parser takes less time than hacking
a once clear grammar to the point where Yacc will accept it, EVEN NOW THAT
I KNOW YACC. It seems to take _bright_ students >10 hours to learn how to
use Yacc in a basic sort of way.

>All projects have a fixed amount of programmer
>resources, so any time spent on a parser is that much less spent elsewhere.

>In my opinion, parser generators have (at least) the following advantages
>over hand-crafted parsers:

> 1. It is easier to find the semantic action code -- it isn't buried
> amongst the parsing. This makes for easier maintenance.

There are three things required:
syntax
static semantics
intermediate structure building
Yacc provides no real support for static semantics, so it is still buried
as executable form amongst the code.

> 2. It is easier to change the language.

But the structure of a recursive descent parser is so very close to being
a grammar that it is not hard to change the language.

>I've heard many people say that hand-crafted parsers do better at error
>recovery. That may be true if you're comparing to yacc, but there are
>other parser generators that do better at error recovery, many of which
>have features for tuning the error recovery.

Could I suggest that one of the most valuable things that someone could
do for this newsgroup would be to take 10 parser generators (including
a couple of members of the Yacc family, PCCTS, Cocktail, and maybe a
couple of others) and write a really insightful review. The things to
cover are
        - given a bright MSc student who already understands the basic
            concepts of context free grammars and static semantic checking,
            how long does it take the student to learn the tool well enough
            to use it to write a parser for ISO Pascal Extended (to name a
            syntax-full language with no _intentional_ gotchas)

        - how long does it take to write the first full draft of the grammar
            (time the student and also someone reasonable experienced)

        - how long does it take to _test_ and debug the grammar (and what
            support does the tool provide for testing?)

        - how large is the grammar with respect to the original reference?

        - how much space does the parser take?

        - how fast does the parser run (on a selection of machines, say a
Pentium, a current SPARC, an ALPHA)

        - anything else, e.g. does the parser generator type check attributes?
(Yacc doesn't.)

For what it's worth, I've started writing the C grammar as a Prolog program.
The program is a Definite Clause Grammar. At the same time it is a
recursive descent parser or a left corner parser, whichever I want.
This covers
syntax
static semantics (symbol table, type checking)
building intermediate structure
It is not restricted to LALR(1) grammars. It can handle the
"object identifier / type identifier" ambiguities in C naturally.
It can even be automatically turned into a program that generates
test data (there are Yacc-like tools for that, but this is the _same_
grammar being used to generate test cases as for parsing). With a few
extra annotations, it should be a legal Mercury program (I won't be sure
until I've finished), and Mercury generates very good code. It will be
interesting to see how well the program performs.

>Furthermore, hand-crafted parsers only do better if you spend a
>considerable amount of time tweaking it.

This is a blanket assertion requiring more than just the author's
say-so. The authors of GNAT would I think disagree.

There are so many parser generators out there that I haven't the resources
to evaluate them all. I have seen some very good ones in the literature.
But what most people _get_ is some sort of Yacc, which is not wonderful.

I did look at <package X>, but when I ran the code through Lint and
strict ANSI checking I got so many complaints that I decided not to
bother any more. A pity, because it had some nice theory behind it.

--
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: LL(1) vs LALR(1) parsers

ok@cs.rmit.edu.au (Richard A. O'Keefe)Wed, 29 Nov 1995 07:10:12 GMT

ok@cs.rmit.edu.au (Richard A. O'Keefe)
Wed, 29 Nov 1995 07:10:12 GMT