Re: Yacc poll

ihnp4!m10ux!mnc (52171-Michael Condict--MHx7002 (mh****)m000)
Fri, 14 Aug 87 13:38:41 edt

          From comp.compilers

Related articles
Re: Yacc poll decvax!utzoo!henry (1987-08-06)
Re: Yacc poll pase@ogcvax.UUCP (1987-08-04)
Re: Yacc poll (1987-08-11)
Re: Yacc poll ihnp4!m10ux!mncm000) (1987-08-14)
Re: Yacc poll malcolm@keilir.UUCP (1987-08-14)
Re: Yacc poll jbn@glacier.STANFORD.EDU (1987-08-16)
Re: Yacc poll decvax!utzoo!henry (1987-08-17)
Re: Yacc poll harvard!seismo!sun!tekchips!stevev (Steve Vegdahl) (1987-08-17)
Re: Yacc Poll harvard!rutgers!mcnc!ece-csc!mauney (1987-08-18)
| List of all articles for this month |

Date: Fri, 14 Aug 87 13:38:41 edt
From: ihnp4!m10ux!mnc (52171-Michael Condict--MHx7002 (mh****)m000)
Summary: Writing recursive descent parsers
References: <642@ima.ISC.COM> <378@hubcap.UUCP>

In article <378@hubcap.UUCP>, ("Steve" Stevenson) writes:
> Hand-writing a recursive descent parser makes as much sense as hand-writing
> a LALR(1) parser. Why don't you use yacc-lex to write a simple input
> mimic of yacc, then use a reasonable plagiarism (like from Backhouse's
> book.)

Actually, it makes a lot more sense. The natural control structure for
implementing a recursive-descent parser is a set of mutually recursive
procedures, one per syntactic construct. This is a much simpler structure,
bearing a much more obvious resemblance to the grammar, than the structure
that results from hand-coding a bottom-up parser. In fact, writing
the control structure of a recursive-descent parser from the grammar can be
done as fast as you can type it in, and will result in a program that is
no larger than a yacc grammar. The only difficult part (and it is not
difficult for a well-designed language) is to figure out the set of tokens
that can begin each type of construct, so that you can decide which of
several recursive procedures to call (or whether to return). I've been
able to do this correctly in my head as I typed in the parser for several
programming languages, including C and Pascal.

The advantages of a hand-written parser include:

(1) Speed. By a few carefully chosen modifications of the program produced by
        a literal transcription of the grammar, an experienced programmer can easily
        obtain a parser that is substantially faster than yacc. If pressed, I have
        examples supporting this claim. The obvious optimizations are to collapse
        the code of separate, non-recursive constructs and to use some sort of
        precedence-based technique for handling the expression portion of the
        language. Of course it is also easy to obtain recursive-descent parsers
        that are slower than yacc, but at least you can control the efficiency.

(2) Flexibility in error-reporting and recovery. You control the code -- you
        can do anything you want. This should be obvious. And as someone pointed
        out, you can use abstractions like: EXPECT(token, msg, skip_set) to mean

"Expect to find the specified token as the current token. If so, read
past it; otherwise print the specified error message and read tokens
until you find the expected one or one in the skip_set. If the
expected one was found, read past it."

        This produces error recovery that is at least the same quality as in yacc
        and at least as obvious from looking at the code.

(3) Ease of data allocation in the semantic code. You automatically get a
        fresh stack frame of variables for each nested occurrence of a particular
        construct. Does anyone know how to implement nested constructs such as FOR
        loops in a yacc parser without having to explicitly implement a stack to
        keep track of information about each currently active for loop? This is a
        real pain in the butt!

(4) Flexibility in use of the parser. Unlike a yacc parser, your parser can
        (by definition) be called recursively. For instance, suppose you decide
        that in order to process the semantics of the language construct you are
        looking at right now, you need to first process some declaration it depends
        on, which is in a different file. What you want to do is suspend the
        current parse, reinvoke the parser on the other file, then come back to
        what you were doing. This is trivial in a hand-written parser, and
        infeasible (as far as I know) in a yacc parser.

The only disadvantage of a hand-written parser that I am aware of is that, for
an experimental language whose syntax is changing rapidly, the effort spent
modifying the parser may be considerable. This is not a serious impediment
for a programmer who is experienced with recursive-descent parsers, but is a
consideration if the parser is to be maintained by programmers who have always
done their parsers with canned generators like yacc.

To summarize with a controversial claim: parser generators are a classic
example of a solution looking for a problem. They exist primarily because
they are fun to write and because automated parsing techniques are fun for
computer scientists to study (or at least easy to study). After using yacc in
three different commercial-quality projects, I do not find that is has saved
me any significant amount of labor, certainly not enough to justify the
restrictions it has put upon my parsers. Parsing is the most straightforward
part of language processing, the part that is easiest to do well without any
table-driven or program-generator techniques. It is the semantic processing
that is least-well understood and which is usually implemented quite poorly.

Michael Condict {ihnp4|vax135|cuae2}!m10ux!mnc
AT&T Bell Labs (201)582-5911 MH 3B-416
Murray Hill, NJ



To support my claim about the resemblance of a recursive-descent parser to
the grammar, here is a short example. Note that the parser does more
than the grammar, namely error reporting and recovery, using the above-
described EXPECT macro:

Grammar R.D. Parser
------ -----------
TOK_TYPE tok, gettok();

prog: stmt void prog() { tok=gettok();
| prog stmt while (tok != EOF_TOK) stmt();
; }

stmt: void stmt() {
switch (tok) {
IF case IF: tok=gettok();
expr expr();
if (tok == END) break;
stmt stmt();
EXPECT(END, "END IF expected", END);
END IF EXPECT(IF, "END IF expected", END);
| case ID: tok=gettok();
ID ':=' EXPECT(ASSN, "':=' expected", ';');
expr ';' expr();
EXPECT(';', "';' expected", ';');
error("Bad token beginning stmt");
; }

expr: void expr() {
primary primary();
while (1) {
if (tok == '+') {
| expr '+' primary tok=gettok(); primary();
} else if (tok == '-'); {
| expr '-' primary tok=gettok(); primary();
} else
; }

primary: ID void primary() {
EXPECT(ID, "ID expected in expr", ';');
; }
[I've written my share of commercial compilers, and found that yacc saved me an
enormous amount of time. To each his own. It occurs to me that it shouldn't
be all that hard to change the yacc parser so that it implements the kind of
error recovery that people have been suggesting for RD, to fake up a stream
of tokens that lets the parse continue. Has anyone done that, or are there
pitfalls I hadn't noticed? -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.