Yacc poll summary (long)

harvard!ames!ausmelb!ejp
Tue, 8 Sep 87 19:03:15 EST

          From comp.compilers

Related articles
Yacc poll summary (long) harvard!ames!ausmelb!ejp (1987-09-08)
Re: Yacc poll summary (long) sun!texsun!pollux!bobkat!pedz@ucbvax.Berkeley.EDU (Pedz Thing) (1987-09-14)
Re: Yacc poll summary (long) harvard!ausmelb.oz.au!ejp (1987-09-21)
| List of all articles for this month |
From: harvard!ames!ausmelb!ejp
Date: Tue, 8 Sep 87 19:03:15 EST

This is a summary of the responses to my recent USENET comp.compilers poll
on Yacc.


1. Straw issues: the form of Yacc's input, the limitations of its
syntax/semantic interface, the incomprehensibility of the C output, the size
of the generated parser, and debugging difficulties. These drew few or no
comments. (Frank DeRemer did point out years ago that LR tables, being
randomly accessed, can disturb a paging system more than a recursive-descent
parser, which has better locality of reference.)


2. Speed of the generated parser: the consensus was that there is no problem,
although it certainly is possible to hand-write faster parsers if required.
Those interested are referred to Tom Penello's recent TOPLAS paper,
for recursive-descent and LR(1) implementation possibilities.


3. Generic objections to LR parsing: Steve Stevenson
(steve@hubcap.clemson.edu) contributed an interesting discussion.


(a) it is possible (desirable?) to unite the lexical and syntax grammars
(b) he personally wants a 'pipe of parsers'
(c) you may not want to use the LR algorithm to gen. the parser.


Steve has modified his 'yacc' to do some of these things.


Occasionally, papers surface with titles like the recent SIGPLAN 'Are LR
parsers too powerful?', which I personally found entirely content-free. One
poll respondent claimed that LR parsing is a solution looking for a problem,
again without real justification.


4. Generic preference for other techniques: lots of people like
recursive-descent, as indeed do I, but I don't like living with the result. I
haven't lived with a Yacc result yet ... Nobody seemed keen on ad-hoc
techniques, and nobody even mentioned McKeeman/Horning MSP, Floyd production
systems, or the other antecedents.


A long debate ensued on Yacc's error recovery, agreeing that it was poor.
I can provide three causes for this:


(a) Parsers with LALR(1) (as opposed to LR(1)) tables sometimes
reduce too soon, making context-dependent recovery more difficult.


(b) System V Yacc treats states where nothing except <error>
can be shifted as full LR(1) states, not merging them, preventing (a).
BSD Yacc, which I would guess most folks are using,
merges them, causing (a).


(c) There is no good advice or folklore as to how to use
Yacc error rules; the Yacc manual implies that you should be
content with a simple-minded panic-mode:


stmnt : error ';'


This appears to have been followed fairly literally in pcc,
which has only 8 error productions.


The best suggestions on Yacc error productions I have seen are in
'Introduction to Compiler Construction with Unix' by Schriener & Friedman
(Prentice-Hall 1985) (a book of otherwise limited usefulness). Without
infringing their copyright, they say that error productions should be
associated with each repetitive construct, showing how for the three possible
cases (optional sequence, non-optional sequence and separated list). Doing
this systematically allows Yacc to recover to almost arbitrarily close restart
points.


It still won't attempt error *repair*, i.e. insertion or substitution of
symbols, but then there is another debate as to whether compilers should
attempt to second-guess programmers in this way, or simply catch as many of
their errors as possible. See also the April 1987 TOPLAS page 164,
for 'A Practical Method for LR and LL Syntax Error Diagnosis and Recovery',
a promising-looking scheme involving delayed reductions.


There is yet another debate as to whether compilers should say 'syntax error'
or things like 'ELSE without IF'. One is uninformative, while the other
describes the symptom, not the problem.


Perry Smith a.k.a. (Pedz Thing) (pedz@bobkat) wrote "the shift/reduce conflict
resolution could be more intelligent. In determining a shift/reduce conflict,
I think it needs to look up the parse tree and see what productions call the
productions in conflict. Usually, the problems could be resolved by
looking the ``parent'' productions."
He gave these examples:


        (a) statement : stuff statement | stuff statement ELSE statement ;


which gives an S/R conflict, and


        (b) statement : stuff thing ;
thing : statement | statement ELSE statement ;


which doesn't.


I found this comment - and the examples - most surprising. LR(1) analysers
including Yacc do indeed 'look back up the parse tree', in fact they look all
over it. However, in the above (a) is of course its own 'parent' production,
as it contains recursion. It is entirely clear that (a) is just plain
ambiguous, because of having > 1 recursion. Even in recursive descent you'd
have to make ELSE left-associative (by testing for it first) if you coded
directly from (a).


Respondents also went for Yacc for reasons of formal correctness, productivity,
maintenance, ...


Incidentally, we went with Yacc. Of course it ain't easy for Cobol, which is
neither regular, context-free, nor LR(1). I can discuss some of these problems
further: inquiries via e-mail please.


Thanks to all who responded, and to the moderator for making it possible.
--
Esmond Pitt, Austec International Ltd
...!seismo!munnari!ausmelb!ejp,ejp@ausmelb.oz.au
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.