Re: Compiler Books? Parsers?

Chris F Clark <>
21 Dec 2003 23:14:01 -0500

          From comp.compilers

Related articles
[5 earlier articles]
Re: Compiler Books? Parsers? (2003-11-08)
Re: Compiler Books? Parsers? (Jeff Kenton) (2003-11-21)
Re: Compiler Books? Parsers? (Chris F Clark) (2003-12-03)
Re: Compiler Books? Parsers? (Rodney M. Bates) (2003-12-08)
Re: Compiler Books? Parsers? (Nick Roberts) (2003-12-08)
Re: Compiler Books? Parsers? (Marco van de Voort) (2003-12-20)
Re: Compiler Books? Parsers? (Chris F Clark) (2003-12-21)
Re: Compiler Books? Parsers? (Carl Cerecke) (2003-12-23)
Re: errors in Java programs, was Compiler Books? Parsers? (Joachim Durchholz) (2003-12-27)
Re: Compiler Books? Parsers? (Chris F Clark) (2003-12-27)
Re: Compiler Books? Parsers? (Oliver Zeigermann) (2004-01-02)
Re: Compiler Books? Parsers? (Mark Sayers) (2004-01-07)
Re: Compiler Books? Parsers? (Jeff Kenton) (2004-01-09)
[1 later articles]
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 21 Dec 2003 23:14:01 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 03-10-113 03-10-145 03-11-010 03-11-083 03-12-017 03-12-116
Keywords: parse, practice, performance
Posted-Date: 21 Dec 2003 23:14:01 EST

I wrote:
> This is trivially true, if one considers the fact that one can take the
> output of any given tool and then profile and modify it so that the
> resulting hand-written tool is faster.

To which, Marco van de Voort, asked:

> Interesting, if the main reason to chose handcoded tools is speed.
> What if it was quality of error generation? Could you argue your case if
> my primary objective was error generation?

I'm not certain I could make the argument (either way) for error
generation. Certainly, the general state of the art of error
reporting in generated parsers is poor.

On the other hand, many tools now given the user a good hand at
certain parts of the error reporting. For example, the machine
generated parser usually has encoded within it the list of tokens that
were expected. In addition, for LL machine generated parsers, the
generated parser knows exactly where in the parse the error occured
and what was attempting to be parsed--the LR machine knows that too,
but in a more limited fashion.

A hand generated parser is likely to be rewritten in the recursive
descent style. Thus, it is going to have roughly the same properties
as a machine generated LL parser, which is likely to be a machine
generated recursive descent parser. The key point being, that at any
point in the resulting parse when an erroneous token is detected the
parser can easily print out a relatively good error message that is
precisely tailored to the problem at hand.

However, if you think about it, that precise handling of the erroneous
token is almost exactly equivalent to adding a new rule to the grammar
that says that at that particular point the token is "legal" and
whatever error message to be printed is the appropriate semantics for
that rule. In the long run, I would be happier adding such rules to
my grammar and having a tool validate that I haven't introduced an
ambiguity. In fact, if I were working on such a problem, I would most
like a tool that implemented "fuzzy grammars"*, so that I could make
my error productions part of the "fuzzy set" and have them only apply
when they didn't introduce ambiguities. However, that's just me.

*) I have at least a pair of great papers on the concept of fuzzy
  grammars--grammars that have numbers associated with the rules that
  tell "how much" a particular rule is "part" of the grammar or not,
  1.0 being entirely applicable, 0.0 being not at all applicable, with
  the most applicable rule being the one that is applied in an
  ambiguous case. Unfortunately, the papers are buried in the stuff
  I'm preparing to move to my new house, so I can't properly cite them
  and give credit to the author--I think it was Peter Anvin....

The second point on this topic, which I think I mentioned in another
thread, is that many (most to my mind) errors are actually sub-lexical
occuring at the single character level and not at the parsing level.
Even most hand-written parsers use some form of separate lexer (which
is mostly context insensitive), so when I make the error of omitting
the closing quote from my character string, it swallows far too much
relevant text and the resulting error recovery isn't important,
because the basic problem the missing character is not in the parsers
purview at all.

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.