Re: Parsing Expression Grammar

Chris F Clark <>
10 Sep 2005 12:30:51 -0400



From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 10 Sep 2005 12:30:51 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 05-09-009 05-09-023
Keywords: syntax, design
Posted-Date: 10 Sep 2005 12:30:51 EDT

Paul Mann asked a wonderful question in this thread:
>Why constrain your language to LL(1) when LALR(1) tools are available?

First, to parrot him:
> Answer: Readability for human's sake.

Now, to explain. I've used dialects of a wide variety of programming
languages for years and so have many of my peers and co-workers, and
guess what, we generally avoid using the "readability" shortcuts that
most languages afford.

A good example: most languages have precedence rules that mimic
traditional algebraic precedence, i.e. multiplication binds more
tightly than addition. However, when writing conditional
expressions, we almost universally parenthesize them, including
parenthesizing &'s (ands) nested within |'s (ors), despite my not
knowing of a language which got those precedences wrong (at least not
one which imposed any precedence rules at all; APL is an example of a
language with totally flat precedence rules).
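The habit can be sketched in a few lines (Python here purely for
illustration; the same point holds for C's && and ||). `and` binds
more tightly than `or`, so the parentheses below change nothing, yet
they make the grouping visible to the reader:

```python
# Python, like most languages, binds `and` more tightly than `or`,
# so these two expressions are guaranteed to agree; the parentheses
# only spell the grouping out for the reader.
a, b, c = True, False, True

implicit = a or b and c          # relies on precedence
explicit = a or (b and c)        # makes the grouping explicit

print(implicit == explicit)      # prints: True
```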

Similarly, I use "Perl" style conditionals, meaning that if the body
takes more than one line, it goes in a block. I only use a single
statement with a conditional when it fits on one line.
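As a quick sketch of that rule of thumb (hypothetical code, Python
used purely for illustration; Python has no braces, but the analogous
habit is a one-line `if` versus an indented block):

```python
x = -3

if x < 0: x = -x        # single statement, fits on one line

if x > 1:               # anything longer always gets a full block
    x = x // 2
    print(x)            # prints: 1
```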

Now, how does this apply to LL(1) v. LR(k) grammars? Simple. The
easiest way to write an LL(1) grammar is to start each rule with a
unique non-terminal, just like old BASIC used to with the "let"
statement. Pascal pretty much followed this idea also, with each
declaration type having a unique keyword (and putting the type
expression after the variables being declared). Those languages are
both very easy to read. If I were designing a new language,
especially in an untested or novel area, where there isn't lots of
existing practice to follow, I would keep the syntax simple, so that
novice readers had a chance of understanding the language.
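To illustrate the keyword-led style (a hypothetical BNF sketch, not
any particular language's actual grammar): when every statement opens
with its own keyword, one token of lookahead selects the production,
which is exactly the LL(1) property.

```
statement : "let" IDENT "=" expr         /* BASIC-style assignment */
          | "if" expr "then" statement   /* leading keyword picks  */
          | "print" expr                 /* the rule immediately   */
          ;
```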

With communities that have existing practice, life is different. For
them, you want to be as close to their default notation as possible.
For example, we kept Yacc++ as close to yacc notation as we could.
Well, we could have stayed a little closer (and that might have been
better). When we added regular expressions to Yacc++, we used the
"standard" notations (*, +, and ?) and precedences. We even added the
[] notation for optional, because that notation was often used in our
target community. Again, that was a potentially questionable choice,
because much of our community has a LEX background, where [] means a
character class. However, very little of our notation is LEX-like, so
we hoped that wouldn't be a major burden.
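For a flavor of what those operators buy in a grammar rule (a
hypothetical example in roughly Yacc++-like notation, not copied from
the real product):

```
/* '?' for optional, '*' for zero-or-more: */
parameter_list : '(' ( parameter ( ',' parameter )* )? ')' ;

/* the same rule using the [] optional notation: */
parameter_list : '(' [ parameter ( ',' parameter )* ] ')' ;
```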

Now, when parsing an existing notation, you want to mimic the
conventions of the community as closely as possible. That may require
sophisticated precedence rules, and that in turn may require the most
powerful parsing notation one can get within one's other constraints.
At the time we wrote Yacc++, Yacc++ was "it". I've never needed a
feature that Yacc++ doesn't provide, but if I were to come upon a need
for an adaptive parser, I would either look into Meta-S or USSA. I
could also imagine using a GLR parser like DMS. (To be totally
honest, I would be just as likely to add such features to Yacc++; but
if a potential client were to ask me, I would probably recommend one
of the above tools if I didn't think their problem fit well with
Yacc++ as shipped.)

Even in the case of implementing an LL(1) language, I would probably
choose the "most powerful" tool that was convenient to use to parse
the language. Thus, I would write an LR grammar for an LL language,
simply because the notation is a little more readable. However, I
would not give up regular expressions (or even predicates) to get LR
power. Thus, if I had to choose between PCCTS and "vanilla" yacc, I
would choose PCCTS most of the time. The exception is the case where
I had a non-trivial expression syntax: I wouldn't trade away the
precedence and associativity declarations for an expression grammar
just to gain regular expressions. Of course, I don't have to choose,
since Yacc++ has all of the above (excuse my immodesty here).
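The value of those declarations is easy to see in plain yacc (this is
ordinary, standard yacc notation, not anything Yacc++-specific): the
whole operator table fits in two lines, and the rules stay flat.

```
%left '+' '-'          /* lower precedence, left associative */
%left '*' '/'          /* higher precedence                  */
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | NUMBER
     ;
```

Without the declarations, the same language needs the stratified
expr/term/factor encoding, which buries the operator table in the
structure of the rules themselves.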

BTW, for most cases, you shouldn't have to either. We are going to
license a version (roughly 2.1) of Yacc++ for "free" use in
non-commercial projects. The library will be GPL and the generator
will be "gratis" for non-commercial use (but not released in source
for now, next year we will probably have a GPL version of the
generator also). We don't have an official version that meets those
criteria, but if someone writes to us, I can get such a version
finished off "post haste".

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
