Re: Full LR(1) parser generator Hyacc 0.9 release

Chris F Clark <cfc@TheWorld.com>
Thu, 28 Feb 2008 00:44:09 -0500 (EST)

          From comp.compilers

Related articles
[10 earlier articles]
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-24)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-25)
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-26)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-28)
| List of all articles for this month |

From: Chris F Clark <cfc@TheWorld.com>
Newsgroups: comp.compilers
Date: Thu, 28 Feb 2008 00:44:09 -0500 (EST)
Organization: Compilers Central
References: 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 08-02-044 <Pine.SOC.4.61.0802230000570.9914@unixlab03.ces.clemson.edu> 08-02-076 <Pine.SOC.4.61.0802251216540.13590@unixlab03.ces.clemson.edu> 08-02-088 <Pine.SOC.4.61.0802271206510.25337@unixlab03.ces.clemson.edu>
Keywords: LR(1), parse
Posted-Date: 28 Feb 2008 10:43:36 EST

In our extended dialog "Joel E. Denny" writes (>) and I write (> >),
quoted out of order for clarity and simplicity:
> To be sure, I've been saying grammar when I really mean a grammar plus a
> specification for resolving conflicts.


Yes, a grammar is a specification of a language, and since conflict
resolution impacts what language the rules specify, it is part of the
grammar.


> For the grammar you quoted, Pager's algorithm accepts the same language as
> LALR(1) using the weak compatibility test. IELR(1) accepts the same
> language as canonical LR(1) as it should for any grammar.
>
> > I think one can argue that both of those results are "right"
> > for this grammar.
>
> For LR(1) algorithms (but not for implementations of them), I choose to
> define "right" as the behavior of a canonical LR(1) parser simply because
> it appears that canonical LR(1) exhibits the easiest behavior for a
> grammar developer to understand. I'm not sure what other useful
> definition of "right" exists here.


> Of course, I've heard the argument that I'm talking about artificial toy
> grammars and that any well designed grammar wouldn't suffer from these
> problems. I'm not sure how to prove or disprove a statement like that
> given how imprecise it is. However, in my experience, most grammars don't
> start off well designed and any good design that the grammar developer
> achieves may be lost (perhaps temporarily) during further development and
> maintenance. Based on discussions and papers I've seen on this topic, it
> seems that grammar developers can predict how a canonical LR(1) parser
> behaves in such situations, but most grammar developers have difficulty
> understanding the effects that various state merging algorithms have on
> the parser's behavior. IELR(1) obviates the need to understand such
> effects because it makes certain there are none. Thus, even if the final
> grammar ends up being LALR(1), developing and debugging that grammar may
> prove easier with IELR(1).


A reasonable argument for sure, and one which I would accept. The
only point I will make counter to it is that many grammars were
orignally developed with yacc (or earlier bison versions) and any
precedence and associativity declarations in those grammars would have
been added with the assumption of how an LALR parser generator uses
them to resolve the conflicts. This gives us an alternate definition
of "right", which is right is what is compatible across the different
tools that the user is using. Compatibility is an important goal in
itself. It would be nice if you could find a way to warn users who
ask for it which inputs would parse differently using conflict
resolution in the LR and LALR states.


However, and this is the advice I give to my clients, I recommend
removing all precedence and associativity declarations when extending
a grammar, making the extensions, and then (selectively and carefully)
adding the declarations back in. It, of course, helps if one follows
some form of process (such as agile development) where there exist
test cases (preferably both positive and negative ones) for what the
grammar should and should not accept.


One difficulty that is often overlooked due to its superficial
simplicity is that grammar maintenance is an order-of-magnitude harder
and more subtle than program maintenance. (It's one of my big
quibbles with hand-written recursive descent where people do grammar
maintenance without formal tools, but that's my personal bugaboo.)
Small changes in grammars can effect large changes in the languages
that they accept and people overlook this. This is even more true
when one starts playing with precedence declarations.


> > If you want
> > the affect of %left in that case, you are better off, simply removing
> > the rule (A: a a;) that causes the conflict.
>
> This isn't always a good solution as (A: a a;) may be useful in other
> parts of the grammar. In the Bison mailing list thread you quoted, I
> extend the first grammar to demonstrate how this might happen.


Yes, I saw that grammar. However, if one starts with the idea that a
grammar is a specification of a language (rather than a solution to a
puzzle), one quickly realizes that "useful" isn't a good concept. The
rules in a grammar should help one understand what is and is not in a
language. Having a rule that applies only in certain circumstances,
particularly when those circumstances are described indirectly
(through the interaction of other rules via precedence declarations)
is a good way to write something incomprehensible. Thus, my statement
about language design in a previous post in this thread.


I suspect a certain number of these conflicts occur because the
developer thinks, I need a specific feature in this circumstance and
extends the grammar for that one particular case without thinking
about how the extension applies globally or will interact with other
features. Then, as the grammar gets widened so that the new feature
is allowed generally, one realizes--oops, this conflicts with that. o
my mind it is easier to fall into this trap if you have a hand-written
recursive descent parser and ad-hack extensions are easy, where as
with a grammar generated parser, one is more likely to have reused
rules, so changing it in one place will change it everywhere, and the
tool will warn you of conflicts.


For cases, where one is attempting to implement a language that
already exists and is specified by someone else, I would relax that
argument, because in that case one may be attempting to solve a
puzzle, i.e. how did someone come up with a language that accepts only
these sentences?


However, it is somewhat uncommon for cases like you posit to exist in
standardized languages (although most standardized languages have one
such wart,but it isn't always a grammatic one) because someone on the
committee is likely a grammar enthusiast (and more importantly, it is
very rare for them to be solved by using precedence to exclude some
specific expansion of one alternative of one rule). In fact, when
such cases exist, the reference grammar usually accepts a larger
language and then has restrictions written in English to explain which
cases aren't actually legal. You will note that our moderator John
often recommends that approach, because it makes for a more robust
system with clearer error messages than "syntax error found a :: when
expecting a (".


The three examples I recall most vividly are: 1) in SQL certain
clauses only can apply to non-null lists, but the grammar doesn't
restrict them that way (and user's I've worked with who have tried to
make such restrictions grammatical have all failed and given up). 2)
In certain object-oriented languages like C++ their are times when
references must be (and conversely must not be) fully qualified, again
it is essentially impossible to write a grammar that covers exactly
the right cases, and it is better to always accept the fully-qualified
name syntactically and then use a semantic check to disallow it where
inappropriate. 3) some language authors get the idea that it would be
nice to have only boolean expressions in if statements and try to
write their language so that one can syntactically determine if a
result is boolean--the result ends up being a horrid mess with almost
every expression rule duplicated in a boolean and non-boolean form.


It is important that authors realize that certain problems cannot be
solved well grammatically.


Hope this helps,
-Chris


******************************************************************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.