Re: Bison Parser negate production rules

Chris F Clark <cfc@shell01.TheWorld.com>
29 Mar 2007 00:57:34 -0400

From comp.compilers

Related articles
Bison Parser negate production rules royzlife@gmail.com (2007-03-23)
Re: Bison Parser negate production rules gneuner2@comcast.net (George Neuner) (2007-03-26)
Re: Bison Parser negate production rules cfc@shell01.TheWorld.com (Chris F Clark) (2007-03-26)
Re: Bison Parser negate production rules mco333@sympatico.ca (mco333) (2007-03-27)
Re: Bison Parser negate production rules gneuner2@comcast.net (George Neuner) (2007-03-27)
*Re: Bison Parser negate production rules cfc@shell01.TheWorld.com (Chris F Clark)* (2007-03-29)**
Re: Bison Parser negate production rules Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2007-03-29)
Re: Bison Parser negate production rules gneuner2@comcast.net (George Neuner) (2007-03-29)
Re: Bison Parser negate production rules cfc@shell01.TheWorld.com (Chris F Clark) (2007-04-02)

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	29 Mar 2007 00:57:34 -0400
Organization:	The World Public Access UNIX, Brookline, MA
References:	07-03-084 07-03-090 07-03-094 07-03-097
Keywords:	parse, theory
Posted-Date:	29 Mar 2007 00:57:34 EDT

George Neuner <gneuner2@comcast.net> writes:

> On 26 Mar 2007 14:26:17 -0400, Chris F Clark
> <cfc@shell01.TheWorld.com> wrote:
>
>>Now, the "modern" parsing methodologies like GLR, PEGs, and predicated
>>grammars are closed under complement, so if you use a technique like
>>that, you can add a NOT rule and still create a parser.
>
> I'd appreciate some references.

Try this link found on google:
http://research.microsoft.com/conferences/pldi06/presentations/rats_pldi06.pdf

If you are looking for tools that support the various methodologies:

For GLR, look into ADF+SDF and DMS Toolkit
For PEG, look into RATS
For predicated parsers, look at ANTLR, Yacc++, GrammarForge (aka Meta-S)

> It seems to me that the complement of a rule can't be concretely
> defined without limiting the universe to those token strings which
> would be legal in the language defined by the complete set of rules.

Yes, all closures are aginst Sigma* (sigma being the vocbulary).

> It also seems that with such limitation, the complemented rule should
> be parsable using the same model given sufficient lookahead. That is,
> the restriction means the complement subset is just the language minus
> the subset which is the rule.

Yes, but there is no complement, intersect, or difference operator in
BNF. Thus, there is no way to say "minus" in a context-free grammar.
Moreover, it can be proven, that you is no algorithm that can
construct the complement of an arbitrary context-free grammar. So,
although you can do "minus" is some simple cases, you can't do it
generally.

However, the technique you just described is the essential way one
specifies the solution in predicated grammars (or PEG), although some
notations offer direct operators (e.g. we do in Yacc++) to make the
expression more terse. And, this gets to the heart of the question
that started this thread. In a predicated (or PEG) grammar, one can
say do this rule if it applies, and if not do the next rule. This
implies an ordering of the rules.

So, taking the original example:

A: B;
B: C
  | D
  | E
  | Sigma* // parse anything
  ;

Using a predicated grammar, you can force the rules to be ordered and
tested in order. In PEGs, the rules are always ordered, so this
always happens. (More on that later.)

So, what does one do to make the rules ordered in a predicated
grammar, add predicates. I'll use Yacc++ notation, which is "x >> y",
for check predicate x and if it matches, apply y. The ANTLR notation
is essentially to replace the >> with ?, but that is a normal regular
expression operator, so we wanted something unique.

In each case, we add a predicate that matches exactly the same string
we want to reduce, as in:

B: C >> C
  | D >> D
  | E >> E
  | Sigma* // parse anything
  ; // in a PEG, the first rule is always treated like this rule.

If when trying to parse a B you see a C, then parse the B as a C.
If you don't see a C, but instead see a D, then parse the B as a D.
If you don't see a C or D, but see an E, then parse the B as an E.
If you see none, of the above, parse a B as whatever you see.

Now, the one gotcha is that predicates (or PEGs, which are essentially
a use of predicates throughout a grammar) are not a panacea. In
particular, they hide ambiguities. If in the above example, C and D
were ambiguous, that is there is some string that is both a C and a D,
then no error message is generated because the parser will always
treat it as a C. However, if the case is subtle enough, someone
writing the string might intend it to be a D and not realize that it
is also a C. There is no algorithm to tell you (in all cases) if the
intersection of two arbitrary context-free grammars (i.e. C and D) is
empty (they are unambiguous). The LR (and LL) restrictions guarantee
that for the cases they can handle, that the rules are unambiguous,
but the LR restrictions are what predicated grammars (and PEGs) are
trying to get-around (loosen).

>>In fact, Yacc++ can be thought of as taking Yacc/Bison and adding in
>>regular expressions and predicates (then cleaning up the syntax a
>>little). In it we have a not predicate that allows exactly what you
>>are asking for. You will find similar functionality in PCCTS/ANTLR
>>although the underlying technology is LL not LR for them, so you
>>cannot use left recursion (or precedence rules).
>
> I'm not familiar with Yacc++, but I do use ANTLR and I was unaware
> that it has this capability and I don't see any (obvious) syntax for
> it in the manual. Do you have a cite or an example?

Here is a wikipedia pointer that was on my first page of a google
search for: ANTLR predicate.

http://en.wikipedia.org/wiki/Syntactic_predicate

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Bison Parser negate production rules

Chris F Clark <cfc@shell01.TheWorld.com>29 Mar 2007 00:57:34 -0400

Chris F Clark <cfc@shell01.TheWorld.com>
29 Mar 2007 00:57:34 -0400