Re: regular expression operators in CF grammars

"Chris F Clark" <cfc@TheWorld.com>
14 Sep 2002 00:21:08 -0400

          From comp.compilers

Related articles
[18 earlier articles]
Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23)
Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03)
Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12)
Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14)
RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14)
RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14)
| List of all articles for this month |

From: "Chris F Clark" <cfc@TheWorld.com>
Newsgroups: comp.compilers
Date: 14 Sep 2002 00:21:08 -0400
Organization: Compilers Central
References: 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 02-08-035 02-08-078 02-09-018 02-09-070 <20020912232254.48B8712D0@0lsen.net>
Keywords: parse, design
Posted-Date: 14 Sep 2002 00:21:08 EDT
In-reply-to: <20020912232254.48B8712D0@0lsen.net> (clint@0lsen.net)

> I was wondering if you can define 'predicated' in this context. For
> example, Terence Parr who wrote ANTLR calls it a predicated LL-k parser...


Yes, I mean roughly the same thing as Terence meant (see below --- for
explanation). Quinn Tyler Jackson is in the process of working out a
new parser generator, Meta-S (and the theoretical semantics behind
it). It incorporates an extension of predicated grammars (actually
more than a couple of different extensions). While some of those
extensions are of the van Wijngaarden (two-level) grammar type, there
are several extensions that are purely predicated forms (in Terence's
sense).


I'm, of course, interested in the topic, since one of the things
Yacc++ implements is predicated LR grammars. Working with Quinn has
helped me understand some new extensions that we could implement (and
that he already has).


It has also made me aware of a whole different development stream on
predicated grammars, starting with Boris Burshteyn, who wrote a couple
of different parser generators, one of them being USSA (universal
syntax and semantics analyzer, if I recall correctly), which is the
root of this different stream.


It is quite possible that USSA got the idea of predicated grammars
from PCCTS (also written by Terence), ANTLR and JavaCC's predecessor,
since Boris, Terence, and I have communicated several times over the
years exchanging ideas. It is also possible that he developed the
idea separately. (Yacc++ certainly borrowed the idea of predicates
from Terence's PCCTS, as I remember seeing the idea and saying, "wow,
this is something we should do," and so we did.)


----------------------------------------------------------------------


Now, perhaps you were wondering what "predicated" means in a grammar,
so I will explain that.


A predicated grammar has one or more rules of the form:


nterm: predicate normal-right-hand-side;


In PCCTS, it was written:


nterm: predicate? normal-right-hand-side;


In Yacc++, we write this as (because we use ? for the "optional"
regular expression operator):


nterm: predicate >> normal-right-hand-side;


In Meta-S, it is written more like:


nterm: x<-(normal-right-hand-side) <x=predicate>;


However, in all three cases, the idea is roughly the same, the rule
for nterm has two requirements for matching, the input text must match
both the normal-right-hand-side and the predicate. Thus, the nterm
rule takes the intersection of two CFGs. Importantly, CFGs are NOT
closed under inersection, which means nterm can match something that
one could not normally parse with an LL or LR grammar.


One of the cannoical examples is a**i b**i c**i, something with three
parts which match in length. No non-predicated CFG can specify that.
Non-predicated CFGs can only pair up two things. An example grammar
(in Yacc++ notation) for that is something like(*):


nterm: anbn >> "A"* bncn; // if there are matched A and B sets,
                                                    // skip the As and match B and C sets
anbn: "A" anbn? "B";
bnbn: "B" bncn? "C";


That is the essence of predicated grammars.


*) Note: to be precise the grammar above is not exactly correct. It
  has 2 flaws. It doesn't ensure that there aren't more Bs than As.
  It also doesn't work for the case where i==0. The following change
  to the 1st rule fixes both flaws, but is more Yacc++ idiosyncratic
  in notation (i.e. in Yacc++ predicates are just another regular
  expression operator and can be embedded anywhere in a rule).


nterm: ((anbn "C") >> "A"* bncn)?;


----------------------------------------------------------------------


There are lots of little subtle places where each implementor has had
to make choices and that gives the three different versions slightly
different semantics, as well as different notations. However, their
power is roughly comparable (i.e. if you can write a grammar that
works in one of the dialects, a close variation of that grammar will
probably work in another). Part of my interest, of course, has been
understanding Quinn's semantics, so that things which are easier to
express in Quinn's notation than in Yacc++'s can be adopted by Yacc++
someday.


Quinn's interest has been flushing out the semantics of what he has
implemented (and adding easy extensions to it) so that he can explain
how and why his tool solves problems that were previously too
difficult.


Equally importantly, Quinn and I share the notion that we are striving
for a notation that allows the problems to be solved in a rigorous
maner. It is possible to cobble together a solution with two (or
more) yacc parsers and some hacks that solve the problems that
predicates address, but the solution is non-robust and difficult to
understand and maintain.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.