14 Sep 2002 00:21:08 -0400

Related articles |
---|

[18 earlier articles] |

Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23) |

Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12) |

Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14) |

RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14) |

RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14) |

From: | "Chris F Clark" <cfc@TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 14 Sep 2002 00:21:08 -0400 |

Organization: | Compilers Central |

References: | 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 02-08-035 02-08-078 02-09-018 02-09-070 <20020912232254.48B8712D0@0lsen.net> |

Keywords: | parse, design |

Posted-Date: | 14 Sep 2002 00:21:08 EDT |

In-reply-to: | <20020912232254.48B8712D0@0lsen.net> (clint@0lsen.net) |

*> I was wondering if you can define 'predicated' in this context. For*

*> example, Terence Parr who wrote ANTLR calls it a predicated LL-k parser...*

Yes, I mean roughly the same thing as Terence meant (see below --- for

explanation). Quinn Tyler Jackson is in the process of working out a

new parser generator, Meta-S (and the theoretical semantics behind

it). It incorporates an extension of predicated grammars (actually

more than a couple of different extensions). While some of those

extensions are of the van Wijngaarden (two-level) grammar type, there

are several extensions that are purely predicated forms (in Terence's

sense).

I'm, of course, interested in the topic, since one of the things

Yacc++ implements is predicated LR grammars. Working with Quinn has

helped me understand some new extensions that we could implement (and

that he already has).

It has also made me aware of a whole different development stream on

predicated grammars, starting with Boris Burshteyn, who wrote a couple

of different parser generators, one of them being USSA (universal

syntax and semantics analyzer, if I recall correctly), which is the

root of this different stream.

It is quite possible that USSA got the idea of predicated grammars

from PCCTS (also written by Terence), ANTLR and JavaCC's predecessor,

since Boris, Terence, and I have communicated several times over the

years exchanging ideas. It is also possible that he developed the

idea separately. (Yacc++ certainly borrowed the idea of predicates

from Terence's PCCTS, as I remember seeing the idea and saying, "wow,

this is something we should do," and so we did.)

----------------------------------------------------------------------

Now, perhaps you were wondering what "predicated" means in a grammar,

so I will explain that.

A predicated grammar has one or more rules of the form:

nterm: predicate normal-right-hand-side;

In PCCTS, it was written:

nterm: predicate? normal-right-hand-side;

In Yacc++, we write this as (because we use ? for the "optional"

regular expression operator):

nterm: predicate >> normal-right-hand-side;

In Meta-S, it is written more like:

nterm: x<-(normal-right-hand-side) <x=predicate>;

However, in all three cases, the idea is roughly the same, the rule

for nterm has two requirements for matching, the input text must match

both the normal-right-hand-side and the predicate. Thus, the nterm

rule takes the intersection of two CFGs. Importantly, CFGs are NOT

closed under inersection, which means nterm can match something that

one could not normally parse with an LL or LR grammar.

One of the cannoical examples is a**i b**i c**i, something with three

parts which match in length. No non-predicated CFG can specify that.

Non-predicated CFGs can only pair up two things. An example grammar

(in Yacc++ notation) for that is something like(*):

nterm: anbn >> "A"* bncn; // if there are matched A and B sets,

// skip the As and match B and C sets

anbn: "A" anbn? "B";

bnbn: "B" bncn? "C";

That is the essence of predicated grammars.

*) Note: to be precise the grammar above is not exactly correct. It

has 2 flaws. It doesn't ensure that there aren't more Bs than As.

It also doesn't work for the case where i==0. The following change

to the 1st rule fixes both flaws, but is more Yacc++ idiosyncratic

in notation (i.e. in Yacc++ predicates are just another regular

expression operator and can be embedded anywhere in a rule).

nterm: ((anbn "C") >> "A"* bncn)?;

----------------------------------------------------------------------

There are lots of little subtle places where each implementor has had

to make choices and that gives the three different versions slightly

different semantics, as well as different notations. However, their

power is roughly comparable (i.e. if you can write a grammar that

works in one of the dialects, a close variation of that grammar will

probably work in another). Part of my interest, of course, has been

understanding Quinn's semantics, so that things which are easier to

express in Quinn's notation than in Yacc++'s can be adopted by Yacc++

someday.

Quinn's interest has been flushing out the semantics of what he has

implemented (and adding easy extensions to it) so that he can explain

how and why his tool solves problems that were previously too

difficult.

Equally importantly, Quinn and I share the notion that we are striving

for a notation that allows the problems to be solved in a rigorous

maner. It is possible to cobble together a solution with two (or

more) yacc parsers and some hacks that solve the problems that

predicates address, but the solution is non-robust and difficult to

understand and maintain.

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.