10 Aug 2002 02:13:52 -0400

Related articles |
---|

[13 earlier articles] |

Re: regular expression operators in CF grammars joachim_d@gmx.de (Joachim Durchholz) (2002-07-02) |

Re: regular expression operators in CF grammars soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2002-07-04) |

Re: regular expression operators in CF grammars parsersinc@yahoo.com (SLK Parsers) (2002-07-04) |

Re: regular expression operators in CF grammars rboland@unb.ca (Ralph Boland) (2002-07-04) |

Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-15) |

Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23) |

Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12) |

Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14) |

RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14) |

From: | "Sönke Kannapinn" <nospam@snafu.de> |

Newsgroups: | comp.compilers |

Date: | 10 Aug 2002 02:13:52 -0400 |

Organization: | [Posted via] Inter.net Germany GmbH |

References: | 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 |

Keywords: | parse |

Posted-Date: | 10 Aug 2002 02:13:51 EDT |

Chris F Clark writes:

*> > ... "Regular Expressions into Finite Automata" by Anne*

*> > Brueggemann-Klein (TCS 120, 1993).*

*>*

*> ... can I have a little better bibliographic data ...?*

Sorry for replying with great delay. (School summer holidays in

Berlin.) And sorry for my having been too brief. I meant

@article{ Brueggemann-Klein:93

, author = {Br\"ugge\-mann{-}Klein, Anne}

, title = {Regular Expressions into Finite Automata}

, journal = tcs

, year = 1993

, volume = 120

, pages = {197--213}

*}*

where "tcs" abbreviates "Theoretical Computer Science".

In general, I think it could be helpful if I try to clarify a few

points on ELR parsing in this posting.

Let me come back to a statement of your's from your second-latest

posting where you talked about ambiguous rhs regular expressions.

*> There are several possibilities for dealing with that problem.*

*> One of the simplest, is not attempting to distinguish where the*

*> constituents come from, after all the constituents do *not**

*> represent non-terminals, but merely fragments of the rhs*

*> definition of a non-terminal. As I said, it is possible to*

*> consider the erasure of such boundaries a fundamental part of*

*> regular expressions.*

*>*

*> More importantly, the erasure of the boundaries is important for*

*> regular expressions to have the ability to decrease look-ahead*

*> requirements.*

I would like to say two things here:

1. I do not really accept that in compiler construction, rhs regular

expression constituents do not represent non-terminals but merely

fragments of the rhs definition. From an abstract point of view,

compilers realize functions that assign each sentence of some source

language a (semantically equivalent) sentence of some target language

(be it machine language or some intermediate language). Now, source

languages contain an infinite number of sentences, and we want to

model these functions with finite means. So there seems to be no

other useful way to do this than to use structural induction. An

unambiguous finite context-free grammar is a useful skeleton.

Looking at EBNF grammars, we also have a finite number of grammar

rules. But each rhs regular expression itself describes a potentially

infinite language (of rhs expansions). So, again, when we want to

assign semantics with finite means we need a useful finite and

unambiguous structural skeleton. And the rhs regular expressions'

natural inductive structure is perfect for that - why should we

deliberately forget about it?

So in my eyes, it is more or less natural to ask for rhs regular

expressions to be (strongly) unambiguous.

2. Regular expressions do *not* establish additional boundaries with

respect to look-ahead requirements if you choose the ecfg-to-cfg

transformation scheme in a way that is adequate for the LR parsing

technique: The decision whether to reduce according to some EBNF rhs

regex must be postponed until it is clear that the *whole* regex

matches the next input. This is achieved by transforming, e.g., an

ecfg rule containing a regular *-operator

A -> <alpha> <beta>* <gamma>

into cfg rules

A -> <alpha> A'

A' -> <beta> A'

A' -> <gamma>

The cfg obtained through transformation from the ecfg "applies" the

*-operator (by reducing A' -> <beta> A') only after seeing <gamma>

in the input.

You may want to look up the work of Madsen and Kristensen (1976) for a

complete LR-adequate ecfg-to-cfg transformation scheme. Other

transformation schemes are also conceivable as long as they

essentially yield unambiguous right-linear sub-grammars - cf.

Heilbrunner (1979).

A "right-biased" transformation strategy such as the one sketched

above does *not* introduce new look-ahead boundaries "in the middle"

of rhs regex's. All the (sub-)reductions that "apply" regex operators

happen "at the end" of the regex using the look-ahead requirements

there. With respect to the ecfg, the reduction of a (complex) handle

to an ecfg rule's lhs will be performed by a sequence of related

(sub-)reductions of the according cfg; these sub-reductions

effectively constitute a sort of backward walk through the rhs regular

expression from which it can be derived *how* the handle is matched by

the regex's inductive operator structure. Therefore, semantic actions

can be associated with the regex's operators. Reconstruction of any

handle's inductive structure is always uniquely possible as long as

all rhs regex's are (strongly) unambiguous.

Now let me comment on the example grammar parts that you presented in

your latest posting.

*> > You only gain in power if you allow *ambiguous* rhs regular*

*> > expressions (non-stronly-unambiguous, to be precise), or ambiguous*

*> > rhs NFA's. Are you able to write down a compiler-related example*

*> > where you really make wise use of this freedom?*

*>*

*> [...] I can describe the rough outline of what*

*> the problem was and why having an ambiguous regular expression was a*

*> key to solving it.*

*>*

*> The original grammar had productions something like the following:*

*>*

*> obj_id: id | obj_id "." id; // a recursive definition of id ("." id)**

*>*

*> expr: obj_id*

*> | simple_func*

*> | obj_func*

*> | expr "+" expr*

*> ...*

*> ;*

*>*

*> simple_func: id "(" asgn_list ")"; // simple funcs have complex params*

*>*

*> asgn_list: (obj_id "=" expr ";")+;*

*>*

*> obj_func: obj_id "(" expr_list? ")"; // obj funcs have simple params*

*>*

*> expr_list: expr ("," expr)*;*

*>*

*> The problem is this grammar isn't LR(k) for any k. [...]*

So far, so good.

*> The solution that seemed*

*> best was to use regular expressions (relying on their ambiguity) and*

*> replace the following two rules:*

*>*

*> obj_id: id ("." id)*;*

*>*

*> obj_func: id ("." id)* "(" expr_list? ")";*

*>*

*> Doing so, we pushed the lookahead requirements to the point where the*

*> parser generator could resolve them. More importantly, we did so with*

*> minimal changes to the structure of the grammar. There may be an*

*> LR(1) grammar for the language in question. However, I doubt that it*

*> reflects the author's intended structure as well as these minor*

*> modifications did. Moreover, we could recover the structure by using*

*> a grammar predicate, getting the best of both worlds.*

*>*

*> obj_func: id ("." id)* "(" expr_list? ")" >>*

*> obj_id "(" expr_list? ")";*

*>*

*> Since, I don't presume you are familiar with the notation I use for*

*> predicates. The predicate says, if you can match with the regular*

*> expression before the >> operator, apply the regular expression after*

*> the >> operator. Thus, we pattern match with the ambiguous rule and*

*> then apply the non-ambiguous rule to have a nice representation.*

I think I understand what you did here and why you did so. I will

not comment on your concept of "grammar predicates" here. But as far

as I understand "regular expression ambiguity", your grammar doesn't

contain any such ambiguity. To me, from what you have said, it seems

you believe that

obj_func: id ("." id)* "(" expr_list? ")"

has an ambiguous rhs regular expression. With my understanding of

"regular expression ambiguity", this is not the case. Any regex

"a (b c)* d e? f" is clearly strongly unambiguous because, for

each of the regex's extensions, there is always only one unique way

how to apply the regex's operators in order to produce an extension.

(In contrast, "a**" is weakly unambiguous, but not strongly; and

"a* a*" is neither weakly nor strongly unambiguous.)

*> However, I can still see the advantage to treating regular*

*> expression operators as ambiguous in some cases.*

I also do not really understand what you mean by "treating regular

expression operators as ambiguous".

Anyway, I hope it has become a little clearer what *I* mean by

"unambiguous regular expression" and what my idea of an "ELR(k)

grammar" is.

BTW: The (corrected) grammar that you presented is in fact an

ELR(1) grammar as far as I (and Madsen and Kristensen 1976)

understand that term. So an ELR(1) parser is able to unveil the

inductive structure of all handles during reductions and make it

accessible to the user's semantic action code.

*> Thanks for the stimulating exchange,*

*> -Chris*

Thank you, too.

Sönke

Note: The short references to literature that I used in this

posting refer to the full references provided at the end of my

earlier posting.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.