23 Aug 2002 11:15:48 -0400

Related articles |
---|

[15 earlier articles] |

Re: regular expression operators in CF grammars parsersinc@yahoo.com (SLK Parsers) (2002-07-04) |

Re: regular expression operators in CF grammars rboland@unb.ca (Ralph Boland) (2002-07-04) |

Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-15) |

Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23) |

Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12) |

Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14) |

RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14) |

From: | "Chris F Clark" <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 23 Aug 2002 11:15:48 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 02-08-035 |

Keywords: | parse |

Posted-Date: | 23 Aug 2002 11:15:48 EDT |

Sönke Kannapinn writes:

*> In general, I think it could be helpful if I try to clarify a few*

*> points on ELR parsing in this posting.*

From your posting, I see that although we have not completely

understood each other, we are not as far off as it would have seemed.

*> This is achieved by transforming, e.g., an*

*> ecfg rule containing a regular *-operator*

*>*

*> A -> <alpha> <beta>* <gamma>*

*>*

*> into cfg rules*

*>*

*> A -> <alpha> A'*

*> A' -> <beta> A'*

*> A' -> <gamma>*

...

*> A "right-biased" transformation strategy such as the one sketched*

*> above does *not* introduce new look-ahead boundaries "in the middle"*

*> of rhs regex's.*

Point well made. I had "assumed" that the only "valid" transformation was

A -> <alpha> A' <gamma>

A' -> <beta> A' | empty

as that is the transformation that reflects the original structure of

the grammar. In particular, that is the "shape" of the resulting tree

that I would expect to see constructed.

So, thank you, I learned something. The concept of a right-biased

expansion is intuitively obvious, now that it's been pointed out to

me :-).

My next question is do you then massage the resulting tree to be in

the form I would expect above or do you leave it in the same shape

that the right-biased transformation makes?

The reason I ask this question is that it mirrors a question someone

recently asked about grammar transformation tools. In a sense, the

right-biased transformations you are applying "change the shape" of

the grammar from what a naive grammar write would expect. Now, that's

still better than entirely losing the structure of regular

expressions, but if one does not put the tree back in there naive

shape, it still seems that the user will have to understand the

grammar transformations.

*> BTW: The (corrected) grammar that you presented is in fact an*

*> ELR(1) grammar as far as I (and Madsen and Kristensen 1976)*

*> understand that term. So an ELR(1) parser is able to unveil the*

*> inductive structure of all handles during reductions and make it*

*> accessible to the user's semantic action code.*

Yes, this does not surprise me now that I better understand you. The

right-biased transformations would not introduce the boundaries that

our "ambiguous treatement of regular expressions" was attempting to

avoid. I was not trying to suggest handling ambiguous regular

expressions (more on that in a moment). I was merely saying that one

does not want to detect the right edge prematurely. We solved that

problem by erasing both edges (which is what I mean by treating them

ambiguously, one cannot determine where the regular expression starts

or finishes at the reduction point of the containing rule. The

right-biased solution should allow one to recover the edges working

from the right end of the rule. Thus, it is a better solution.

More no ambiguous regular expressions:

*> (In contrast, "a**" is weakly unambiguous, but not strongly; and*

*> "a* a*" is neither weakly nor strongly unambiguous.)*

I believe we specifically disallow a** as it adds no meaning that a*

does not. (That also helps us reserve ** as an operator for future

regular expression extensions.)

However, a* a*, we accept. Internally, it works essentially treating

the first a* as greedy (and the 2nd a* as always empty), but since one

cannot see the regular expression boundaries, that is irrelevant. The

more interesting case is a* (a|b)*, where the 1st b determines that

one must use the 2nd closure, or (a|b)* (a|c)* where "somewhere"

between that last b and the 1st c, one must switch closures.

Disallowing these ambiguities (especially the last one) makes the

grammar writers job harder (and for that I would suggest allowing

ambiguities).

A similar interesting case, we have worried about is A->A* or A->A* a,

etc. where recursion and regular expressions are mixed. In that case,

we "choose" to iterate over the regular expression rather than

building the recursively structured tree.

Part of this bias comes from the original orientation of Yacc++. It

was originally developed as part of another tool (called JADE) which

was a variation of Emacs that imposed structure (via grammars) on the

buffers. In editors, one wants to keep the structure as flat (and

minimal) as possible to avoid tree reshaping artifacts that most

syntax directed editors suffer from. So, when looking at a regular

expression, I don't ask does the resulting tree tower to the left or

to the right, to me it should not "tower" at all, it is simply a list

and it has n-entries in it. However, that is just my personal bias.

I think one can see an even more extreme case of that bias in

"languages" like HTML, which are just barely hierarchical. The

essential idea is that one has a stream of text and one inserts markup

to distinguish portions of it, but the markup should not interfere too

much with the text as text.

Best regards,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.