Tue, 31 Mar 2009 13:30:40 +0200

Related articles |
---|

additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-29) |

Re: additional regular expression operators m.helvensteijn@gmail.com (2009-03-30) |

Re: additional regular expression operators zaimoni@zaimoni.com (2009-03-30) |

Re: additional regular expression operators haberg_20080406@math.su.se (Hans Aberg) (2009-03-30) |

Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-03-31) |

Re: additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-31) |

Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-14) |

Re: additional regular expression operators zayenz@gmail.com (MZL) (2009-04-15) |

Re: additional regular expression operators anton@mips.complang.tuwien.ac.at (2009-04-16) |

Re: additional regular expression operators gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-04-16) |

Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-17) |

[1 later articles] |

From: | torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen) |

Newsgroups: | comp.compilers |

Date: | Tue, 31 Mar 2009 13:30:40 +0200 |

Organization: | Department of Computer Science, University of Copenhagen |

References: | 09-03-111 |

Keywords: | lex |

Ralph Boland <rpboland@gmail.com> writes:

*> I plan to provide support for at least three additional*

*> unary operators to the standard three (?,*, and +)*

*> (plus new binary operators but that's a topic for another day).*

*> For a regular expression R they are:*

*>*

*> R! :does not accept the empty string but otherwise*

*> accepts every expression that R does. Very useful.*

*>*

*> R~ :accepts exactly the set of strings that R does not accept.*

*> note that R = R~~.*

*>*

*> R% : equivalent to the string (R~)R*

I can see the usefulness of the first two, but not really the third.

Do you have an example?

In general, I find R\S (strings in R but not in S) more useful than

the above, and you can easily define the above using this: R! = R\e,

where e is the empty string and R~ = (.*)\R, where . represents any

character.

*> I have put these operators after the expression but in fact I prefer*

*> to have them in front*

*> because:*

*> 1) I prefer to have unary operators in front of their expressions*

*> as in "-5" rather than "5-".*

That is a matter of taste, but you would IMO need more compelling

reasons to change what has been standard for half a century.

*> 2) Parsing is easier and faster if the unary operator precedes*

*> the expression.*

Not really. If that was the case, early mathematical calculators

would have used prefix sine, log, etc., but even calculators with

infix arithmetic and precedence levels (such as Texas SR51) used

postfix unary operators.

*> Suggestions as to further operators to support welcome. Suggestions*

*> as to what are the best symbols to use for the new operators welcome.*

As I said, R\S is very useful. I have also often found use for "a

sequence of Rs separated by <symbol>", such as a list of names

separated by commas. Something like R_s, where s is the symbol could

be a possible notation, e.g. ([A-Za-z]*)_',' for the above example.

R_s expands to (Rs)*R, but since R occurs twice in this, you can

potentially reduce the size of the regexp a lot.

Also, you sometimes want a sequence of things in arbitrary order. So

something like {R,S,T}% as an abbreviation of RST|RTS|SRT|STR|TRS|TSR

would be useful. But it can lead to extremely large expanded

expressions or automatons.

Note that once you have complement or difference, the construction of

automatons get more complicated, as you can't construct an NFA

compositionally and then convert this to a DFA, you have to construct

a DFA directly. There are standard methods for this based on

derivatives of regular expressions, though, so it is not a major

issue. It just means that you can't easily modify an existing tool.

Torben

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.