Re: additional regular expression operators

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Tue, 31 Mar 2009 13:30:40 +0200

From comp.compilers

Related articles
additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-29)
Re: additional regular expression operators m.helvensteijn@gmail.com (2009-03-30)
Re: additional regular expression operators zaimoni@zaimoni.com (2009-03-30)
Re: additional regular expression operators haberg_20080406@math.su.se (Hans Aberg) (2009-03-30)
*Re: additional regular expression operators torbenm@pc-003.diku.dk* (2009-03-31)**
Re: additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-31)
Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-14)
Re: additional regular expression operators zayenz@gmail.com (MZL) (2009-04-15)
Re: additional regular expression operators anton@mips.complang.tuwien.ac.at (2009-04-16)
Re: additional regular expression operators gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-04-16)
Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-17)
[1 later articles]

| List of all articles for this month |

From:	torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Newsgroups:	comp.compilers
Date:	Tue, 31 Mar 2009 13:30:40 +0200
Organization:	Department of Computer Science, University of Copenhagen
References:	09-03-111
Keywords:	lex
Posted-Date:	31 Mar 2009 14:28:05 EDT

Ralph Boland <rpboland@gmail.com> writes:

> I plan to provide support for at least three additional
> unary operators to the standard three (?,*, and +)
> (plus new binary operators but that's a topic for another day).
> For a regular expression R they are:
>
> R! :does not accept the empty string but otherwise
> accepts every expression that R does. Very useful.
>
> R~ :accepts exactly the set of strings that R does not accept.
> note that R = R~~.
>
> R% : equivalent to the string (R~)R

I can see the usefulness of the first two, but not really the third.
Do you have an example?

In general, I find R\S (strings in R but not in S) more useful than
the above, and you can easily define the above using this: R! = R\e,
where e is the empty string and R~ = (.*)\R, where . represents any
character.

> I have put these operators after the expression but in fact I prefer
> to have them in front
> because:
> 1) I prefer to have unary operators in front of their expressions
> as in "-5" rather than "5-".

That is a matter of taste, but you would IMO need more compelling
reasons to change what has been standard for half a century.

> 2) Parsing is easier and faster if the unary operator precedes
> the expression.

Not really. If that was the case, early mathematical calculators
would have used prefix sine, log, etc., but even calculators with
infix arithmetic and precedence levels (such as Texas SR51) used
postfix unary operators.

> Suggestions as to further operators to support welcome. Suggestions
> as to what are the best symbols to use for the new operators welcome.

As I said, R\S is very useful. I have also often found use for "a
sequence of Rs separated by <symbol>", such as a list of names
separated by commas. Something like R_s, where s is the symbol could
be a possible notation, e.g. ([A-Za-z]*)_',' for the above example.
R_s expands to (Rs)*R, but since R occurs twice in this, you can
potentially reduce the size of the regexp a lot.

Also, you sometimes want a sequence of things in arbitrary order. So
something like {R,S,T}% as an abbreviation of RST|RTS|SRT|STR|TRS|TSR
would be useful. But it can lead to extremely large expanded
expressions or automatons.

Note that once you have complement or difference, the construction of
automatons get more complicated, as you can't construct an NFA
compositionally and then convert this to a DFA, you have to construct
a DFA directly. There are standard methods for this based on
derivatives of regular expressions, though, so it is not a major
issue. It just means that you can't easily modify an existing tool.

Torben

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: additional regular expression operators

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)Tue, 31 Mar 2009 13:30:40 +0200

torbenm@pc-003.diku.dk (Torben =?iso-8859-1?Q?=C6gidius?= Mogensen)
Tue, 31 Mar 2009 13:30:40 +0200