Re: additional regular expression operators

Hans Aberg <haberg_20080406@math.su.se>
Mon, 30 Mar 2009 20:30:52 +0200

          From comp.compilers

Related articles
additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-29)
Re: additional regular expression operators m.helvensteijn@gmail.com (2009-03-30)
Re: additional regular expression operators zaimoni@zaimoni.com (2009-03-30)
Re: additional regular expression operators haberg_20080406@math.su.se (Hans Aberg) (2009-03-30)
Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-03-31)
Re: additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-31)
Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-14)
Re: additional regular expression operators zayenz@gmail.com (MZL) (2009-04-15)
Re: additional regular expression operators anton@mips.complang.tuwien.ac.at (2009-04-16)
Re: additional regular expression operators gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-04-16)
[2 later articles]
| List of all articles for this month |

From: Hans Aberg <haberg_20080406@math.su.se>
Newsgroups: comp.compilers
Date: Mon, 30 Mar 2009 20:30:52 +0200
Organization: Aioe.org NNTP Server
References: 09-03-111
Keywords: lex
Posted-Date: 31 Mar 2009 14:25:49 EDT

Ralph Boland wrote:
> I am building a tool for translating regular expressions
...
> I plan to provide support for at least three additional
> unary operators to the standard three (?,*, and +)
> (plus new binary operators but that's a topic for another day).
> For a regular expression R they are:
>
> R! :does not accept the empty string but otherwise
> accepts every expression that R does. Very useful.
>
> R~ :accepts exactly the set of strings that R does not accept.
> note that R = R~~.
>
> R% : equivalent to the string (R~)R


Here is some formalism, that perhaps is useful:


Let C be a set of characters, and C* the set of empty strings from C
(i.e., the free monoid). A language L is a subset of C*. Denote the set
of languages of C by Lang(C).


Given two languages L, M, one can form the following set operations:
      complement, union, intersection, difference, symmetric difference
and the monoid operations:
      (concatenation) L * M b  {x*y|x in L or y in M}
      (exponent) L^k b  {x_1 *...*x_k|x_i in L}
      (monoid closure) L* b  union of all L^k, zero or more concatenations
(positive closure) L+ b  union of all L^k, k > 0, one or more
                                                  concatenations
      (S closure) L^S b  union of all L^i, i in S
The monoid closure of L is the submonoid generated by L, and also the
smallest submonoid containing L. The monoid closure is also called the
Kleene closure.


A regular expression language contains symbols representing elements
from C, Lang(C), C*, the set and monoid operations above, plus
parenthesizes, with the obvious semantics.


Now, your operation R~ above is the set complement, and R! is R minus
the empty language. The last one can also easily be constructed from the
other, so it may not be needed to be added explictly.


All the operations above are traditionally written (superscript)
postfix, except the complement which is (inline) prefix, but that is
often not added in regular expression algebras (or only as complements
in C, i.e. for a character set A, take language set difference C-A). But
if you add it, it might be natural to have it prefix.


      Hans Aberg



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.