Mon, 30 Mar 2009 20:30:52 +0200

Related articles |
---|

additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-29) |

Re: additional regular expression operators m.helvensteijn@gmail.com (2009-03-30) |

Re: additional regular expression operators zaimoni@zaimoni.com (2009-03-30) |

Re: additional regular expression operators haberg_20080406@math.su.se (Hans Aberg) (2009-03-30) |

Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-03-31) |

Re: additional regular expression operators rpboland@gmail.com (Ralph Boland) (2009-03-31) |

Re: additional regular expression operators torbenm@pc-003.diku.dk (2009-04-14) |

Re: additional regular expression operators zayenz@gmail.com (MZL) (2009-04-15) |

Re: additional regular expression operators anton@mips.complang.tuwien.ac.at (2009-04-16) |

Re: additional regular expression operators gah@ugcs.caltech.edu (glen herrmannsfeldt) (2009-04-16) |

[2 later articles] |

From: | Hans Aberg <haberg_20080406@math.su.se> |

Newsgroups: | comp.compilers |

Date: | Mon, 30 Mar 2009 20:30:52 +0200 |

Organization: | Aioe.org NNTP Server |

References: | 09-03-111 |

Keywords: | lex |

Posted-Date: | 31 Mar 2009 14:25:49 EDT |

Ralph Boland wrote:

*> I am building a tool for translating regular expressions*

...

*> I plan to provide support for at least three additional*

*> unary operators to the standard three (?,*, and +)*

*> (plus new binary operators but that's a topic for another day).*

*> For a regular expression R they are:*

*>*

*> R! :does not accept the empty string but otherwise*

*> accepts every expression that R does. Very useful.*

*>*

*> R~ :accepts exactly the set of strings that R does not accept.*

*> note that R = R~~.*

*>*

*> R% : equivalent to the string (R~)R*

Here is some formalism, that perhaps is useful:

Let C be a set of characters, and C* the set of empty strings from C

(i.e., the free monoid). A language L is a subset of C*. Denote the set

of languages of C by Lang(C).

Given two languages L, M, one can form the following set operations:

complement, union, intersection, difference, symmetric difference

and the monoid operations:

(concatenation) L * M b {x*y|x in L or y in M}

(exponent) L^k b {x_1 *...*x_k|x_i in L}

(monoid closure) L* b union of all L^k, zero or more concatenations

(positive closure) L+ b union of all L^k, k > 0, one or more

concatenations

(S closure) L^S b union of all L^i, i in S

The monoid closure of L is the submonoid generated by L, and also the

smallest submonoid containing L. The monoid closure is also called the

Kleene closure.

A regular expression language contains symbols representing elements

from C, Lang(C), C*, the set and monoid operations above, plus

parenthesizes, with the obvious semantics.

Now, your operation R~ above is the set complement, and R! is R minus

the empty language. The last one can also easily be constructed from the

other, so it may not be needed to be added explictly.

All the operations above are traditionally written (superscript)

postfix, except the complement which is (inline) prefix, but that is

often not added in regular expression algebras (or only as complements

in C, i.e. for a character set A, take language set difference C-A). But

if you add it, it might be natural to have it prefix.

Hans Aberg

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.