Re: How to compute a regex which is the difference between two regexes?

"Etienne M. Gagnon" <egagnon@j-meg.com>
Sun, 03 Jan 2010 16:08:28 -0500

          From comp.compilers

Related articles
How to compute a regex which is the difference between two regexes? pengyu.ut@gmail.com (Peng Yu) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes mhelvens@gmail.com (Michiel) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes egagnon@j-meg.com (Etienne M. Gagnon) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes gene.ressler@gmail.com (Gene) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes mciura@gmail.com (Marcin Ciura) (2010-01-04)
Re: How to compute a regex which is the difference between two regexes max@gustavus.edu (Max Hailperin) (2010-01-04)
Re: How to compute a regex which is the difference between two regexes kkylheku@gmail.com (Kaz Kylheku) (2010-01-05)
Re: How to compute a regex which is the difference between two regexes kkylheku@gmail.com (Kaz Kylheku) (2010-01-06)
Re: How to compute a regex which is the difference between two regexes cfc@shell01.TheWorld.com (Chris F Clark) (2010-01-11)
| List of all articles for this month |
From: "Etienne M. Gagnon" <egagnon@j-meg.com>
Newsgroups: comp.compilers
Date: Sun, 03 Jan 2010 16:08:28 -0500
Organization: Compilers Central
References: 10-01-013
Keywords: lex
Cc: Peng Yu <pengyu.ut@gmail.com>
Posted-Date: 04 Jan 2010 11:19:58 EST

Hi Peng,


See my answer below.


> Suppose that I have two regular sets A and B that are generated by two
> regular expressions. I'm wondering if there is always a regular
> expression that can generated the difference between the two sets,
> i.e., A-B. If there is such regular expression, is there a mechanical
> way to generated such regular expression.


Yes, the difference between two regular sets is a regular set. Here is a
mechanical way to produce it:


1- Transform each of regular expressions A and B into automatons AUTA
and AUTB.


2- Create the following NFA:


StartState
|
+--(epsilon)--AUTA
|
\--(epsilon)--AUTB


3- Transform the NFA into a DFA, but determine its accepting state as
follows: A state is an accepting state if and only if it corresponds to
accepting state of AUTA and not to an accepting state of AUTB.


4- You can then use well-known DFA-to-regular-expression transformations
to get a (possibly ugly) equivalent regular expression.


It should be fairly obvious that this is theoretically sound. We are
simply computing a DFA that walks A and B in parallel, and that accepts
all elements of A that are not elements of B.


The beta version of SableCC 4.0 ( http://sablecc.org/ ) implements this
difference (called Except) operator for lexer specifications. You would
express it as follows:


Language my_test_language;


Lexer


    // regular definitions


    a = Any+;
    b = 'Etienne';
    a_difference_b = a Except b;


  Token
    a_difference_b; // list of tokens to be recognized by lexer




Etienne


--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.