From: | "Etienne M. Gagnon" <egagnon@j-meg.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 03 Jan 2010 16:08:28 -0500 |
Organization: | Compilers Central |
References: | 10-01-013 |
Keywords: | lex |
Cc: | Peng Yu <pengyu.ut@gmail.com> |
Posted-Date: | 04 Jan 2010 11:19:58 EST |
Hi Peng,
See my answer below.
> Suppose that I have two regular sets A and B that are generated by two
> regular expressions. I'm wondering if there is always a regular
> expression that can generated the difference between the two sets,
> i.e., A-B. If there is such regular expression, is there a mechanical
> way to generated such regular expression.
Yes, the difference between two regular sets is a regular set. Here is a
mechanical way to produce it:
1- Transform each of regular expressions A and B into automatons AUTA
and AUTB.
2- Create the following NFA:
StartState
|
+--(epsilon)--AUTA
|
\--(epsilon)--AUTB
3- Transform the NFA into a DFA, but determine its accepting state as
follows: A state is an accepting state if and only if it corresponds to
accepting state of AUTA and not to an accepting state of AUTB.
4- You can then use well-known DFA-to-regular-expression transformations
to get a (possibly ugly) equivalent regular expression.
It should be fairly obvious that this is theoretically sound. We are
simply computing a DFA that walks A and B in parallel, and that accepts
all elements of A that are not elements of B.
The beta version of SableCC 4.0 ( http://sablecc.org/ ) implements this
difference (called Except) operator for lexer specifications. You would
express it as follows:
Language my_test_language;
Lexer
// regular definitions
a = Any+;
b = 'Etienne';
a_difference_b = a Except b;
Token
a_difference_b; // list of tokens to be recognized by lexer
Etienne
--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org
Return to the
comp.compilers page.
Search the
comp.compilers archives again.