Re: regular expression question (David Wagner)
8 Jun 2005 22:35:52 -0400

          From comp.compilers

Related articles
regular expression question (Gijs) (2005-06-08)
Re: regular expression question (glen herrmannsfeldt) (2005-06-08)
Re: regular expression question (Karsten Nyblad) (2005-06-08)
Re: regular expression question (2005-06-08)
Re: regular expression question (Gijs) (2005-06-09)
Re: regular expression question (Nicola Musatti) (2005-06-09)
Re: regular expression question (Chris F Clark) (2005-06-09)
Re: regular expression question (Scott Nicol) (2005-06-10)
Re: regular expression question (Scott Nicol) (2005-06-10)
Re: regular expression question (Karsten Nyblad) (2005-06-10)
[3 later articles]
| List of all articles for this month |

From: (David Wagner)
Newsgroups: comp.compilers
Date: 8 Jun 2005 22:35:52 -0400
Organization: University of California, Berkeley
References: 05-06-045
Keywords: lex, DFA
Posted-Date: 08 Jun 2005 22:35:52 EDT

Gijs wrote:
>So for example I want to have a RE that matches all strings except for
>the string "hello". How do you do this?

Build a DFA M that accepts "hello"; build a complement DFA M' (which
accepts everything other than "hello"); convert M' to a RE.

Given a DFA M = (S, \Sigma, ->, q_0, F) with statespace S, alphabet
\Sigma, transition relation -> \subseteq S x \Sigma x S, initial state
q_0, and set F \subseteq S of final states, the complement DFA is given
by M' = (S, \Sigma, ->, q_0, S \ F). In other words, accepting states
become rejecting states, and vice versa.

For example, here is the DFA M that accepts "hello":
                  h e l l o
  -> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5
            \ \ \ \ \ \
              `---------`---------`---------`---------`---------`--> q6
where q5 is the only accepting state; every state qi has an edge
qi -> q6 for every other letter (e.g., q0 -> q6 on every letter other
than 'h'); and q6 has a self-loop q6 -> q6 on every letter.
M = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q5}).

Here is the complement DFA M':
                  h e l l o
  -> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5
            \ \ \ \ \ \
              `---------`---------`---------`---------`---------`--> q6
M' = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q0,q1,q2,q3,q4,q6}).

Now you use standard methods to convert M' to a RE, i.e., to construct
a RE that accepts the same language M' accepts. I'm too lazy to do the
computation, but there are standard algorithms for this task. I'm guessing
you'll get something like this:
        \epsilon | h | he | hel | hell |
            ( (([^h])|(h[^e])|(he[^l])|(hel[^l])|(hell[^o])|(hello[a-z])) [a-z]* )
Does that look right?

>[The complement of a RE is usually not a RE.]

Really? I thought the complement of a RE can always be expressed
as a RE. In particular, I thought that a RE accepts a regular language,
and the complement of any regular language is regular, and that any
regular language is accepted by some RE. Or, to put it another way,
every RE is equivalent to some DFA, and vice versa -- and given any
DFA, you can compute a complement DFA (by the procedure listed above).
I can imagine you might get an exponential blowup in the size of the
resulting RE, though.

Or are we talking about some non-standard type of regexp, such as
Perl regexps (which accept non-regular languages, in general)?
[Mea culpa. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.