Related articles |
---|
regular expression question gvheurn@gmail.com (Gijs) (2005-06-08) |
Re: regular expression question gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-06-08) |
Re: regular expression question 148f3wg02@sneakemail.com (Karsten Nyblad) (2005-06-08) |
Re: regular expression question daw@taverner.cs.berkeley.edu (2005-06-08) |
Re: regular expression question gvheurn@gmail.com (Gijs) (2005-06-09) |
Re: regular expression question nicola.musatti@gmail.com (Nicola Musatti) (2005-06-09) |
Re: regular expression question cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-09) |
Re: regular expression question snicol@apk.net (Scott Nicol) (2005-06-10) |
Re: regular expression question snicol@apk.net (Scott Nicol) (2005-06-10) |
Re: regular expression question d148f3wg02@sneakemail.com (Karsten Nyblad) (2005-06-10) |
[3 later articles] |
From: | daw@taverner.cs.berkeley.edu (David Wagner) |
Newsgroups: | comp.compilers |
Date: | 8 Jun 2005 22:35:52 -0400 |
Organization: | University of California, Berkeley |
References: | 05-06-045 |
Keywords: | lex, DFA |
Posted-Date: | 08 Jun 2005 22:35:52 EDT |
Gijs wrote:
>So for example I want to have a RE that matches all strings except for
>the string "hello". How do you do this?
Build a DFA M that accepts "hello"; build a complement DFA M' (which
accepts everything other than "hello"); convert M' to a RE.
Given a DFA M = (S, \Sigma, ->, q_0, F) with statespace S, alphabet
\Sigma, transition relation -> \subseteq S x \Sigma x S, initial state
q_0, and set F \subseteq S of final states, the complement DFA is given
by M' = (S, \Sigma, ->, q_0, S \ F). In other words, accepting states
become rejecting states, and vice versa.
For example, here is the DFA M that accepts "hello":
h e l l o
-> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5
\ \ \ \ \ \
`---------`---------`---------`---------`---------`--> q6
where q5 is the only accepting state; every state qi has an edge
qi -> q6 for every other letter (e.g., q0 -> q6 on every letter other
than 'h'); and q6 has a self-loop q6 -> q6 on every letter.
M = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q5}).
Here is the complement DFA M':
h e l l o
-> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5
\ \ \ \ \ \
`---------`---------`---------`---------`---------`--> q6
M' = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q0,q1,q2,q3,q4,q6}).
Now you use standard methods to convert M' to a RE, i.e., to construct
a RE that accepts the same language M' accepts. I'm too lazy to do the
computation, but there are standard algorithms for this task. I'm guessing
you'll get something like this:
\epsilon | h | he | hel | hell |
( (([^h])|(h[^e])|(he[^l])|(hel[^l])|(hell[^o])|(hello[a-z])) [a-z]* )
Does that look right?
>[The complement of a RE is usually not a RE.]
Really? I thought the complement of a RE can always be expressed
as a RE. In particular, I thought that a RE accepts a regular language,
and the complement of any regular language is regular, and that any
regular language is accepted by some RE. Or, to put it another way,
every RE is equivalent to some DFA, and vice versa -- and given any
DFA, you can compute a complement DFA (by the procedure listed above).
I can imagine you might get an exponential blowup in the size of the
resulting RE, though.
Or are we talking about some non-standard type of regexp, such as
Perl regexps (which accept non-regular languages, in general)?
[Mea culpa. -John]
Return to the
comp.compilers page.
Search the
comp.compilers archives again.