8 Jun 2005 22:35:52 -0400

Related articles |
---|

regular expression question gvheurn@gmail.com (Gijs) (2005-06-08) |

Re: regular expression question gah@ugcs.caltech.edu (glen herrmannsfeldt) (2005-06-08) |

Re: regular expression question 148f3wg02@sneakemail.com (Karsten Nyblad) (2005-06-08) |

Re: regular expression question daw@taverner.cs.berkeley.edu (2005-06-08) |

Re: regular expression question gvheurn@gmail.com (Gijs) (2005-06-09) |

Re: regular expression question nicola.musatti@gmail.com (Nicola Musatti) (2005-06-09) |

Re: regular expression question cfc@shell01.TheWorld.com (Chris F Clark) (2005-06-09) |

Re: regular expression question snicol@apk.net (Scott Nicol) (2005-06-10) |

Re: regular expression question snicol@apk.net (Scott Nicol) (2005-06-10) |

Re: regular expression question d148f3wg02@sneakemail.com (Karsten Nyblad) (2005-06-10) |

[3 later articles] |

From: | daw@taverner.cs.berkeley.edu (David Wagner) |

Newsgroups: | comp.compilers |

Date: | 8 Jun 2005 22:35:52 -0400 |

Organization: | University of California, Berkeley |

References: | 05-06-045 |

Keywords: | lex, DFA |

Gijs wrote:

*>So for example I want to have a RE that matches all strings except for*

*>the string "hello". How do you do this?*

Build a DFA M that accepts "hello"; build a complement DFA M' (which

accepts everything other than "hello"); convert M' to a RE.

Given a DFA M = (S, \Sigma, ->, q_0, F) with statespace S, alphabet

\Sigma, transition relation -> \subseteq S x \Sigma x S, initial state

q_0, and set F \subseteq S of final states, the complement DFA is given

by M' = (S, \Sigma, ->, q_0, S \ F). In other words, accepting states

become rejecting states, and vice versa.

For example, here is the DFA M that accepts "hello":

h e l l o

-> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5

\ \ \ \ \ \

`---------`---------`---------`---------`---------`--> q6

where q5 is the only accepting state; every state qi has an edge

qi -> q6 for every other letter (e.g., q0 -> q6 on every letter other

than 'h'); and q6 has a self-loop q6 -> q6 on every letter.

M = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q5}).

Here is the complement DFA M':

h e l l o

-> q0 -----> q1 -----> q2 -----> q3 -----> q4 -----> q5

\ \ \ \ \ \

`---------`---------`---------`---------`---------`--> q6

M' = ({q0,..,q6}, {'a',..,'z'}, ->, q0, {q0,q1,q2,q3,q4,q6}).

Now you use standard methods to convert M' to a RE, i.e., to construct

a RE that accepts the same language M' accepts. I'm too lazy to do the

computation, but there are standard algorithms for this task. I'm guessing

you'll get something like this:

\epsilon | h | he | hel | hell |

( (([^h])|(h[^e])|(he[^l])|(hel[^l])|(hell[^o])|(hello[a-z])) [a-z]* )

Does that look right?

*>[The complement of a RE is usually not a RE.]*

Really? I thought the complement of a RE can always be expressed

as a RE. In particular, I thought that a RE accepts a regular language,

and the complement of any regular language is regular, and that any

regular language is accepted by some RE. Or, to put it another way,

every RE is equivalent to some DFA, and vice versa -- and given any

DFA, you can compute a complement DFA (by the procedure listed above).

I can imagine you might get an exponential blowup in the size of the

resulting RE, though.

Or are we talking about some non-standard type of regexp, such as

Perl regexps (which accept non-regular languages, in general)?

[Mea culpa. -John]

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.