27 May 1999 23:28:12 -0400

Related articles |
---|

Generating strings from regular expressions kgatlin@cs.ucsd.edu (KSG) (1999-05-24) |

Re: Generating strings from regular expressions Helmut.Richter@lrz-muenchen.de (1999-05-27) |

Re: Generating strings from regular expressions ucapjab@ucl.ac.uk (Jonathan Barker) (1999-05-27) |

Re: Generating strings from regular expressions std10381@moritz.dial.techfak.uni-kiel.de (Carsten Fritz) (1999-05-27) |

Re: Generating strings from regular expressions djello@well.com (1999-05-27) |

From: | djello@well.com (Darius Bacon) |

Newsgroups: | comp.compilers,comp.theory |

Date: | 27 May 1999 23:28:12 -0400 |

Organization: | Whole Earth Networks News |

Distribution: | inet |

References: | 99-05-119 99-05-123 |

Keywords: | lex |

*>KSG <kgatlin@cs.ucsd.edu> writes:*

*>>Does anyone know of any code or an algorithm that does the following:*

*>>input: a regular expression, R, and an integer k.*

*>>output: a list of all strings of length k that are in R*

I'd try a bounded-depth traversal of the DFA for R (where the depth

bound is k).

define traverse(state, prefix, bound):

if 0 = bound and accepting(state):

output prefix

else if 0 < bound:

for arc in outgoing_arcs(state):

traverse(head(arc), prefix ++ [label(arc)], bound - 1)

traverse(start_state(dfa_of_regex(R)), [], k)

where [foo, ...] is a list, ++ appends lists, and the rest should be

obvious. The code could be tuned to use a stack instead of a list,

and to avoid traversing state/bound pairs more than once.

You could instead do a simple traversal without the bound, if you

first intersect R with the regular expression .^k (that is, the regex

with k successive arbitrary characters). I'm not sure whether that's

worthwhile -- I haven't tested this.

--

Darius Bacon http://www.well.com/~djello

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.