Re: Generate text, given a regex

m.helvensteijn@gmail.com
Thu, 27 Mar 2008 00:49:58 +0100

          From comp.compilers

Related articles
Generate text, given a regex midhatali@gmail.com (Midhat) (2008-03-23)
Re: Generate text, given a regex m.helvensteijn@gmail.com (2008-03-27)
Re: Generate text, given a regex gene.ressler@gmail.com (Gene) (2008-03-26)
Generate text, given a regex domenico.bianculli@lu.unisi.ch (Domenico Bianculli) (2008-03-27)
Re: Generate text, given a regex gene.ressler@gmail.com (Gene) (2008-04-11)
Re: Generate text, given a regex rsc@swtch.com (Russ Cox) (2008-04-11)
| List of all articles for this month |

From: m.helvensteijn@gmail.com
Newsgroups: comp.compilers
Date: Thu, 27 Mar 2008 00:49:58 +0100
Organization: Wanadoo
References: 08-03-095
Keywords: lex
Posted-Date: 26 Mar 2008 23:09:29 EDT

Midhat wrote:


> Hi. I want to generate text based on a given regex. just any text that
> satisifies the regex. Are there any existing tools/libraries to do
> that.


I don't know any tools/libraries, but it seems like this problem isn't so
difficult.


* Create a syntax tree of the regex.


* Generate text (T(r)) for the leafs of the tree (r). The atomic regexes:
    - If r = a character, T(r) = that character
    - If r = a character class, T(r) = any character satisfying that class
    - If r = . , T(r) = any character (except perhaps \n)


* Recursively generate text (T(R)) for the non-atomic sub-regexes (R) of the
tree, assuming text has already been generated for all the immediate
sub-regexes of R:
    - If R = x*, T(R) = a concatenation of any number of T(x).
    - If R = x+, T(R) = a concatenation of 1 or more of T(x).
    - If R = x?, T(R) = the empty string or T(x).
    - If R = xy, T(R) = the concatenation of T(x) and T(y).
    - If R = x|y, T(R) = T(x) or T(y)
    - If R = (x), T(R) = T(x)


Any choices you make here will lead to T(regex), the text you want. Of
course, if any text will do, you can immediately return empty strings for
x* and x?, without processing x.


And I am assuming here that we are talking about regular expressions in the
classical sense. So without recalling previously captured sub-expressions
and such. However, if you want to have those anyway, there are still
relatively easy adaptations you can make to this algorithm to make it work.


Good luck!


--
Michiel Helvensteijn


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.