Re: Generating strings from regular expressions

"Jonathan Barker" <ucapjab@ucl.ac.uk>
27 May 1999 23:23:37 -0400

From comp.compilers

Related articles
Generating strings from regular expressions kgatlin@cs.ucsd.edu (KSG) (1999-05-24)
Re: Generating strings from regular expressions Helmut.Richter@lrz-muenchen.de (1999-05-27)
*Re: Generating strings from regular expressions ucapjab@ucl.ac.uk (Jonathan Barker)* (1999-05-27)**
Re: Generating strings from regular expressions std10381@moritz.dial.techfak.uni-kiel.de (Carsten Fritz) (1999-05-27)
Re: Generating strings from regular expressions djello@well.com (1999-05-27)

| List of all articles for this month |

From:	"Jonathan Barker" <ucapjab@ucl.ac.uk>
Newsgroups:	comp.compilers,comp.theory
Date:	27 May 1999 23:23:37 -0400
Organization:	Deja.com - Share what you know. Learn what you don't.
Distribution:	inet
References:	99-05-119
Keywords:	lex

    KSG <kgatlin@cs.ucsd.edu> wrote:
> Does anyone know of any code or an algorithm that does the following:
>
> input: a regular expression, R, and an integer k.
> output: a list of all strings of length k that are in R
>
> I've thought about building the NFA and traversing that, but I'm not
> sure if building the NFA is really necessary. Is there a simpler way?
>
> Thanks,
>
> KSG
> [Simpler? Sure. Enumerate all the strings of length k, run them through
> the RE matcher of your choice, and list the one that match. But I will
> admit that for large k it could be kind of slow. -John]

Surely it'll be kind of slow whatever - for example the number of
matches of (a|b)* of length k is obviously 2^k so we can't expect to
do much better than exponential time (in k).

I was intrigued by this problem - here is a solution based on the
syntax tree. I am not 100% sure that it is correct but it looks ok to
me. It may not be very efficient to implement since it contains a lot
of set manipulation and recursion.

I tried implementing the algorithm in C++. I you like I'll email you
a copy of my source but I warn you now, it is pretty poor (and
probably buggy) code :-)

Regards

Jonathan

DEFINITIONS AND NOTATION

Express regular expressions as given by the
following "grammar":

        R -> CAT(R,R) // contatentation
        R -> OR(R,R) // alternation
        R -> STAR(R) // closure
        R -> CHAR(a) // symbols from alphabet

where a is taken from the alphabet A. That is,
the input to the algorithm is a syntax tree.

If a is a character from the alphabet
then let STRING(a) be the string of length
1 whose only character is a.

Let EMPTY denote the empty string. Note that
concatenation with EMPTY leaves a string
unchanged.

Define the cross product operation on sets of
strings by:

        A * B := { ab | a in A and b in B }

where ab is the concatenation of a and b. If
A and B are sets of strings, then A*B denotes
the set of all strings formed by taking a string
from A and concatenating a string from B.

For notational convenience, whenever I write
the expression:

        Union[i=1,m] f(i)

I will mean:

f(1) union f(2) union ... union f(m)

THE ALGORITHM

Inputs:
R - a regular expression (as above)
k - a positive integer

Outputs:
S - a set of strings, each of length k.

The algorithm is defined recursively in pseudo-code
as follows:

function ENUM(R,k)

// There are four cases depending on
// the form of R:

// Case 1: R represents a leaf of the tree
// carrying the character a from the alphabet

if R = CHAR(a)
{
        if k = 1 then
S = {STRING(a)}
else
S = {}
}

// Case 2: R represents an alternation node
// with left node A and right node B

else if R = OR(A,B)
{
        S = Enum(A,k) union Enum(B,k)
}

// Case 3: R represents a concatenation node
// with left node A and right node B.

else if R = CAT(A,B)
{
        S = Union[i=0,k] Enum(A,i)*Enum(B,k-i)
}

// Case 4: R represets a "star" node
// with child node A

else if R = STAR(A)
{
        if k = 0 then
S = {EMPTY}
else
S = Union[i=1,k] Enum(A,i)*Enum(R,k-i)
}

// Now S contains the result (it may be empty).

return S
end

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Generating strings from regular expressions

"Jonathan Barker" <ucapjab@ucl.ac.uk>27 May 1999 23:23:37 -0400

"Jonathan Barker" <ucapjab@ucl.ac.uk>
27 May 1999 23:23:37 -0400