Related articles |
---|
regular expression generation rpboland@gmail.com (Ralph Boland) (2010-01-31) |
Re: regular expression generation DrDiettrich1@aol.com (Hans-Peter Diettrich) (2010-02-02) |
Re: regular expression generation ott@mirix.org (Matthias-Christian Ott) (2010-02-02) |
From: | Hans-Peter Diettrich <DrDiettrich1@aol.com> |
Newsgroups: | comp.compilers |
Date: | Tue, 02 Feb 2010 09:03:37 +0100 |
Organization: | Compilers Central |
References: | 10-02-007 |
Keywords: | lex |
Posted-Date: | 02 Feb 2010 23:08:20 EST |
Ralph Boland schrieb:
> I am building a finite state machine generator.
> It currently has the ability to generate random
> regular expressions for testing the engine
> but the generation of random character classes
> (i.e. expressions of the form [ab-ew-zA-CD;@&])
> is not supported.
> I was wondering if anyone has done this and can
> give me ideas on the best way to do so.
I don't understand what your particular problem is, it looks to me
more like establishing rules for creating well formed (nice looking)
expressions - that's up to you.
For nice looking classes you can use predefined character classes
(digits, lower/upper case characters, punctuators...), and randomly
merge subsets of these classes into a new character class.
For handling character classes, Delphi/Pascal has Sets of up to 256
elements, i.e. covering single-byte character codes. The
implementation uses an array of bits, indexed by e.g. the character
byte code. This makes the test of membership of a character (byte) in
an given set almost one machine insctruction. For random selection
from a character class, a string (array of characters) containing all
characters will be faster, using an random index to pick one of the
stored codes. An implementation of a character class may contain and
use both representations internally.
Creating fully random character classes can be implemented by a random
set size first, resulting in the accordingly dimensioned character
array, which then is filled with random character values. Duplicates
can be omitted by according checks, or the code can be incremented on
a collission when character clusters are preferred. Or you create
random pairs of a starting S and ending E character for ranges, where
an S<E will represent an range <S>-<E>, otherwise a single character
<S>.
Finally the created (any) character class can be translated into the
according expression syntax.
DoDi
Return to the
comp.compilers page.
Search the
comp.compilers archives again.