Looking for a DFA implementation (please help)

"mike" <mvp9@cornell.edu>
8 Nov 2002 11:01:07 -0500

          From comp.compilers

Related articles
Looking for a DFA implementation (please help) mvp9@cornell.edu (mike) (2002-11-08)
Re: Looking for a DFA implementation (please help) mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2002-11-12)
Re: Looking for a DFA implementation (please help) clint@0lsen.net (Clint Olsen) (2002-11-12)
Re: Looking for a DFA implementation (please help) buraq@lampsun1.epfl.ch (Burak Emir) (2002-12-01)
| List of all articles for this month |

From: "mike" <mvp9@cornell.edu>
Newsgroups: comp.compilers
Date: 8 Nov 2002 11:01:07 -0500
Organization: Cornell University
Keywords: lex, question, comment
Posted-Date: 08 Nov 2002 11:01:07 EST

Hi,
    I'm working on an AI project and one part of it requires an
implementation of fast string matching.
Specifically I need a DFA (or an analog) that can match a string against
a set of others and give a
different output depending on which string it matched (not just that a
match was found)
for example, i have a set of strings
xoo?.?$ (1)
?$?x.? (2)
o$?.. (3)
(where ? [xo.] and $ is [xo])
and I need to if another string C, for example 'xxo?' matches anything
in the set, and if so, what is the number of the string that it matches


There's no complicated back matching, etc, and in fact the tested string
C can only match in the beginning, like saying "^xxo[xo.]".
I've been using such a DFA I got out of GnuGo, but i have problems with
it. I looked around, but all cases/packages of pattern matching i found
(regex, rx, regexpp) seem to only support building a dfa for
1 string at a time and offer boolean return values.


If anyone knows of a good implementation of above please email me.


considerations for the DFA use in my project:
+a dfa is built at one time for a given set of strings, that is, once it
is built, i never add new strings to it, nor remove anything from it.


+However, the speed of both DFA creation and matching is of utmost
importance, for the string matching is the most frequently performed
operation in my project. DFA creation also occurs quite frequently and
must be fast.


+string length stays roughly below 100, and mostly ranges below 50.


+The set of strings can be as large as several hundred.


+Space is not an issue, ie there is only 2 such dfas at any given time,
and there is comparatively little other memory used.


I realize something like this is not terribly complicated, but it is not
topically relevant to my project, and so I cannot afford to spend time
implementing and debugging it myself.
I know such implementations exist (for example, lexer generator, but it
is way to large and slow for this), but I just cannot find one.


Please let me know if you have come across an open source implementation
of this; I will greatly appreciate it.
Mike
[I suggested extracting the useful bits from flex. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.