Re: Looking for a DFA implementation (please help)

"Burak Emir" <buraq@lampsun1.epfl.ch>
1 Dec 2002 22:39:44 -0500

          From comp.compilers

Related articles
Looking for a DFA implementation (please help) mvp9@cornell.edu (mike) (2002-11-08)
Re: Looking for a DFA implementation (please help) mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2002-11-12)
Re: Looking for a DFA implementation (please help) clint@0lsen.net (Clint Olsen) (2002-11-12)
Re: Looking for a DFA implementation (please help) buraq@lampsun1.epfl.ch (Burak Emir) (2002-12-01)
| List of all articles for this month |

From: "Burak Emir" <buraq@lampsun1.epfl.ch>
Newsgroups: comp.compilers
Date: 1 Dec 2002 22:39:44 -0500
Organization: EPFL
References: 02-11-039
Keywords: lex
Posted-Date: 01 Dec 2002 22:39:44 EST

Hello,


mike wrote:


> Hi,
> I'm working on an AI project and one part of it requires an
> implementation of fast string matching.
> Specifically I need a DFA (or an analog) that can match a string against
> a set of others and give a
> different output depending on which string it matched (not just that a
> match was found)


- - -


Here is a theoretical consideration: Imagine a traditional DFA. Its
set of states is actually split in two, namely final and non-final
states, attaching a binary piece of information to each state.


What is needed to generalize this to more than 2 cases is nothing but
including integer instead of binary tags :
we replace { "is final (0)" / "is non-final (1)" } with
{ does not "match (0)", "matches (1)", "matches(2)", ..., "matches(n)" }


You could then take the tagged union of your regular expressions (or
simple strings) and - voila - have efficient pattern matching. For
implementing, the AMoRE (Automata, Monoids, and regular expressions)
library might be useful ( amore.sourceforge.net ), it has fast
implementations of many automata algorithms in C, and it is quite easy
to locate the flag in the data structure.


Unfortunately, you will need to change (or rewrite) two procedure to
respect the "tags":
1. the one that produces the union of two regular languages
2. the one that minimizes / produces the DFA from the NFA


I am around for support if you want to go for it : )


Note that no matter how you do this, you will have to set up a policy
to specify, which one of the cases is preferred if they overlap, e.g.
"first match". Few libraries allow you to check language inclusion to
avoid overlaps throughout. AMoRE does provide this, however it is
somewhat operating on an abstract level (all automata work on
*integers* instead of characters, you will need to translate your
input ...)


Hope this helps.
Burak


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.