|How to do this odd kind of regex match? email@example.com (Tony Finch) (2002-07-15)|
|Re: How to do this odd kind of regex match? firstname.lastname@example.org (Michael Parker) (2002-07-21)|
|Re: How to do this odd kind of regex match? email@example.com (Joachim Durchholz) (2002-07-21)|
|Re: How to do this odd kind of regex match? Martin.Ward@durham.ac.uk (Martin Ward) (2002-07-21)|
|Re: How to do this odd kind of regex match? firstname.lastname@example.org (Simon Cozens) (2002-07-24)|
|From:||"Michael Parker" <email@example.com>|
|Date:||21 Jul 2002 01:51:34 -0400|
|Organization:||EarthLink Inc. -- http://www.EarthLink.net|
|Posted-Date:||21 Jul 2002 01:51:34 EDT|
Tony Finch wrote:
> Can anyone point me in the direction of literature on matching regexes
> with capturing brackets (as in sed) efficiently, i.e. without using a
> backtracking NFA. Is it possible to do with a DFA?
The usual way to do this is to scan with a DFA until you find a match,
then run the NFA over that to snag the submatches. There are also
some very quick NFA's out there that substantially minimise
backtracking and other forms of book-keeping. I wrote one in Common
Lisp that is an order of magnitude or more faster than the GNU regex
engine, and presumably a C implentation would be several times faster
> I'd also like to be able to match several regexes against the same
> text in parallel, preferably without running N regex engines in
> parallel. This is different from the standard tokenization problem
> because the several matches may overlap rather than occurring one
> after the other. Would it be useful to think of this as
> (re1)|(re2)|(re3) with capturing brackets, all of which capture if
yes. A DFA isn't really sensitive to regex complexity the way a NFA
is -- it's one state transition per input character, no matter what.
The table sizes can get pretty big, though.
> (The aim is to speed up heuristic spam detection such as SpamAssassin.)
Return to the
Search the comp.compilers archives again.