Related articles |
---|
How to do this odd kind of regex match? dot@dotat.at (Tony Finch) (2002-07-15) |
Re: How to do this odd kind of regex match? michaelparker@earthlink.net (Michael Parker) (2002-07-21) |
Re: How to do this odd kind of regex match? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21) |
Re: How to do this odd kind of regex match? Martin.Ward@durham.ac.uk (Martin Ward) (2002-07-21) |
Re: How to do this odd kind of regex match? simon.cozens@computing-services.oxford.ac.uk (Simon Cozens) (2002-07-24) |
From: | "Michael Parker" <michaelparker@earthlink.net> |
Newsgroups: | comp.compilers |
Date: | 21 Jul 2002 01:51:34 -0400 |
Organization: | EarthLink Inc. -- http://www.EarthLink.net |
References: | 02-07-041 |
Keywords: | lex |
Posted-Date: | 21 Jul 2002 01:51:34 EDT |
Tony Finch wrote:
> Can anyone point me in the direction of literature on matching regexes
> with capturing brackets (as in sed) efficiently, i.e. without using a
> backtracking NFA. Is it possible to do with a DFA?
The usual way to do this is to scan with a DFA until you find a match,
then run the NFA over that to snag the submatches. There are also
some very quick NFA's out there that substantially minimise
backtracking and other forms of book-keeping. I wrote one in Common
Lisp that is an order of magnitude or more faster than the GNU regex
engine, and presumably a C implentation would be several times faster
still.
> I'd also like to be able to match several regexes against the same
> text in parallel, preferably without running N regex engines in
> parallel. This is different from the standard tokenization problem
> because the several matches may overlap rather than occurring one
> after the other. Would it be useful to think of this as
> (re1)|(re2)|(re3) with capturing brackets, all of which capture if
> possible?
yes. A DFA isn't really sensitive to regex complexity the way a NFA
is -- it's one state transition per input character, no matter what.
The table sizes can get pretty big, though.
> (The aim is to speed up heuristic spam detection such as SpamAssassin.)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.