How to do this odd kind of regex match?

"Tony Finch" <dot@dotat.at>
15 Jul 2002 23:43:23 -0400

          From comp.compilers


From: "Tony Finch" <dot@dotat.at>
Newsgroups: comp.compilers
Date: 15 Jul 2002 23:43:23 -0400
Organization: dotat labs
Keywords: lex, question
Posted-Date: 15 Jul 2002 23:43:23 EDT

Can anyone point me in the direction of literature on matching regexes
with capturing brackets (as in sed) efficiently, i.e. without using a
backtracking NFA? Is it possible to do this with a DFA?
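
One known way to get captures without backtracking is to simulate the NFA in lockstep and let every live state carry its own copy of the capture positions (often called a "Pike VM"). The sketch below is only an illustration of that idea, not something taken from this post: the instruction set, the names `PROG`, `add_thread` and `pike_match`, and the hand-compiled program for `a(b*)c` are all assumptions made for the example. It performs an anchored match from the start of the text.

```python
# Hypothetical hand-compiled program for the regex a(b*)c.
# Slots 0/1 bracket the whole match, slots 2/3 bracket group 1.
PROG = [
    ('save', 0),
    ('char', 'a'),
    ('save', 2),
    ('split', 4, 6),   # prefer taking another 'b' (greedy), else leave the loop
    ('char', 'b'),
    ('jmp', 3),
    ('save', 3),
    ('char', 'c'),
    ('save', 1),
    ('match',),
]

def add_thread(prog, threads, seen, pc, caps, pos):
    """Follow jmp/split/save from pc and queue the char/match states reached."""
    if pc in seen:          # each instruction is queued at most once per step
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == 'jmp':
        add_thread(prog, threads, seen, op[1], caps, pos)
    elif op[0] == 'split':
        add_thread(prog, threads, seen, op[1], caps, pos)
        add_thread(prog, threads, seen, op[2], caps, pos)
    elif op[0] == 'save':
        slot = op[1]
        new_caps = caps[:slot] + (pos,) + caps[slot + 1:]
        add_thread(prog, threads, seen, pc + 1, new_caps, pos)
    else:
        threads.append((pc, caps))

def pike_match(prog, text, nslots):
    """Return the tuple of capture positions for an anchored match, or None."""
    current = []
    add_thread(prog, current, set(), 0, (None,) * nslots, 0)
    matched = None
    for pos in range(len(text) + 1):
        nxt, seen = [], set()
        for pc, caps in current:
            op = prog[pc]
            if op[0] == 'match':
                matched = caps
                break       # drop lower-priority threads
            if op[0] == 'char' and pos < len(text) and text[pos] == op[1]:
                add_thread(prog, nxt, seen, pc + 1, caps, pos + 1)
        current = nxt
        if not current:
            break
    return matched

caps = pike_match(PROG, "abbc", 4)
group1 = "abbc"[caps[2]:caps[3]]
```

Because the `seen` set caps the number of live threads at the program length, each input character is examined a bounded number of times and the text is never rescanned. The cost is that capture positions are copied per thread, which is what a pure DFA cannot do without extra bookkeeping.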


I'd also like to be able to match several regexes against the same
text in parallel, preferably without running N regex engines in
parallel. This is different from the standard tokenization problem
because the several matches may overlap rather than occurring one
after the other. Would it be useful to think of this as
(re1)|(re2)|(re3) with capturing brackets, all of which capture if
possible?
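
To make the overlap issue concrete, here is a small illustration using Python's `re` module. Python's engine is a backtracking one, so this only shows the semantics being asked about, not an efficient implementation; the two patterns and the sample text are made up for the example. With ordinary alternation only one branch captures per match, and a later overlapping match is skipped, whereas running each pattern separately (the N-engines baseline the question hopes to avoid) reports both.

```python
import re

text = "free money now"

# Ordinary alternation: only the branch that matched gets a capture.
combined = re.compile(r"(free money)|(money now)")
first = combined.search(text)
alternation_groups = first.groups()

# Scanning on from the end of the first match misses the overlapping one.
all_alternation_matches = [m.group(0) for m in combined.finditer(text)]

# Baseline: one independent search per pattern, so overlaps are reported.
patterns = {"re1": r"free money", "re2": r"money now"}

def match_each(patterns, text):
    """Return the span of the first match of every pattern, or None."""
    spans = {}
    for name, pat in patterns.items():
        m = re.search(pat, text)
        spans[name] = m.span() if m else None
    return spans

independent_spans = match_each(patterns, text)
```

Here `alternation_groups` has `None` for the second group and `all_alternation_matches` contains a single match, while `independent_spans` holds a span for both patterns. The "all of which capture if possible" behaviour asked about above corresponds to the second result, obtained in one pass.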


(The aim is to speed up heuristic spam detection such as SpamAssassin.)


Tony.
--
f.a.n.finch <dot@dotat.at> http://dotat.at/

