|Regular expression search algorithm email@example.com (Harry) (2002-07-15)|
|Re: Regular expression search algorithm firstname.lastname@example.org (Joachim Durchholz) (2002-07-21)|
|Re: Regular expression search algorithm email@example.com (Ralph Corderoy) (2002-07-21)|
|Re: Regular expression search algorithm firstname.lastname@example.org (Peter S Tillier) (2002-07-24)|
|Re: Regular expression search algorithm email@example.com (Aharon Robbins) (2002-09-03)|
|regular expression search algorithm firstname.lastname@example.org (1993-03-18)|
|Re: regular expression search algorithm email@example.com (1993-03-19)|
|From:||"Aharon Robbins" <firstname.lastname@example.org>|
|Date:||3 Sep 2002 00:32:34 -0400|
|Organization:||Pioneer Consulting, Ltd.|
|References:||02-07-036 02-07-069 02-07-102|
|Posted-Date:||03 Sep 2002 00:32:34 EDT|
Peter S Tillier <email@example.com> wrote:
> awk, egrep and others usually use DFAs, which tend to be slower to compile
> and faster to execute than NFAs, but don't support back references.
> gawk uses both - DFA most of the time, but NFA when back references
> are needed.
Gawk doesn't do back references. (Regexes of the form (abc)foo\1 )
They're not in the awk language.
Gawk uses dfa matching only for the cases of:
x ~ y
x !~ y
Other matches, such as split(), match(), and regex-based field and record
splitting, require knowing the start and end position of the match,
which the dfa matcher doesn't provide.
The dfa code is that from GNU grep/egrep, and the NFA code is GNU regex.
(Which I think does NFA stuff deep down under the hood, but as the code
is A Big Ball Of Mud, I'm not sure. I treat it strictly as a black box,
and I loathe the interface.)
Fortunately, there's improvement on the horizon. GNU Libc now has a
written-from-scratch version of the GNU regex routines that attempt to
use a dfa-based algorithm where possible, and which falls back to an
NFA algorithm when not. This code can be gotten from the glibc public
CVS archive at sources.redhat.com.
In the in-testing version of gawk I've removed the dfa code entirely and
now use the new version of regex. (The code in gawk is now cleaner as
a result, which is a pleasant side-effect.) Said version is currently
undergoing the trial-by-fire of portability to different Unix systems,
non-Unix systems, and various compilers. So it's getting closer to
being ready for prime time, but isn't quite there yet.
For what it's worth. :-)
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd. firstname.lastname@example.org
P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 928 569 9018
Nof Ayalon Cell Phone: +972 51 297-545
D.N. Shimshon 99785 ISRAEL
Return to the
Search the comp.compilers archives again.