Related articles |
---|
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05) |
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06) |
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07) |
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13) |
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16) |
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16) |
From: | hu47121@usenet.kitty.sub.org (Hannah Schroeter) |
Newsgroups: | comp.compilers,comp.theory |
Date: | Sun, 16 Aug 2009 00:37:24 +0000 (UTC) |
Organization: | The Pond |
References: | 09-07-003 09-07-004 09-07-008 09-07-011 |
Keywords: | lex |
Posted-Date: | 15 Aug 2009 22:48:47 EDT |
Hello!
Andrew Tomazos <andrew@tomazos.com> wrote:
>[...]
>Also one thing to note is that in formal language theory "regular
>expression" has a well-defined meaning. See Chomsky. Perl's regular
>expressions do *not* classify as regular expressions under the formal
>definition.
> -Andrew.
>[Perl's regex engine is swell, but if performance is an issue it's nowhere
>near as fast as a DFA generated by flex or re2c. -John]
IIRC perl's regex implementation (as well as glibc, btw) even lose
compared to *NFA emulation*, for example with a?^na^n, matched against
a^n (where "^n" is manually expanded to mean n times, i.e. with n=3
it'd be the regular expression a?a?a?aaa against string aaa). Java
loses, too (been there, tested it). OpenBSD's libc doesn't (seems to
be plain vanilla NFA emulation when it sees the expression is in fact
regular; yields the expected, roughly quadratic runtime behaviour).
See http://swtch.com/~rsc/regexp/regexp1.html for information about that
problem.
I also tested the same thing with flex, using a small script that
generates flex input and a test program for a given n. Horrendous
turnaround times (flex generation mostly) for increasing n, blazingly
fast match times, both as expected.
Kind regards,
Hannah.
[Spamassassin has an option to compile its patterns with re2c and link that into
perl. It takes the better part of an hour to compile, but it runs really fast. -John]
Return to the
comp.compilers page.
Search the
comp.compilers archives again.