compiling case insensitive regular expressions

"Armel" <armelasselin@hotmail.com>
Mon, 1 Nov 2010 22:17:43 +0100

From comp.compilers

Related articles
compiling case insensitive regular expressions armelasselin@hotmail.com (Armel) (2010-11-01)
Re: compiling case insensitive regular expressions gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-11-03)
Re: compiling case insensitive regular expressions benhanson2@icqmail.com (2010-11-03)
Re: compiling case insensitive regular expressions armelasselin@hotmail.com (Armel) (2010-11-04)
Re: compiling case insensitive regular expressions rsc@swtch.com (Russ Cox) (2010-11-04)
Re: compiling case insensitive regular expressions gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-11-05)
Re: compiling case insensitive regular expressions cr88192@hotmail.com (BGB) (2010-11-06)


From:	"Armel" <armelasselin@hotmail.com>
Newsgroups:	comp.compilers
Date:	Mon, 1 Nov 2010 22:17:43 +0100
Organization:	les newsgroups par Orange
Keywords:	lex, question
Posted-Date:	02 Nov 2010 17:33:12 EDT

Hello,

I need to compile regular expressions that are case insensitive.
There are two cases: the part that must be matched case insensitively
might be just a portion of the RE, or it might be the entire RE. The
REs will be Unicode enabled and must be compiled to a DFA.

I am wondering what the "best practice" is in this field. Is it
generally more efficient to precompute for each RE its "case
insensitive RE" (by replacing each symbol with an ORed version of it,
e.g. hello becomes (h|H)(e|E)(l|L)(l|L)(o|O)) and compile that in
place of the original, or is it just as good to simply replace the
input symbols with their lowercase versions and compile a "lowercase
only" version of the RE?
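
To make the two options concrete, here is a rough sketch in Python. It is
only an illustration under several assumptions: it handles literal strings
rather than full REs, it uses Python's `re` module (a backtracking matcher,
not a DFA compiler) purely to check the results, and it relies on simple
`str.lower()`/`str.upper()` mappings, which are not full Unicode case
folding. The function names `expand_case`, `match_expanded` and
`match_folded` are made up for this example.

```python
import re


def expand_case(literal: str) -> str:
    """Option 1: rewrite each cased letter c as (lower|upper)."""
    parts = []
    for ch in literal:
        lo, up = ch.lower(), ch.upper()
        # Only expand when both variants are single, distinct characters.
        # Characters such as U+00DF (upper() gives "SS") are left as-is here.
        if lo != up and len(lo) == 1 and len(up) == 1:
            parts.append("(" + re.escape(lo) + "|" + re.escape(up) + ")")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def match_expanded(literal: str, text: str) -> bool:
    """Match the unmodified input against the expanded pattern."""
    return re.fullmatch(expand_case(literal), text) is not None


def match_folded(literal: str, text: str) -> bool:
    """Option 2: lowercase-only pattern, input folded at match time."""
    return re.fullmatch(re.escape(literal.lower()), text.lower()) is not None


print(expand_case("hello"))
print(match_expanded("hello", "HeLLo"), match_folded("hello", "HeLLo"))
```

When only a portion of the RE is case insensitive, option 1 can be applied
to just that portion, whereas folding the whole input as in option 2 would
also affect the case sensitive parts.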

Any ideas?

Best regards
Armel
[I doubt it would make much difference. If you build the case folding into
the RE, the tables are bigger; if you do it at runtime, the code path is
slightly longer but the overall program is likely to be a little smaller.
-John]
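
The tradeoff in the moderator's note can be illustrated with a toy example.
The sketch below is an assumption-laden illustration, not a real RE
compiler: it hand-builds a transition table for a single ASCII literal
(`build_literal_dfa` and `run` are hypothetical helper names) and compares
the table size with folding built in against the table size with folding
done on each input symbol.

```python
def build_literal_dfa(word, expand):
    """Return a transition table {(state, symbol): next_state} for one literal.

    With expand=True, each letter gets both a lowercase and an uppercase
    transition (case folding built into the automaton).
    """
    table = {}
    for state, ch in enumerate(word.lower()):
        symbols = {ch, ch.upper()} if expand else {ch}
        for sym in symbols:
            table[(state, sym)] = state + 1
    return table


def run(table, accept, text, fold):
    """Run the automaton; with fold=True, lowercase each input symbol first."""
    state = 0
    for ch in text:
        if fold:
            ch = ch.lower()  # the extra step on the runtime code path
        state = table.get((state, ch))
        if state is None:
            return False
    return state == accept


expanded = build_literal_dfa("hello", expand=True)
folded = build_literal_dfa("hello", expand=False)

print(len(expanded), len(folded))
print(run(expanded, 5, "HeLLo", fold=False), run(folded, 5, "HeLLo", fold=True))
```

For this literal the expanded table has twice as many transitions, while
the folded version pays one extra `lower()` call per input symbol.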

