Re: Regexps from shell wildcards

henry@zoo.toronto.edu (Henry Spencer)
Mon, 5 Apr 1993 20:57:57 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: henry@zoo.toronto.edu (Henry Spencer)
Keywords: lex
Organization: U of Toronto Zoology
References: 93-04-012
Date: Mon, 5 Apr 1993 20:57:57 GMT

colas@opossum.inria.fr (Colas Nahaboo) writes:
>Is there an algorithm to convert shell-expressions into regular
>expressions? (i.e. generate the string ".*[.]c" from the input "*.c")


The mapping is fairly trivial, but it depends on the exact shell syntax you
are interested in. In general, all the constructs are present in both
forms, and you can just map construct-by-construct, but you have to watch
the details. For example, mapping shell "*" to regexp ".*" is wrong, because
shell "*" does not match "/". If you write down the exact rules for the
shell syntax you're using, transforming it to regular expressions is
typically easy.
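As a concrete instance of writing the rules down first, here is a minimal Python sketch. It assumes a small subset of POSIX sh pattern matching: "*" and "?" never match "/", "[...]" is a bracket expression negated by a leading "!", and everything else is literal. Leading-dot rules, brace expansion and backslash quoting are left out. The names glob_to_regex and _bracket are made up for illustration. For comparison, Python's own fnmatch.translate does not treat "/" specially, so it lets "*" cross directory boundaries, which is exactly the pitfall described above.

```python
import re


def _bracket(body: str, negate: bool) -> str:
    """Turn the inside of a shell [...] into a Python character class."""
    body = body.replace("\\", "\\\\")  # backslash is literal in the assumed glob syntax
    if body.startswith(("]", "^")):
        body = "\\" + body  # keep a leading ']' or '^' literal in Python syntax
    # Pathname matching never lets a bracket match '/', so exclude it when negated.
    return "[^/" + body + "]" if negate else "[" + body + "]"


def glob_to_regex(glob: str) -> str:
    """Translate a shell wildcard into a Python regex for use with re.fullmatch.

    Assumed rules (a small subset of POSIX sh pattern matching):
      *      any run of characters other than '/'
      ?      any single character other than '/'
      [...]  bracket expression; a leading '!' negates it
    Everything else matches itself. Leading-dot rules, brace expansion
    and backslash quoting are deliberately not handled.
    """
    out = []
    i, n = 0, len(glob)
    while i < n:
        c = glob[i]
        if c == "*":
            out.append("[^/]*")
        elif c == "?":
            out.append("[^/]")
        elif c == "[":
            j = i + 1
            negate = j < n and glob[j] == "!"
            if negate:
                j += 1
            start = j
            if j < n and glob[j] == "]":  # a ']' first is a member, not the end
                j += 1
            while j < n and glob[j] != "]":
                j += 1
            if j < n:  # found the closing ']'
                out.append(_bracket(glob[start:j], negate))
                i = j + 1
                continue
            out.append(re.escape(c))  # no closing ']': '[' is an ordinary char
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)
```

With these rules glob_to_regex("*.c") yields [^/]*\.c, which matches "foo.c" but not "dir/foo.c". The result is meant for re.fullmatch, so no anchors are added.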


>In the same vein, is there an algorithm to generate case-independent
>regular expressions from normal ones? (i.e. generate the string
>"[aA][bB][cC][eEfFgG]*" from the input "abc[efg]*")


Again, the real question is defining what you mean by "case-independent
regular expression". It's not trivial; does [^x]y match Xy? As I recall,
those of us on the POSIX.2 regular-expressions working group noticed this
question too late, and the standard as shipped will be rather vague on the
subject. Our informal conclusion, which we hope will make it into an
eventual tidying-up of the standard, was that the right way for
case-independent regular expressions to act is based on a model in which
case distinctions vanish from the alphabet. You can't take that too
literally or complexities arise, but it's a good guide. So no,
case-independent [^x]y does not match Xy, because the [^x] covers all
kinds of X's, be they uppercase or lowercase.
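For one concrete data point (a property of one particular engine, not of the POSIX standard): Python's re module with re.IGNORECASE treats [^x] as excluding both cases, so for this example it gives the case-distinctions-vanish answer. A minimal check:

```python
import re

# [^x]y, compiled case-insensitively with Python's re module.
pat = re.compile(r"[^x]y", re.IGNORECASE)

for subject in ["Xy", "xy", "aY"]:
    # fullmatch returns None when the whole subject does not match.
    print(subject, pat.fullmatch(subject) is not None)
# Xy False
# xy False
# aY True
```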


Again, once you have defined what you're talking about, implementation
is easy. For the case-distinctions-vanish model, any literal letter x
becomes [xX], and the contents of a bracket expression [xyz] are augmented
with any case counterparts of the things in it, giving [xyzXYZ]. The hard
thing to do in a portable way, actually, is to find out which characters
have case counterparts and what they are.
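A minimal Python sketch of that transformation, assuming the input uses only literals, ".", "*", "+", "?", "|", plain parentheses, backslash escapes (copied untouched) and bracket expressions containing no backslashes or POSIX classes such as [:alpha:]. The helper names counterpart, augment and caseless are hypothetical, and str.swapcase stands in for the hard part, deciding which characters have case counterparts; it is Unicode-aware but not locale-aware. A range such as a-f is mirrored to A-F only when both ends are letters of the same case.

```python
import re
from typing import Optional


def counterpart(ch: str) -> Optional[str]:
    """Return the one-character case counterpart of ch, or None.

    str.swapcase stands in for the hard, locale-dependent question of
    which characters have case counterparts; anything that does not swap
    to exactly one different character (digits, '*', 'ß' -> 'SS') is left alone.
    """
    swapped = ch.swapcase()
    return swapped if len(swapped) == 1 and swapped != ch else None


def augment(body: str) -> str:
    """Add the case counterparts of the members of a bracket-expression body."""
    extra = []
    k = 0
    while k < len(body):
        if k + 2 < len(body) and body[k + 1] == "-":
            lo, hi = body[k], body[k + 2]
            clo, chi = counterpart(lo), counterpart(hi)
            # Mirror a range only when both ends are letters of the same case.
            if clo and chi and lo.isupper() == hi.isupper():
                extra.append(clo + "-" + chi)
            k += 3
        else:
            other = counterpart(body[k])
            if other and other not in body:
                extra.append(other)
            k += 1
    if body.endswith("-"):  # keep a trailing literal '-' last so it stays literal
        return body[:-1] + "".join(extra) + "-"
    return body + "".join(extra)


def caseless(pattern: str) -> str:
    """Rewrite pattern under the case-distinctions-vanish model."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            out.append(pattern[i:i + 2])  # escapes such as \. or \d are copied as-is
            i += 2
        elif c == "[":
            j = i + 1
            negate = j < n and pattern[j] == "^"
            if negate:
                j += 1
            start = j
            if j < n and pattern[j] == "]":  # a ']' first is a member, not the end
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            if j >= n:
                raise ValueError("unterminated bracket expression")
            body = augment(pattern[start:j])
            out.append("[" + ("^" if negate else "") + body + "]")
            i = j + 1
        else:
            other = counterpart(c)  # a literal letter x becomes [xX]
            out.append("[" + c + other + "]" if other else c)
            i += 1
    return "".join(out)
```

caseless("abc[efg]*") gives [aA][bB][cC][efgEFG]*, which matches the same strings as the [aA][bB][cC][eEfFgG]* in the question, and caseless("[^x]y") gives [^xX][yY], which rejects "Xy".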
--
Henry Spencer @ U of Toronto Zoology, henry@zoo.toronto.edu utzoo!henry
--

