Related articles:
Regexps from shell wilcards colas@opossum.inria.fr (1993-04-02)
Re: Regexps from shell wildcards imp@Boulder.ParcPlace.COM (Warner Losh) (1993-04-05)
Re: Regexps from shell wildcards kanze@us-es.sel.de (1993-04-05)
Re: Regexps from shell wildcards macrakis@osf.org (1993-04-05)
Re: Regexps from shell wildcards gnb@leo.bby.com.au (1993-04-06)
Newsgroups: comp.compilers
From: gnb@leo.bby.com.au (Gregory N. Bond)
Keywords: lex
Organization: Burdett, Buckeridge & Young, Melbourne, Australia
References: 93-04-012 93-04-018
Date: Tue, 6 Apr 1993 23:23:09 GMT
Warner Losh <imp@Boulder.ParcPlace.COM> writes:
> if you wanted to do /bin/csh shell expressions, then you'll find that
> things like "*.{c,C,H,h,cf}" cause problems and cause the output string
> length to grow wildly.
Worse than that, the csh {foo,bar} construct is not a file glob, and in
general its semantics cannot be duplicated with REs:
- Order is preserved, so *.{h,c} is NOT the same as *.[hc]: the first
  lists all the .h files and then all the .c files, while the second
  produces a single sorted list.
- It is expanded regardless of matches, so "echo {foo,bar}.c" will work
  whether or not foo.c or bar.c exist.
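Both points can be seen in a small Python sketch of csh-style expansion, assuming non-nested braces and an in-memory list of file names in place of a real directory. The helpers expand_braces and csh_like_glob are hypothetical names, and fnmatchcase stands in for the shell's matcher (unlike sh, it does not hide dot files).

```python
from fnmatch import fnmatchcase


def expand_braces(word):
    """Expand {a,b,...} groups left to right, keeping the written order.

    Hypothetical helper: nested braces are not supported.
    """
    start = word.find("{")
    end = word.find("}", start + 1)
    if start == -1 or end == -1:
        return [word]
    prefix, body, suffix = word[:start], word[start + 1:end], word[end + 1:]
    expanded = []
    for alt in body.split(","):
        expanded.extend(expand_braces(prefix + alt + suffix))
    return expanded


def csh_like_glob(pattern, names):
    """Brace-expand first, then glob each alternative on its own."""
    result = []
    for alt in expand_braces(pattern):
        if not any(ch in alt for ch in "*?["):
            # No wildcard left: the word is kept whether or not it exists
            result.append(alt)
        else:
            # Each alternative is sorted separately, so the groups keep their order
            result.extend(sorted(n for n in names if fnmatchcase(n, alt)))
    return result


names = ["a.c", "b.h", "c.c"]
print(csh_like_glob("*.{h,c}", names))                        # ['b.h', 'a.c', 'c.c']
print(sorted(n for n in names if fnmatchcase(n, "*.[hc]")))  # ['a.c', 'b.h', 'c.c']
print(csh_like_glob("{foo,bar}.c", names))                    # ['foo.c', 'bar.c']
```

Expanding the braces before any matching is what makes the result depend on the order of the alternatives, something a single regex test on one file name cannot express.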
Of course, in any one application these may not be a problem, and
more-or-less mechanical conversion to (foo|bar) might be acceptable.
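A minimal Python sketch of that mechanical conversion, assuming non-nested braces and supporting only *, ? and {a,b,...} (everything else is matched literally). The names _translate and braces_to_regex are hypothetical, and it emits a non-capturing (?:foo|bar) group, which matches the same strings as (foo|bar).

```python
import re


def _translate(pattern):
    """Unanchored regex for a glob using *, ? and non-nested {a,b,...}.

    Every other character is matched literally via re.escape.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        close = pattern.find("}", i + 1) if c == "{" else -1
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif close != -1:
            # {foo,bar} -> (?:foo|bar); each alternative is translated too
            alts = pattern[i + 1:close].split(",")
            out.append("(?:" + "|".join(_translate(a) for a in alts) + ")")
            i = close
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def braces_to_regex(pattern):
    """Anchor the translation so the whole name has to match."""
    return "^" + _translate(pattern) + "$"


rx = braces_to_regex("*.{c,h,cf}")
print(rx)                          # ^.*\.(?:c|h|cf)$
print(bool(re.match(rx, "x.cf")))  # True
print(bool(re.match(rx, "x.o")))   # False
```

The resulting regex only answers yes or no for one name at a time, so both the ordering and the "expand even if nothing matches" behaviour of csh are lost; that is the "more-or-less" in the conversion.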
Just as a hint, here is some Perl code I use to convert sh-type globs
to REs in a Perl package. The input glob pattern is known to contain
no '/' characters (handling '/' involves some "interesting" recursion).
I make no promises about this, but it hasn't failed me yet.
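# Convert shell-style glob pattern (containing no '/') to a Perl regex
sub glob_to_re {
    my ($pat) = @_;
    # Escape chars that are literal in a glob but special in a regex;
    # '-' is left alone so that ranges like [a-z] keep working
    $pat =~ s/[.=<>+_\\\$(){}|]/\\$&/g;
    $pat =~ s/\?/./g;                  # ? -> any one character
    $pat =~ s/\*/.*/g;                 # * -> any run of characters
    # Hide leading . from wildcards
    $pat =~ s/^\.\*/[^.].*/;           # .* -> [^.].*
    $pat =~ s/^\.([^\*])/[^.]$1/;      # .x -> [^.]x
    $pat =~ s/^\*/[^.]*/;              # no-op here: every * is already .*
    # could do some optimising here, but leave it to perl!
    # e.g. "^.*" => ""
    #      ".*$" => ""
    # Anchor the pattern
    return "^$pat\$";
}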
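For readers outside Perl, a minimal Python sketch of the same substitution steps, assuming the pattern contains no '/'. The names glob_to_re and _SPECIAL are hypothetical; this version leaves '-' unescaped so that bracket ranges such as [a-z] survive, and it uses plain string operations where the Perl uses s///.

```python
import re

# Characters backslash-escaped before translation; '-' is left out so that
# bracket ranges such as [a-z] still work.
_SPECIAL = set(".=<>+_\\$(){}|")


def glob_to_re(pat):
    """Hypothetical Python port of the Perl substitutions (pattern has no '/')."""
    # Quote characters that are ordinary in a glob but special in a regex
    pat = "".join("\\" + c if c in _SPECIAL else c for c in pat)
    pat = pat.replace("?", ".")   # ? -> any one character
    pat = pat.replace("*", ".*")  # * -> any run of characters
    # Hide leading '.' from wildcards, as sh does for dot files
    if pat.startswith(".*"):
        pat = "[^.]" + pat        # .* -> [^.].*
    elif pat.startswith(".") and len(pat) > 1:
        pat = "[^.]" + pat[1:]    # .x -> [^.]x
    return "^" + pat + "$"


for glob, name in [("*.c", "main.c"), ("*.c", ".hidden.c"),
                   (".*", ".profile"), ("[a-z]?.h", "ab.h")]:
    print(glob, name, bool(re.match(glob_to_re(glob), name)))
```

As with the Perl, a leading * becomes [^.].* and so must match at least one character: *c does not match a file named c. Using a (?!\.) lookahead in place of [^.] for that leading .* case would lift the restriction while still hiding dot files.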
--
Gregory Bond <gnb@bby.com.au>
Burdett Buckeridge & Young Ltd Melbourne Australia