From: abigail@foad.org (Abigail)
Newsgroups: comp.lang.perl.misc,comp.compilers
Date: 18 Jul 2001 20:02:27 -0400
Organization: Abigail's Kinderboerderijen
References: 01-07-080
Keywords: syntax
Posted-Date: 18 Jul 2001 20:02:27 EDT
X-Date: MMDCCCLXXVIII September MCMXCIII
Alan Oursland (alan@oursland.net) wrote on MMDCCCLXXVIII September
MCMXCIII
## I've been looking for a complete perl 5 regular expression grammar
## and, having been unsuccessful in my search, have attempted to write
## one myself. I was wondering if anyone could help me find any errors in
## it (excluding grammar syntax errors). I've left out embedded modifiers
## from the grammar -- I'm not sure how they fit into the grammar. I've
## also skimmed over the non-meta character production. One area I am
## confused is the "\c[" control character (described at
## http://www.perldoc.com/perl5.6/pod/perlre.html). How does this work?
##
## Alan Oursland
##
## Here is the grammar:
## <re> ::= <union>
## <union> ::= <concat>"|"<union> | <concat>
## <concat> ::= <quant><concat> | <quant>
## <quant> ::= <group>"*" | <group>"+" | <group>"?" | <group>"{"<bound>"}" | <group>
## <group> ::= "("<re>")" | <term>
## <term> ::= "." | "$" | "^" | <char> | <set>
## <bound> ::= <num> | <num>"," | <num>","<num>
## <char> ::= <non-meta> | "\"<escaped>
## <non-meta> ::= any non-meta char
## <escaped> ::= <meta>|<control>|<special>|<assert>
## <meta> ::= "."|"^"|"$"|"?"|"*"|"+"|"|"|"["|"("|")"|"\"|"{"
## <control> ::= "t"|"n"|"r"|"f"|"a"|"e"|"l"|"u"|"L"|"U"|"E"|"Q"
## <special> ::= <backoctal>|<hexchar>|<controlchar>|<class>
## <assert> ::= "b"|"B"|"A"|"z"|"Z"|"G"
## <backoctal> ::= <digit> | <digit><digit> | "0"<oct><oct> | "+" | "&" | "`" | "'"
## <hexchar> ::= "x"<hex><hex> | "x{"<hex><hex><hex><hex>"}"
## <controlchar> ::= "c["
## <namedchar> ::= "N{"<name>"}"
## <class> ::= "w"|"W"|"s"|"S"|"d"|"D"|"X"|"C" |"p"<name>|"P"<name>|"[:"<posixclass>":]"|"[:^"<posixclass>":]"
## <posixclass> ::= "alpha"|"alnum"|"ascii"|"cntrl"|"digit"|"graph"|"lower"|"print"|"punct"|"space"|"upper"|"word"|"xdigit"
## <name> ::= <unicodeclass>
## <unicodeclass> ::= "IsAlpha"|"IsAlnum"|"IsASCII"|"IsCntrl"|"IsDigit"|"IsGraph"|"IsLower"|"IsPrint"|"IsPunct"|"IsSpace"|"IsUpper"|"IsWord"|"IsXDigit"
## <set> ::= "[" <set-items> "]" | "[^" <set-items> "]"
## <set-items> ::= <set-item> | <set-item> <set-items>
## <set-item> ::= <range> | <char>
## <range> ::= <char> "-" <char>
## <num> ::= <digit><num> | <digit>
## <oct> ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
## <digit> ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
## <hex> ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
## <mod> ::= "\i"|"\m"|"\s"|"\x"
Some regexes that I cannot parse using the above grammar:
/3*?/
/g{,1}/
/\cM/
/aa/
/[*]/
/(?!!)/
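To make these failures concrete, here is a minimal sketch of a recognizer for the quoted grammar, written in Python rather than Perl. It is not Perl's regex parser. All function names are made up for illustration, and <escaped> is cut down to single-letter escapes, single digits and "c[" (no hex, \p, or POSIX classes). Each function returns the set of positions where its nonterminal could end, so every alternative is explored.

```python
META = set('.^$?*+|[()\\{')
# Single-character escapes from <meta>, <control>, <assert> and <class>.
ESCAPES = META | set('tnrfaeluLUEQ') | set('bBAzZG') | set('wWsSdDXC')
DIGITS = '0123456789'

def seq(s, starts, parser):
    """Run parser at every start position and merge the end positions."""
    ends = set()
    for i in starts:
        ends |= parser(s, i)
    return ends

def lit(text):
    return lambda s, i: {i + len(text)} if s.startswith(text, i) else set()

def many1(parser):
    """One or more repetitions; returns every reachable end position."""
    def run(s, i):
        ends, frontier = set(), {i}
        while frontier:
            frontier = seq(s, frontier, parser) - ends
            ends |= frontier
        return ends
    return run

def char(s, i):
    if i < len(s) and s[i] not in META:          # <non-meta>
        return {i + 1}
    if s.startswith('\\', i):
        if s.startswith('c[', i + 1):            # <controlchar> ::= "c["
            return {i + 3}
        if i + 1 < len(s) and (s[i + 1] in ESCAPES or s[i + 1] in DIGITS):
            return {i + 2}
    return set()

def set_item(s, i):
    rng = seq(s, seq(s, char(s, i), lit('-')), char)   # <range>
    return rng | char(s, i)

def set_(s, i):
    starts = lit('[^')(s, i) | lit('[')(s, i)
    return seq(s, seq(s, starts, many1(set_item)), lit(']'))

def num(s, i):
    j = i
    while j < len(s) and s[j] in DIGITS:
        j += 1
    return set(range(i + 1, j + 1))              # every non-empty digit prefix

def bound(s, i):
    n = num(s, i)
    nc = seq(s, n, lit(','))
    return n | nc | seq(s, nc, num)

def term(s, i):
    ends = set_(s, i) | char(s, i)
    if i < len(s) and s[i] in '.$^':
        ends.add(i + 1)
    return ends

def group(s, i):
    inner = seq(s, lit('(')(s, i), union)
    return seq(s, inner, lit(')')) | term(s, i)

def quant(s, i):
    g = group(s, i)
    ends = set(g)
    for op in '*+?':
        ends |= seq(s, g, lit(op))
    braced = seq(s, seq(s, g, lit('{')), bound)
    return ends | seq(s, braced, lit('}'))

def union(s, i):
    c = many1(quant)(s, i)                       # <concat>
    return c | seq(s, seq(s, c, lit('|')), union)

def accepts(regex):
    """True if the whole string is derivable from <re>."""
    return len(regex) in union(regex, 0)

if __name__ == "__main__":
    for r in ['3*?', 'g{,1}', r'\cM', '[*]', '(?!!)', 'a|b*', '[a-z]+']:
        print(r, accepts(r))
```

One caveat: this sketch accepts "aa" through <concat>, so whether /aa/ fails depends on how <non-meta> and <concat> are read. The other five regexes listed above are rejected.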
Some regexes can be parsed ambiguously with the above grammar:
/\+/
(why are "+", "&", "`" and "'" mentioned in <backoctal>?)
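The ambiguity can be checked mechanically. The following Python sketch copies the terminal sets from the quoted grammar (only the single-character alternatives; the dictionary and function names are made up for illustration) and lists which nonterminals under <escaped> can derive the character after the backslash.

```python
# Terminal sets copied from the quoted grammar. For <backoctal> only the
# punctuation and single-digit alternatives are modelled.
ESCAPED = {
    "meta": set(".^$?*+|[()\\{"),
    "control": set("tnrfaeluLUEQ"),
    "assert": set("bBAzZG"),
    "class": set("wWsSdDXC"),
    "backoctal": set("+&`'") | set("0123456789"),
}

def derivations(ch):
    """Sorted names of the nonterminals that can derive ch after a backslash."""
    return sorted(name for name, chars in ESCAPED.items() if ch in chars)

def is_ambiguous(ch):
    return len(derivations(ch)) > 1

if __name__ == "__main__":
    for ch in "+n.&":
        print(repr(ch), derivations(ch))
```

With these sets, "+" is the only character reachable through two nonterminals (<meta> and <backoctal>), which is why /\+/ has two parse trees.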
Here are some modifications of the grammar that fix some of the
issues:
Fixes /3*?/:
<quant> ::= <group><quantifier><greedy> | <group>
<quantifier> ::= "*" | "?" | "+" | "{" <bound> "}"
<greedy> ::= "?" | ""
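As a rough check of the revised productions, the sketch below writes <quantifier><greedy> as a Python regular expression and tests a few suffixes against it. The names QUANT_SUFFIX and is_quant_suffix are invented for this example, and <bound> is taken as defined in the quoted grammar.

```python
import re

# <quantifier> ::= "*" | "?" | "+" | "{" <bound> "}"
# <bound>      ::= <num> | <num>"," | <num>","<num>
# <greedy>     ::= "?" | ""
QUANT_SUFFIX = re.compile(r'(?:[*?+]|\{[0-9]+(?:,[0-9]*)?\})\??')

def is_quant_suffix(text):
    """True if text is exactly one <quantifier> followed by <greedy>."""
    return QUANT_SUFFIX.fullmatch(text) is not None

if __name__ == "__main__":
    for suffix in ['*?', '??', '{2,}?', '{2,3}', '{,1}', '*??']:
        print(suffix, is_quant_suffix(suffix))
```

The suffix "*?" from /3*?/ is now accepted, while "{,1}" is still rejected because <bound> requires a leading <num>.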
Fixes /aa/:
<term> ::= "." | "$" | "^" | <chars> | <set>
<char> ::= <non-meta> | "\"<escaped>
Fixes /\cM/:
<controlchar> ::= "c" <any-char>
<any-char> ::= Any possible character.
(but that doesn't fix /\c/ and would allow /\c\/ which doesn't parse in Perl).
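As for how "\c" works: as far as I understand the Perl documentation, "\cX" stands for the control character obtained by uppercasing X and flipping bit 6 of its ASCII code, so "\c[" is just one instance (ESC) rather than the whole syntax. A minimal Python sketch of that mapping, assuming ASCII input and using a made-up function name:

```python
def control_char(ch):
    """Map the X of \\cX to its control character (ASCII only, by assumption)."""
    if len(ch) != 1 or ord(ch) > 127:
        raise ValueError("expected a single ASCII character")
    # Uppercase first so that \cm and \cM agree, then flip bit 6 (XOR 64).
    return chr(ord(ch.upper()) ^ 64)

if __name__ == "__main__":
    for ch in "M[@?":
        print(ch, hex(ord(control_char(ch))))
```

Under this reading "\cM" is carriage return (0x0d) and "\c[" is escape (0x1b).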
Abigail
--
srand 123456;$-=rand$_--=>@[[$-,$_]=@[[$_,$-]for(reverse+1..(@[=split
//=>"IGrACVGQ\x02GJCWVhP\x02PL\x02jNMP"));print+(map{$_^q^"^}@[),"\n"
__END__
A bee crawling in // the branches of a hazel. A // pair of bears. Bankei.