A BNF grammar for a simplistic regex language

spur4444@yahoo.com (Spur)
30 May 2004 13:27:16 -0400

          From comp.compilers

Related articles
A BNF grammar for a simplistic regex language spur4444@yahoo.com (2004-05-30)
Re: A BNF grammar for a simplistic regex language wyrmwif@tsoft.com (SM Ryan) (2004-06-06)
| List of all articles for this month |
From: spur4444@yahoo.com (Spur)
Newsgroups: comp.compilers
Date: 30 May 2004 13:27:16 -0400
Organization: http://groups.google.com
Keywords: parse, question
Posted-Date: 30 May 2004 13:27:16 EDT

Hello All,


For educational purposes I'm writing a regular expression engine. My
regular expressions are restricted to a simplistic subset: the
operators are * (highest precedence), . (concatenation) and | (lowest
precedence).
Parentheses are allowed, and only single chars a..z act as atoms. For
instance, the following is a legal regex in this language:


(a|b)*.a.b.b (which means "0 or more of a or b, and then abb")


The parse tree (rotated 90 degrees counterclockwise) for this regex
should be:


      b
. b
      . a
            .
                  * a
                      |
                            b


I want to parse these regexes into a parse tree with a recursive
descent parser. For this purpose, I want to define a BNF grammar for
them first. For now, I've came up with the following:


expr ::= concat '|' concat
                      concat


concat ::= rep '.' rep
                  | rep


rep ::= atom '*'
                  | atom


atom ::= '(' expr ')'
                  | char


char ::= a..z


Does this look OK to you ? Does this syntax correctly specifies the
simple regexes I described above ?
To me the grammar looks suitable for RD parsing (i.e. not left
recursive), am I right ?


Thanks a lot in advance


Eli


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.