Related articles |
---|
A BNF grammar for a simplistic regex language spur4444@yahoo.com (2004-05-30) |
Re: A BNF grammar for a simplistic regex language wyrmwif@tsoft.com (SM Ryan) (2004-06-06) |
From: | SM Ryan <wyrmwif@tsoft.com> |
Newsgroups: | comp.compilers |
Date: | 6 Jun 2004 15:12:40 -0400 |
Organization: | Quick STOP Groceries |
References: | 04-05-099 |
Keywords: | parse |
Posted-Date: | 06 Jun 2004 15:12:40 EDT |
# I want to parse these regexes into a parse tree with a recursive
# descent parser. For this purpose, I want to define a BNF grammar for
# them first. For now, I've came up with the following:
#
# expr ::= concat '|' concat
# concat
#
# concat ::= rep '.' rep
# | rep
#
# rep ::= atom '*'
# | atom
#
# atom ::= '(' expr ')'
# | char
#
# char ::= a..z
The grammar will not accept two or more '|' or '.' at the same level.
expr ::= concat | concat '|' expr
concat ::= rep | rep '.' concat
rep ::= atom '*' | atom
atom ::= '(' expr ')' | char
char ::= 'a' | ... | 'z'
which looks like it might match your parse tree.
# Does this look OK to you ? Does this syntax correctly specifies the
# simple regexes I described above ?
# To me the grammar looks suitable for RD parsing (i.e. not left
# recursive), am I right ?
The grammar is not LL(1); an LL(1) grammar for this language would
produce a different parse tree. If you're constructing the parse trees
by hand, or don't mind slightly different parse trees, it's not that
hard to do the conversion.
expr ::= concat or-exprs
or-exprs ::= '|' concat or-exprs | empty
concat ::= rep and-concats
and-concats ::= '.' concat and-concats | empty
rep ::= atom rep-option
rep-option ::= '*' | empty
atom ::= '(' expr ')' | char
char ::= 'a' | ... | 'z'
--
SM Ryan http://www.rawbw.com/~wyrmwif/
I hope it feels so good to be right. There's nothing more
exhilirating pointing out the shortcomings of others, is there?
Return to the
comp.compilers page.
Search the
comp.compilers archives again.