Related articles |
---|
LL(1) grammar for regex not@www-no.ucsd.edu (James Brown \[MVP\]) (2006-03-22) |
Re: LL(1) grammar for regex schmitz@i3s.unice.fr (Sylvain Schmitz) (2006-03-27) |
Re: LL(1) grammar for regex cfc@shell01.TheWorld.com (Chris F Clark) (2006-03-27) |
From: | "James Brown \[MVP\]" <not@www-no.ucsd.edu> |
Newsgroups: | comp.compilers |
Date: | 22 Mar 2006 23:41:45 -0500 |
Organization: | Compilers Central |
Keywords: | lex, question |
Posted-Date: | 22 Mar 2006 23:41:45 EST |
All,
I'm currently trying to build a regex engine by hand (using C++). The aim is
to build a top-down parser for regex expressions. Therefore I am attempting
to build a LL(1) grammar for regex's. I am at the early stage of building a
syntax-tree from my regex grammar, but I am having a hard time with regex
'concatenation':
I am basically copying the code (and grammar) found in the following regex
tutorial:
http://www.gamedev.net/reference/programming/features/af5/
Here's the BNF grammar:
//
// expr ::= concat '|' expr
// | concat
//
// concat ::= rep . concat
// | rep
//
// rep ::= atom '*'
// | atom
//
// atom ::= chr
// | '(' expr ')'
//
// char ::= alphanumeric character
The grammar above represents regex concatenation by a '.' (dot) operator.
This means that regular expressions of the form "abcde" must be written as
"a.b.c.d.e". I can appreciate that it is much simpler to parse in this form,
however, I'd like to remove the reliance on the concatenation operator.
The current pseudo-code for parsing the concatenation is:
node *concat()
{
node *left = rep();
if(ch == '.')
{
ch = nextch();
node *right = concat();
return new node(CONCAT, 0, left, right);
}
return left;
}
I briefly tried to describe concatenation as:
concat ::= rep concat
| rep
node *concat()
{
node *left = rep();
if(ch) // note I don't test for '.'
{
node *right = concat();
return new node(CONCAT, 0, left, right);
}
return left;
}
Of course, this destroys the parsing because I've introduced a recursion
which consumes the rest of the input-stream. I think my version of the
'concat' rule is no longer LL(1)..??
Can anyone advise me on where I should look for help with this? The article
I am using is perfect for me because the grammar is understandable (to me)
and the source-code is nice and simple also....but I just can't figure out
how to proceed from here.
thanks,
James
Return to the
comp.compilers page.
Search the
comp.compilers archives again.