Related articles |
---|
Seamlessly parsing left-recursive grammars with LL parsers ? louis@mulliemedia.com (Louis Mullie) (2005-03-20) |
Re: Seamlessly parsing left-recursive grammars with LL parsers ? angagon@earthlink.net (Matt O'Connor) (2005-03-27) |
Re: Seamlessly parsing left-recursive grammars with LL parsers ? branco.medeiros@gmail.com (2005-03-31) |
Re: Seamlessly parsing left-recursive grammars with LL parsers ? louis@mulliemedia.com (Louis Mullie) (2005-04-02) |
From: | Louis Mullie <louis@mulliemedia.com> |
Newsgroups: | comp.compilers |
Date: | 20 Mar 2005 11:14:35 -0500 |
Organization: | Compilers Central |
Keywords: | parse, question, LL(1) |
Posted-Date: | 20 Mar 2005 11:14:35 EST |
Hello,
First of all I must say I am only a high school student, and I've been
looking into creating parsers for a month or so only; I have never had
any lessons on parsers, so my question may be useless or stupid. I
have successfully built a recursive-descent parser that builds a parse
tree based on a BNF grammar (in PHP); this will be a proposal for a
package at the PEAR repository (www.pear.php.net). However, I am
concerned with beginners not understanding what left-recursion is and
wonder why the parser fails to parse their stuff. I have come up with
a way (it's probably wrong) to use my LL parser and still parse left-
recursive grammars seamlessly; I'd like you to have a look at it, and
if it's wrong, please tell me why it wouldn't work. Here it is, in
pseudo-code :
To parse : 6 + 3 + 2
(token) num := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
add := add '+' term
term := num
Obviously, 'add' would infinitely recurse in an LL parser.
Now, consider the following transformation to the grammar :
----------------------------------------------------------------------------
input = reverseOrder(input);
foreach (grammar.nonterminals as symbol) {
if (symbol.isLeftRecursive()) symbol.rule =
reverseOrder(symbol.rule) + '|' + symbol;
}
-----------------------------------------------------------------------------
The grammar becomes :
2 + 3 + 6
(token) num := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
add := term '+' add | term
term := num
This transforms our left-recursion into right-recursion; the parse
tree will be different, so after the parsing we need to modify our
tree too. Here is the parsed tree :
Tree =
|
+- add
|
+-- term '+' add
|
+-- term '+' add
|
+-- term
Here's the tree transformation :
----------------------------------------------------
foreach (tree as node) {
/* If it's a non-terminal, if its left recursive (has been flipped)
-- marked when
transforming the grammar */
if (node.hasChild() && node.isLeftRecursive()) {
1. flip the node children;
2. flip the node childrens to come back to the original children
ordering;
(ex. :
0. add '+' term
|
+-- xxxx
1. term '+' add
|
+-- xxxx
2. term '+' add
|
+-- xxxx
)
}
/* Remove xxx => xxx, caused by adding "|symbol" to the rule */
lastChild = node.child[node.child.lenght()];
if (lastChild.type == lastChild.child[0].type &&
lastChild.child.lenght() == 1) {
lastChild.removeChild(0);
}
}
--------------------------------------------------------
Then we have the same tree as an SR parser would have parsed :
Tree =
|
+- add
|
+-- add '+' term
|
+-- add '+' term
What do you think of this ?
Regards,
Louis Mullie
<louis@mulliemedia.com>
Return to the
comp.compilers page.
Search the
comp.compilers archives again.