Seamlessly parsing left-recursive grammars with LL parsers ?

Louis Mullie <louis@mulliemedia.com>
20 Mar 2005 11:14:35 -0500

          From comp.compilers

Related articles
Seamlessly parsing left-recursive grammars with LL parsers ? louis@mulliemedia.com (Louis Mullie) (2005-03-20)
Re: Seamlessly parsing left-recursive grammars with LL parsers ? angagon@earthlink.net (Matt O'Connor) (2005-03-27)
Re: Seamlessly parsing left-recursive grammars with LL parsers ? branco.medeiros@gmail.com (2005-03-31)
Re: Seamlessly parsing left-recursive grammars with LL parsers ? louis@mulliemedia.com (Louis Mullie) (2005-04-02)
| List of all articles for this month |

From: Louis Mullie <louis@mulliemedia.com>
Newsgroups: comp.compilers
Date: 20 Mar 2005 11:14:35 -0500
Organization: Compilers Central
Keywords: parse, question, LL(1)
Posted-Date: 20 Mar 2005 11:14:35 EST

Hello,


First of all I must say I am only a high school student, and I've been
looking into creating parsers for a month or so only; I have never had
any lessons on parsers, so my question may be useless or stupid. I
have successfully built a recursive-descent parser that builds a parse
tree based on a BNF grammar (in PHP); this will be a proposal for a
package at the PEAR repository (www.pear.php.net). However, I am
concerned with beginners not understanding what left-recursion is and
wonder why the parser fails to parse their stuff. I have come up with
a way (it's probably wrong) to use my LL parser and still parse left-
recursive grammars seamlessly; I'd like you to have a look at it, and
if it's wrong, please tell me why it wouldn't work. Here it is, in
pseudo-code :


To parse : 6 + 3 + 2


(token) num := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
add := add '+' term
term := num


Obviously, 'add' would infinitely recurse in an LL parser.
Now, consider the following transformation to the grammar :
----------------------------------------------------------------------------
input = reverseOrder(input);
foreach (grammar.nonterminals as symbol) {
        if (symbol.isLeftRecursive()) symbol.rule =
reverseOrder(symbol.rule) + '|' + symbol;
}
-----------------------------------------------------------------------------
The grammar becomes :
2 + 3 + 6
(token) num := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
add := term '+' add | term
term := num


This transforms our left-recursion into right-recursion; the parse
tree will be different, so after the parsing we need to modify our
tree too. Here is the parsed tree :


Tree =
        |
        +- add
              |
              +-- term '+' add
                                                        |
                                                      +-- term '+' add
                                                                                                |
                                                                                              +-- term
Here's the tree transformation :
----------------------------------------------------
foreach (tree as node) {
        /* If it's a non-terminal, if its left recursive (has been flipped)
-- marked when
                transforming the grammar */
        if (node.hasChild() && node.isLeftRecursive()) {
                1. flip the node children;
                2. flip the node childrens to come back to the original children
ordering;
                (ex. :
                      0. add '+' term
                              |
                            +-- xxxx


                      1. term '+' add
                                                            |
                                                          +-- xxxx


                      2. term '+' add
                                |
                              +-- xxxx
                )
        }
        /* Remove xxx => xxx, caused by adding "|symbol" to the rule */
        lastChild = node.child[node.child.lenght()];
        if (lastChild.type == lastChild.child[0].type &&
                lastChild.child.lenght() == 1) {
                        lastChild.removeChild(0);
                }
}
--------------------------------------------------------
Then we have the same tree as an SR parser would have parsed :
Tree =
        |
        +- add
                    |
                    +-- add '+' term
                                      |
                                    +-- add '+' term


What do you think of this ?


Regards,
Louis Mullie
<louis@mulliemedia.com>



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.