Re: how to deal with left recursive?

Kaz Kylheku <kkylheku@gmail.com>
Fri, 6 Nov 2009 21:28:00 +0000 (UTC)

          From comp.compilers

Related articles
how to deal with left recursive? vincent.feng@yahoo.com (vincent Feng) (2009-11-05)
Re: how to deal with left recursive? kkylheku@gmail.com (Kaz Kylheku) (2009-11-06)
Re: how to deal with left recursive? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2009-11-07)
| List of all articles for this month |

From: Kaz Kylheku <kkylheku@gmail.com>
Newsgroups: comp.compilers
Date: Fri, 6 Nov 2009 21:28:00 +0000 (UTC)
Organization: A noiseless patient Spider
References: 09-11-018
Keywords: parse, LL(1)
Posted-Date: 06 Nov 2009 20:32:13 EST

On 2009-11-06, vincent Feng <vincent.feng@yahoo.com> wrote:
> I have found ansi c yacc grammar.
> A production of statement list like this
>
> statement_list
> : statement
> | statement_list statement
> ;
>
> is left recursive.
> How to elimilate left recursive in such case?


Since there are no semantic actions attached to the above grammar which depend
on order, you can just make it right recursive.


  statement_list
                  : statement
                  | statement statement_list
                  ;


This generates exactly the same symbol strings (i.e. language) as the previous
grammar.


Think about it: left recursion is a silly way to represent a list-like
language construct, anyway!


Why? Because the left recursion corresponds to a tree like this:




        /\
      /\
    /\
  /\


Rather than one like this:




  /\
    /\
      /\
        /\




So just change it to:




    statement_list : statement
                                  | statement statement_list
                                  ;


A (non-empty) statement list is either just one statement, or a statement
followed by more statement material.


In the txr language, I parse constructs like this and turn them into
Lisp-like lists.


Take a look here: http://git.savannah.gnu.org/cgit/txr.git/tree/parser.y


For example, there is this production:


clauses : clause { $$ = cons($1, nil); }
                | clause clauses { $$ = cons($1, $2); }
                ;


Why did I make it right recursive? The clue is in the simple semantic actions
on the right. If there is one clause, we build a one-element list. using
cons(<clause>, nil). If we have a clause followed by more clauses, we can cons
the clause up to the front of the list. The cons function just has to allocate
one binary cell and initialize its ``car'' and ``cdr'' fields with the argument
values.


A list of three clauses c1, c2, c3 looks like this cons structure:


    /\
  c1/\
    c2/\
      c3 nil


The tree mirrors the grammar.


But of course the list can't be empty, which is something we sometimes
want to allow. That's where this production
comes in: the optional clauses:


clauses_opt : clauses { $$ = $1; }
                        | /* empty */ { $$ = nil; }
                        ;


``If there are clauses, just copy them. Otherwise, produce the empty
list nil.''


It would silly to have the production left recursive, because we still want to
build a right-recursive list of conses; so we end up doing an append operation:


clauses : clause { $$ = cons($1, nil); }
                | clauses clause { $$ = append2($1, $2); }
                ;


append has to make a copy of the list structure. We could use
the destructive version of append, but that still has to walk down the
list to find the end. Either way we now have O(N*N) complexity to parse a list
of N clauses.


However, there is a cost. The right recursive approach has a deeper parsing
stack. In order to build the tree


    /\
  c1/\
    c2/\
      c3 nil


the very first semantic action that will happen will be the reduction which
conses up (c3 . nil) or (c3) if you will. This is followed by a reduction to
(c2 . (c3 . nil)). I.e. there is all this context built up on the yacc stack,
whose depth corresponds to the depth of the list being parsed. Only when you
hit the end of the list, can you start popping off all of the stack context
to produce the list!


We could do this instead. Use left recursion, and build up the list
of clauses in reverse. Then destructively reverse the list in O(N) time.


rev_clauses : clause { $$ = cons($1, nil); }
                        | rev_clauses clause { $$ = cons($2, $1); }
                        ;


Now when this production is reduced three times over three clauses c1, c2, c3,
we get a structure which looks like this:


    /\
  c3/\
    c2/\
      c1 nil


I.e. the backwards list (c3 c2 c1) corresopnding to the dot notation
(c3 . (c2 . (c1 . nil))).


we can fix it by inserting this shim production into the grammar:


clauses : rev_clauses { $$ = nreverse($1); }


This way we get constant yacc stack usage, and construct the list
in linear time, without having to switch to some clumsy list encapsulation that
doesn't support functional programming very well.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.