Re: understanding the intuition behind LL(k) parsers and LR(k) parsers

Max Hailperin <max@gustavus.edu>
23 Apr 2006 10:01:04 -0400

          From comp.compilers

Related articles
understanding the intuition behind LL(k) parsers and LR(k) parsers Mark.Felzer@gmail.com (Mark F.) (2006-04-21)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers tom@infoether.com (Tom Copeland) (2006-04-22)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers cfc@shell01.TheWorld.com (Chris F Clark) (2006-04-23)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers max@gustavus.edu (Max Hailperin) (2006-04-23)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-04-23)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers pbmann@gmail.com (2006-04-28)
Re: understanding the intuition behind LL(k) parsers and LR(k) parsers pbmann@gmail.com (2006-05-01)
| List of all articles for this month |

From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: 23 Apr 2006 10:01:04 -0400
Organization: Compilers Central
References: 06-04-124
Keywords: parse, comment
Posted-Date: 23 Apr 2006 10:01:04 EDT

"Mark F." <Mark.Felzer@gmail.com> writes:


> ...
> Maybe you can help me visualize the basic idea behind the two
> approaches to AST tree generation....


Conceptually, both an LL parser and an LR parser maintain a stack of
grammar symbols, that is, a stack where each element is either a
nonterminal or a terminal. However, the meaning of that stack is
completely opposite in the two styles of parser, and so is the
combination of two operations used to update the stack.


In an LR parser, if you read the stack from bottom to top, you get a
summary of the input that has already been read. In an LL parser, if
you read the stack from top to bottom, you get a prediction of the
input that remains to be read. In both cases, when I say the stack is
a "summary," I mean that it may be at a relatively abstract level
because it can contain nonterminals rather than only terminals. For
example, if an LR stack contains


+
Expr
(


it means that the input that was already read consists of a left
parenthesis, some sort of Expr, and then a plus sign. This is an
abstract summary, because it doesn't specify which particular terminal
symbols were read that constituted the Expr. Similarly, if an LL
stack contains


+
Expr
)


it means that the remaining input, which has not yet been read, is
expected to contain a plus sign, some sort of Expr, and then a right
parenthesis. Again, there is no specificity regarding what form the
Expr will take.


This is just the conceptual view; LR stacks in paricular actually
contain numeric "states", which are critical for understanding how the
parser efficiently decides on the next action to take, but which are
not critical for understanding the fundamental contrast between LL and
LR. Instead, a better next step would be to move from considering how
to read the stack to considering what the two actions available to the
parser are. For appreciating the LL/LR contrast, you can just assume
the appropriate action is magically chosen.


In an LR parser, the two possible actions are:


  (1) Read in one more terminal symbol from the input; to maintain the
          stack as a summary of the input that has been read, the terminal
          symbol needs to be pushed onto the stack. This is called
          "shifting" a symbol onto the stack.


  (2) Reduce the summary that is on the stack to a more abstract one by
          replacing the right-hand side of a production by the left-hand
          side. For example, if there is a production Factor -> ( Expr )
          and the stack is
                    )
                    Expr
                    (
                    (
          then a reduction would summarize the "( Expr )" part more
          concisely as being a Factor; this entails popping the top three
          symbols off the stack and pushing the Factor on, yielding
                    Factor
                    (
          which is still a summary of the same already-consumed input, just
          a more abstract one.


Because of these two actions, the LR parser is also called a
shift/reduce parser.


In an LL parser, on the other hand, the actions are the opposite,
reflecting the oppositeness of what its stack means:


    (1) If a terminal symbol is read in from the input, then the same
            terminal symbol must be popped off from the top of the stack,
            because the stack is a prediction of the remaining input. The
            action here consists of confirming one symbol worth of that
            prediction.


    (2) If instead the prediction is changed without reading input, this
            happens by expanding out the prediction to a less abstract one
            by replacing the left-hand side of a production by the
            right-hand side. For example, considering again the production
            Factor -> ( Expr ), an action could change the stack from just
                Factor
                )
            to
                (
                Expr
                )
                )
            so as to predict not just some sort of Factor followed by a
            right parenthesis, but rather more specifically a left
            parenthesis, Expr, and right parenthesis followed by a (further)
            right parenthesis.


I like to think of an LL parser as a "confirm/expand" parser by
analogy with the "shift/reduce", though this nomenclature doesn't seem
to be standard.


I hope seeing the duality will help you understand the relationship
between these two parsing techniques.


  -Max Hailperin
    Professor of Computer Science
    Chair, Mathematics and Computer Science Department
    Gustavus Adolphus College
    800 W. College Ave.
    St. Peter, MN 56082
    USA
    http://www.gustavus.edu/+max/
[Nice explanation. Thanks! -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.