Re: Parser question

Max Hailperin <max@gustavus.edu>
Sun, 13 Jul 2008 08:12:44 -0500

          From comp.compilers

Related articles
Parser question mike@sesamgames.com (Mike \(News\)) (2008-07-11)
Re: Parser question avron2006b@yahoo.com (Aeran) (2008-07-13)
Re: Parser question max@gustavus.edu (Max Hailperin) (2008-07-13)
Re: Parser question DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-07-15)
Re: Parser question mikael@sesamgames.com (Mikael =?ISO-8859-1?Q?Str=F6m?=) (2008-07-17)
Re: Parser question max@gustavus.edu (Max Hailperin) (2008-07-19)
| List of all articles for this month |
From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: Sun, 13 Jul 2008 08:12:44 -0500
Organization: Compilers Central
References: 08-07-024
Keywords: parse
Posted-Date: 15 Jul 2008 01:32:01 EDT

"Mike (News)" <mike@sesamgames.com> writes:


> ...
> The implementation is a predictive parser that construct a syntax tree
> holding both expression and statements. As the grammar is recursive,
> so is the implementation. However, i wish to find some shortcuts to
> rewrite (at least some of) the recursive functions to be
> iterative. Are there any well known algorithms of doing this? I could
> not find much info on this topic in the dragon book.


There are well-known algorithms for tail-recursion elimination, but
using one of the fully-general algorithms is unlikely to help you much
getting over your hurdle of developing an intuition for the process.
I think you would be better off with what you ask for later in your
post, an example. Your old dragon book shows an example of manually
doing the tail-recursion elimination on a parsing procedure for a
grammar like your example (or rather, like your example once
debugged). However, it is shown in a context where no syntax tree is
being built. Later, the dragon book shows how to use an L-attributed
definition to build the syntax tree. Putting these two examples
togeher would solve your problem, but it isn't obvious to a newcomer
how that is done. That, I think, is the crux of your problem, and I
address it below.


However, I should also note that depending on what language you are
writing your compiler in, and in some cases what compiler you are
using to compile your compiler, doing the tail-recursion elimination
by hand may be pointless. The compiler that is compiling your
compiler may do it for you. As an example, I took the two parsers
shown in C in the red dragon book, one tail-recursive, one with a
loop, and ran them both through the gcc compiler (with optimization
turned on). I told gcc to produce assembly language output rather
than binary object code, and I compared the two versions. The result:
essentially no difference. In this case, where the parsing routine
was written in C, the fact that tail-recursion elimination was
performed automatically was a property of the specific C compiler
used, gcc. But there are some other languages, such as Scheme, where
tail-recursion elimination is guaranteed by the language
specification, and so you can feel free to leave your parser in the
tail-recursive form, knowing that it will still generate an iterative
process.


>
> For instance the grammar for:
>
> expr -> term moreterms
>
> moreterms -> + term | - term | N5


I think you goofed on the productions for moreterms. The N5 is surely
supposed to be an epsilon, but there is also a more fundamental
problem: you missed the tail recursion. The first two productions
for moreterms should presumably allow further terms at the end:


moreterms -> + term moreterms | - term moreterms | epsilon


> Is inherently recursive. How would I write iterative code for this using
> my tree function makenode_op(operand, left, right)?
>
> An example in pseudo code would be most helpful to get me started.


First, let's write this using tail recursion and not building up the
syntax tree:


expr():
    term()
    moreterms()


moreterms():
    if lookahead = '+':
        advance lookahead
        term()
        moreterms()
    else if lookahead = '-':
        advance lookahead
        term()
        moreterms()
    else if lookahead can legally follow:
        return
    else:
        report a syntax error


Note that in practice, the check for a legally following symbol is
frequently omitted.


Now we can take this in two different directions, which we later merge
back together. One is to eliminate the tail-position calls to
moreterms(), the other is to introduce the building of a syntax-tree
using an L-attributed definition. You can read these in either order.


To eliminate the tail calls, we convert the moreterms procedure into a loop:


expr():
    term()
    // initially fall into moreterms
    repeat until exited: // this loop is moreterms
        if lookahead = '+':
            advance lookahead
            term()
            // loop back for moreterms
        else if lookahead = '-':
            advance lookahead
            term()
            // loop back for moreterms
        else if lookahead can legally follow:
            return
        else:
            report a syntax error


If instead we want to leave the tail calls intact, but build the
syntax tree, we would introduce an inherited attribute (procedure
parameter) for the left context and a synthesized attribute (procedure
return value) for the resulting syntax tree. I've written the below
pseudo-code rather ploddingly, with a lot of unnecessary assignment
statements to give meaningful names to the intermediate results, in
the hopes that it helps you see what is going on:


expr():
    firstTerm = term()
    return moreterms(firstTerm)


moreterms(left):
    if lookahead = '+':
        advance lookahead
        right = term()
        newLeft = makenode_op('+', left, right)
        return moreterms(newLeft)
    else if lookahead = '-':
        advance lookahead
        right = term()
        newLeft = makenode_op('-', left, right)
        return moreterms(newLeft)
    else if lookahead can legally follow:
        return left
    else:
        report a syntax error


Now, we can combine these two versions, following the overall looping
form of the version that doesn't have tail calls, but keeping track of
the left context. In place of the parameter "left", we'll now have a
local variable by that name, which will get its succesive values by
assignment statements rather than parameter passing:


expr():
    left = term()
    // initially fall into moreterms
    repeat until exited: // this loop is moreterms
        if lookahead = '+':
            advance lookahead
            right = term()
            newLeft = makenode_op('+', left, right)
            left = newLeft
            // loop back for moreterms
        else if lookahead = '-':
            advance lookahead
            right = term()
            newLeft = makenode_op('-', left, right)
            left = newLeft
            // loop back for moreterms
        else if lookahead can legally follow:
            return left
        else:
            report a syntax error


Of course, I will re-emphasize that this is intentionally plodding.
At a minimum, you would ordinarily eliminate the newLeft variable and
instead directly assign the result from makenode_op to the left
variable.


>
> Thanks in advance!
>
> /Mike


You're welcome.
  -Max Hailperin
    Professor of Computer Science
    Gustavus Adolphus College


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.