Re: Parsing nested structures with yacc

Max Hailperin <max@gustavus.edu>
Wed, 20 Aug 2008 07:02:32 -0500

          From comp.compilers

Related articles
Parsing nested structures with yacc amoebae@gmail.com (amoe) (2008-08-07)
Re: Parsing nested structures with yacc cfc@shell01.TheWorld.com (Chris F Clark) (2008-08-08)
Re: Parsing nested structures with yacc max@gustavus.edu (Max Hailperin) (2008-08-09)
Re: Parsing nested structures with yacc kamalpr@hp.com (kamal) (2008-08-11)
Re: Parsing nested structures with yacc amoebae@gmail.com (amoe) (2008-08-19)
Re: Parsing nested structures with yacc max@gustavus.edu (Max Hailperin) (2008-08-20)
| List of all articles for this month |

From: Max Hailperin <max@gustavus.edu>
Newsgroups: comp.compilers
Date: Wed, 20 Aug 2008 07:02:32 -0500
Organization: Compilers Central
References: 08-08-014 08-08-034
Keywords: parse, Lisp
Posted-Date: 20 Aug 2008 14:00:58 EDT

amoe <amoebae@gmail.com> writes:
....
> However, I'd be interested to know - can Max's solution be converted
> to a left-recursive one that can manually allocate arbitrary hunks of
> memory for new tokens? How much more complicated would that solution
> be, and what would be the general strategy?


Let me start by paraphrasing your question, to make sure I'm actually
answering what you are asking.


If I understand you correctly, you are asking about leaving unchanged
the way nesting is handled -- the ability of a list-element to itself
be a list, as in (((a))) -- and just changing how the linear
decomposition of each level of list is handled, in which the list is
broken down into one element and the rest -- the part that is relevant
in a situation like (a b c).


You want to have the decomposition within the grammar be the reverse
of that in the data structure. The grammar would take the viewpoint
    (a b c) = (a b) + c
but the data strcuture would be based on
    (a b c) = a + (b c)
as it is in Lisp.


There are two ways you could do that. One was already suggested by
Chris Clark earlier in this thread: build the data structure from left
to right, at each step modifying the last pair to reference a new
pair. That is, you would start out with the list (a), then by
mutating its last pair turn it into (a b), then by mutating the new
last pair turn it into (a b c).


The other approach may be simpler for you to fit into your existing
infrastructure; build the list in reverse order and then at the end
reverse it. That is, you would use ordinary constructor operations to
build up from (a) to (b a) to (c b a), and then once the list is
complete, you would flip it around to (a b c).


In my earlier post, I had said


> His grammar (which is simpilified relative to any full Lisp) would say
> (1) An s-expression is either a symbol or a list.
> (2) A list always starts with an open paren; we can call what follows
> that the tail.
> (3) A tail is either
> (a) just a close paren (forming an empty list), or
> (b) an s-expression followed by a further tail (forming a
> nonempty list).


For ease of switching this around to the left-recursive form you are
now asking about, first recognize that it could be rewritten as
follows, while remaining right-recursive:


(1) An s-expression is either a symbol or a list.
(2) A list always starts with an open paren and ends with a close
        paren; we can call what comes in between them the body.
(3) A body is either
                (a) just the empty string (forming an empty list), or
                (b) an s-expression followed by a further body (forming a
                        nonempty list).


Now, to make it left-recursive, you would just modify part (3):


(3) A body is either
                (a) just the empty string (forming an empty list), or
                (b) a body followed by an s-expression (forming a
                        nonempty list).


However, the "forming a nonempty list" part is now where life gets
interesting, because the pair constructor tacks an element onto the
front of a list, not the back.


As I said, Chris Clark's approach of modifying the tail end of the
list is indeed reasonable. But the reason I showed the modified
grammar was to point out that my alternative suggestion of building
the list in reverse order and then reversing it once it is complete
fits quite naturally into the grammar, as well as not requiring any
fanciness at the data-structure level.


Namely, notice that we've already got two separate nonterminals for "a
list" versus "a body". This positions you perfectly for having the
synthesized attribute value for a list be in the left-to-right order,
(a b c) in the example, while the synthesized attribute value for the
body is in the right-to-left order, (c b a). There are natural places
to hang the two different kinds of actions you need, one to build the
list using your pair constructor and one to do the list reversal.


The same technique can be used even in grammars for non-parenthesized
lists, which don't naturally provide two separate nonterminals -- you
just have to recognize the need for introducing an additional
nonterminal with the trivial production


    A -> B


solely for the sake of providing a setting for the reversal action.
But here the setting is already in place, just waiting to be used.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.