Re: Parsing nested structures with yacc

Max Hailperin <>
Sat, 09 Aug 2008 07:49:38 -0500

          From comp.compilers

Related articles
Parsing nested structures with yacc (amoe) (2008-08-07)
Re: Parsing nested structures with yacc (Chris F Clark) (2008-08-08)
Re: Parsing nested structures with yacc (Max Hailperin) (2008-08-09)
Re: Parsing nested structures with yacc (kamal) (2008-08-11)
Re: Parsing nested structures with yacc (amoe) (2008-08-19)
Re: Parsing nested structures with yacc (Max Hailperin) (2008-08-20)
| List of all articles for this month |

From: Max Hailperin <>
Newsgroups: comp.compilers
Date: Sat, 09 Aug 2008 07:49:38 -0500
Organization: Compilers Central
References: 08-08-014 08-08-016
Keywords: parse
Posted-Date: 09 Aug 2008 09:20:42 EDT

Chris F Clark <> writes:

> David <> writes:
>> I'm trying to write a basic parser for S-expressions, basic list
>> structures used in Lisp. ...
> When manipulating a list, one often needs both access to the head and
> the tail, because you insert the head into a list above it, but you
> append things to the tail of the list. So, imagine that your parser
> at each level returned a struct that had head and tail of the
> underlying list (or if it is a single elt, both head and tail point to
> the same thing). See, if that model doesn't allow you to add code to
> your third grammar. ...

Although you are quite correct that this techinque of tracking list
tails for mutation can be useful in general, I don't think David needs
to go to that length. He can stick with straightforward functional
construction of the lists, using actions like $$ = pair($1,$2), where
"pair" is his constructor analogous to Lisp's "cons". All he needs to
do is get the underlying grammar right; his problem was less with the
datastructure and actions than it was with the grammar. (Although his
code does suggest he might have misunderstood the datastructure by not
recognizing that the cdr of a nonempty list is itself always a list,
one element shorter.)

His grammar (which is simpilified relative to any full Lisp) would say

(1) An s-expression is either a symbol or a list.

(2) A list always starts with an open paren; we can call what follows
        that the tail.

(3) A tail is either
        (a) just a close paren (forming an empty list), or
        (b) an s-expression followed by a further tail (forming a nonempty

In 3b, the car and cdr of the nonempty list are already available by
the time the grammar production is reduced, so there is no reason not
to construct the nonempty list (i.e., the pair) in a functional
fashion, rather than by mutation.

As David suggested, this is making good use of yacc's stack. The list
is being built up starting with an empty list and tacking elements
onto the front, starting with the last element and continuing forward
until the first element is tacked on -- the opposite of the order in
which the elements are read. That reversal of order is coming from
yacc's stack, and explains why there is no need to be mutating the
tail end of a list.

Yacc's stack also addresses David's other concern, the ability to nest
lists within lists as the constituent elements. That is achieved just
by having 3b use an s-expresion as the car of the nonempty list,
rather than only a symbol. Of course, this requires that the lists
are delimited by the open and close parentheses, whereas David's
grammar productions omitted those (though they were declared as
terminals and recognized by the lexical analyzer).

  -Max Hailperin
    Professor of Computer Science
    Gustavus Adolphus College

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.