Rich versus Simple ASTs

"Johan Tibell" <johan.tibell@gmail.com>
28 Jul 2006 18:45:36 -0400

          From comp.compilers

Related articles
Rich versus Simple ASTs johan.tibell@gmail.com (Johan Tibell) (2006-07-28)
Re: Rich versus Simple ASTs haberg@math.su.se (2006-07-29)
Re: Rich versus Simple ASTs jakacki@gmail.com (Grzegorz Jakacki) (2006-07-29)
Re: Rich versus Simple ASTs holgersiegel74@yahoo.de (Holger Siegel) (2006-07-29)
| List of all articles for this month |

From: "Johan Tibell" <johan.tibell@gmail.com>
Newsgroups: comp.compilers
Date: 28 Jul 2006 18:45:36 -0400
Organization: http://groups.google.com
Keywords: parse, AST
Posted-Date: 28 Jul 2006 18:45:36 EDT

(This post is a result of my thinking since I've posted "Representing
Closures in C" at an earlier date in this news group).


Considered a grammar for lambda calculus that allows abbreviation of
nested lambda abstractions:


\ x. \ y. x - y


this can be abbreviated as:


\ x y. x - y


(where \ is the lambda symbol)


This convenient abbreviation can be implemented in (at least?) two
different ways. Either the parser translates each occurrence of the
second expression into the same AST representation used by the first or
the AST uses a "richer" representation for lambda abstractions that
allows for more than one variable. In Haskell syntax:


data Expr = Lam Id Expr -- lambda abstraction
                    | App Expr Expr -- function application
                    | ...


or the richer representation:


data Expr = Lam [Id] Expr -- lambda abstraction
                    | App Expr Expr -- function application
                    | ...


where [Id] denotes a list of identifiers


Would it be useful to have an AST representation like the second? More
specifically, when you want compile the code and find a source line
that looks like this:


(\ x y. x - y) 1 2


Using the second representation we could possibly detect that the
function (of two arguments) is in fact applied to two arguments and
thus there's no need to generate an intermediate closure. Or is it
better two just use the first representation and apply some
optimization that removes unnecessary closures? If so how can I
implement such an optimization? Some pseudo code would be nice.


I guess this is a more general question. Should the richness of the
language you are compiling be represented in the AST or should the AST
be simplified as much as possible?


P.S. The second representation could also be useful for pretty printing
the expression exactly as it was written which is not possible with the
first.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.