Re: Object-oriented AST

Chris F Clark <>
12 Apr 2006 22:46:47 -0400

          From comp.compilers

Related articles
Object-oriented AST (Thomas Weiss) (2006-04-08)
Re: Object-oriented AST (Florian Liekweg) (2006-04-09)
Re: Object-oriented AST (Chris F Clark) (2006-04-12)
Re: Object-oriented AST (Thomas Weiss) (2006-04-16)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 12 Apr 2006 22:46:47 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 06-04-055 06-04-062
Keywords: OOP, AST
Posted-Date: 12 Apr 2006 22:46:47 EDT

Thomas Weiss wrote:
> I am looking for an
> efficient way to design abstract syntax trees in object-oriented
> languages.
... [example showing inheirtance matching parsing stratgy snipped]

This method was been followed [with some success I presume] in the
Cocktail toolkit by Josef Gro\:lsh. Note, that your issue of some
rule descending from two parents and thus inheriting from both isn't
really as frequent as you would think. In reality, the productions
that get resused in multiple productions are natural roots of trees,
e.g. "statement", "expression". That is, you may see expression used
in multiple contexts, but not and-expression, or-expression,
plus-expression and the rest, those will only occur inside expression.
Therefore, if you find places where there are two references to the
same non-terminal, that non-terminal is a likely base class. The same
thing helps one break cycles, since expression is usually
self-recursive. Once one has made a non-terminal into a base class,
one doesn't make it into a derived class of itself.

Florian Liekweg replied:
> From your example, it seems that you are trying to turn every BNF rule
> into an inheritance relationship. I would advise against doing this
> in this strictness. It seems more sensible to me to assemble the
> important parts of the AST (i. e. the nodes that match class decls,
> methods or fields ... control flow statements (if, for, while),
> statements and expressions) and create an inheritance hierarchy
> over that dimension....

There is lots of sense to this approach (making your hierarchy
semantic not syntactic), and we have used it in Yacc++. If you
divorce your inheritance hierarchy from the syntax, you will find that
it is much flatter, and only captures those distinctions you want
preserved for semantic reasons. Doing so makes it much more likely,
that you can put member functions in base classes and have them
actually apply to the inherited classes.

In particular, a common pattern that I have seen replicated many times
is three essential base classes:

class ast-base // everything inherits from this
                                // and this is what pointers, point to

class tk-base is-a ast-base // tokens derive from this
class nt-base is-a ast-base // non-terminals derive from this

From, this point on one can make sub-divisions that are appropriate to
your application. I find there are generally at least two sub-classes
of tk-base.

class tk-sym-base is-a tk-base // tokens with "symbols" for variable spellings
                                // derive from this
class tk-fixed-base is-a tk-base // tokens with fixed spellings
                                // derive from this

As Florian suggested, you will find some likely base classes for
non-terminals to be expressions and statements.

class nt-stmt-base is-a nt-base // non-terminals representing statements
                                // derive from this
class nt-expr-base is-a nt-base // non-terminals representing expressions
                                // derive from this

I also find, that it is often useful to sub-divide the expression (or
statement) classes even more. The clases below are typical and come
from a Verilog compiler I did a few years back--the compiler has more
than these, but these give you the flavor of what I did.

class nt-expr-binary-base is-a nt-expr-base
                                // non-terminals representing binary expressions
                                // derive from this (e.g. + *)

class nt-expr-unary-base is-a nt-expr-base
                                // non-terminals representing unary expressions
                                // derive from this (e.g. unary -, ~)

class nt-expr-binary-relation-base is-a nt-expr-binary-base
                                // non-terminals representing relational binary expressions
                                // derive from this (e.g. < <= == != > >=)
class nt-expr-binary-logic-base is-a nt-expr-binary-base
                                // non-terminals representing logical binary expressions
                                // derive from this (e.g. && ||)
class nt-expr-binary-bitwise-base is-a nt-expr-binary-base
                                // non-terminals representing bit-wise binary expressions
                                // derive from this (e.g. & |)

Note that, at times, I have found it very useful to have a distinct
class for every different syntactic rule. That way in the class, I
know exactly what fields are well-defined. At other times, I have
found it conventient to group two distinct alternatives into the same
class as they have the same semantics. However, I never have a deep
inheritance heirarchy that matches the grammar. I may have a deep
inheritance hierarchy the represents my semantics so that I can
code-share exactly the right functions and define them only once, but
that's a different story.

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.