Re: parent pointers in AST nodes

"Ira Baxter" <>
Fri, 27 Nov 2009 15:43:11 -0600

          From comp.compilers

Related articles
parent pointers in AST nodes (eliben) (2009-11-27)
Re: parent pointers in AST nodes (Robert A Duff) (2009-11-27)
Re: parent pointers in AST nodes (Kenneth 'Bessarion' Boyd) (2009-11-27)
Re: parent pointers in AST nodes (Ira Baxter) (2009-11-27)
Re: parent pointers in AST nodes (Hans-Peter Diettrich) (2009-11-28)
Re: parent pointers in AST nodes (bartc) (2009-11-30)
Re: parent pointers in AST nodes (2009-11-30)
Re: parent pointers in AST nodes (Kaz Kylheku) (2009-12-01)
Re: parent pointers in AST nodes (Quinn Tyler Jackson) (2009-12-01)
Re: parent pointers in AST nodes (Gary Oblock) (2009-12-14)
| List of all articles for this month |

From: "Ira Baxter" <>
Newsgroups: comp.compilers
Date: Fri, 27 Nov 2009 15:43:11 -0600
Organization: Compilers Central
References: 09-11-060
Keywords: AST
Posted-Date: 29 Nov 2009 00:37:06 EST

"eliben" <> wrote in message
> When implementing an AST for some language, each AST node typically
> holds information about the language construct it represents and
> pointers to children nodes (such as a binary op node pointing to its
> left-hand and right-hand operands).
> Is it common / useful to supply a pointer to the node's parent as
> well?
> In favor:
> * This can simplify some AST processing tasks, especially when using
> the visitor pattern - we may get to an interesting node and then need
> to look at its ancestors to do the required analysis.
> Against:
> * Maintaining parent nodes makes the AST creation code more complex
> * Wastes space (another pointer for each node)
> What are your thoughts?

The DMS Software Reengineering Toolkit implements AST with parent
nodes. It is convenient to down the tree to find things, and
occassionaly walk back up again to a parent to find another child.
The overhead of managing the parent link isn't that high. If you
don't have such links, you will have to store parent links on you way
down the tree, so it isn't as though parent-links are always just
extra cost.

In fact, DMS actually stores AST nodes as "hypergraph" nodes, where
each node has a set of ports numbered 1..n, and each port may
reference 0, 1, or a list of other nodes (actually, other ports from
which the node can be easily computed). Links between nodes are
bi-directional on all ports, so you always navigate both ways.

This allows DMS to represent arbitrary graphs of which ASTs are an
easy subset (implemented via an API to offer an AST feel). Having
lists as children ports allows lists to be easily represented. Having
lists as parents allows ambigous trees (with DAG sharing) to be easily
represented, and this is a foundation for our GLR parser
implementation, which in turn I think is fundamental to DMS's success
at parsing difficult like C++.

The node types are indexed and easily enumerated; this allows pattern
matchers to go straight to potential matching nodes (rather than
navigating the tree from the root in a search), and once you get a
pattern match hit, now you may want to visit the parent. But you
don't have the parent-list, so you need a parent link.

This also makes it easy to patch the trees (graphs) locally.

Node management these days is more about how many memory touches you
have. If it fits in a cache line, you have only one touch, for reads
and writes, and since memory reads/writes are hundreds of CPU cycles,
it is better to avoid touching memory :-} Since we process trees with
hundreds of millions of nodes, this matters. We generally trade
caching a bit of extra information whereever we can, to avoid
additional memory touches.

We're pretty happy with this design.

Ira Baxter, CTO

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.