Re: [Q] How Is AST structured for entire program??

Hans-Peter Diettrich <>
Wed, 17 Oct 2007 17:06:21 +0200

          From comp.compilers

Related articles
[Q] How Is AST structured for entire program?? (2007-10-16)
Re: [Q] How Is AST structured for entire program?? (George Neuner) (2007-10-17)
Re: [Q] How Is AST structured for entire program?? (Hans-Peter Diettrich) (2007-10-17)
Re: [Q] How Is AST structured for entire program?? (Chris F Clark) (2007-10-18)
Re: [Q] How Is AST structured for entire program?? (Paul B Mann) (2007-10-24)
| List of all articles for this month |

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Wed, 17 Oct 2007 17:06:21 +0200
Organization: Compilers Central
References: 07-10-051
Keywords: parse, practice
Posted-Date: 18 Oct 2007 00:45:01 EDT wrote:

> I am a very newbie and am learning compiler, after read (or tried to
> read) many compiler books, there is one thing I haven't been able to
> get a clear picture on. I understand that when we parse a program, a
> parse tree can be constructed, and I can construct a parse tree for
> ONE expression, but I can not see how this parse tree is constructed
> for the entire program.

When you have an tree builder (rule, production...) for one expression,
add another rule for building an "expressions" tree, whose members are
expression trees. Then continue with rules for building an "function"
tree, until you arrive at the "program" tree, which is the goal of your
task (grammar).

A typical program hierarchy is:
program -> function -> block -> statement -> expression -> ...

> Do we construct one tree for each line of expressions

Abandon the idea of lines, unless your language is strictly line-based.
Statements can be split into multiple lines, the line breaks usually are
ignored while parsing. Usually all "unimportant" characters (whitespace,
comments...) are recognized by the lexer, and are not passed to the
parser. The lexer deals with characters, collects multiple characters
into tokens (names, numbers, operators...), and passes these tokens on
to the parser. The parser arranges the tokens into higher level
structures, e.g. into an tree. The parser also can create one or more
symbol tables, for every subroutine or compound statement, so that you
can find out, later, which tree nodes refer to the same symbol

Did you ever have a look at a sample grammar, for an existing
programming language? You may start with a Pascal grammar, which is much
simpler than a grammar for C.

> I am curious do we get any help from some magical arrangement of the
> tree structure so that these types of redudencies can be detected or
> we have to write a special routine to scan through the code for this
> types of redundencies? Or my understanding of the whole thing is
> incorrect?

I don't know about such magic. Write down (part of) the output of your
parser as an tree, then write down the optimized tree, and then try to
find rules for the transformation of the raw tree into the optimized
tree. There exist some techniques for e.g. detecting loop invariant
statements, or common subexpressions. Continue reading about compiler
design principles and techniques, before you try to implement your very
own compiler. Have you already decided whether you want an (table
driven) bottom-up or an (recursive descent) top-down parser?


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.