Re: AST's, why are they necessary? (F.W. Schroeer)
5 Sep 1998 01:18:50 -0400

          From comp.compilers

Related articles
AST's, why are they necessary? (Phil) (1998-08-30)
Re: AST's, why are they necessary? (Scott Stanchfield) (1998-08-31)
Re: AST's, why are they necessary? (1998-08-31)
Re: AST's, why are they necessary? (1998-09-05)
| List of all articles for this month |

From: (F.W. Schroeer)
Newsgroups: comp.compilers
Date: 5 Sep 1998 01:18:50 -0400
Organization: Compilers Central
References: 98-08-200
Keywords: parse

Phil Royall <> wrote:
> I am writing translators to convert about 20 robot languages into our
> own proprietary language (a derivative of VB), and on advice I have
> been looking into using AST's. However, why would, in this case an
> AST give us an advantage over just hand coding the actions into
> fprintf statements (for example).

The Gentle compiler construction system has been used various times for
that purpose, the generated robot control software is in use at the
automobile industry (e.g at Volkswagen).

This methodology strongly recommends to separate issues
and to decompose the task of translation into subtasks:
* Discover the structure of the source program,
* Process this structure to generate the target program.
The abstract syntax representation serves as the interface between subtasks.
In the ancient days when syntax trees did not fit into 64K memory,
the structure of compilers (and even programming languages) was often
determined by the goal to process a program in a single pass.
Today's machines allow a compiler design in favour of productivity and
ease of maintenance.

There are many advantages of using abstract syntax.

* The compiler has a clean structure, independent subtasks are
    described independently. There is a clear, declarative interface
    between subtasks.
* Building an intermediate representation allows the reuse
    of compiler components: You may exchange the frontend without touching
    the backend, hence translating several source languages into the
    same target language (which would be hard if parsing and code generation
    would be intertwined). You may also provide different backends for the
    same source language.
* Semantic analysis and code generation is based on the logical structure,
    not on rules dictated by the parsing algorithm (e.g. by the need of
    left factorization or conflict resolution).
* Abstract syntax allows normalization: different syntax with the same
    meaning is handled only once (e.g A[i][j] and A[i,j] in Pascal).
* Abstract syntax allows simplification: in concrete syntax
    you have to deal with operator precedence, abstract syntax has an
    unambiguous structure.
* Modern languages (such as Java) allow the use of items that
    are not yet declared at the point of application. Such forward references
    are no problem in compilers that use an internal program representation.
    Algorithms based on abstract syntax trees may visit nodes several times
    and in any order. Even if you think that you can process the source
    in one pass from left to right, it is likely that you will be confronted
    with an unforeseen problem that cannot be solved in this way.
* Abstract syntax allows for automatically checking the consistency
    of the translation. Errors are discovered when processing the specification.

Linguistic support for defining and processing abstract syntax includes:

* Type definitions

Abstract syntax describes the hierarchical decomposition of programs.
A given item is constructed according to one of of several alternatives,
an alternative is constructed by certain constituents. Hence abstract syntax
is best described by a system of (recursive) types, where a type is
defined by listing its alternatives and an alternative is defined by listing
its constituents. For example, a type STATEMENT could be defined by

            assignment(VARIABLE, EXPRESSION)
            while(EXPRESSION, STATEMENT)
            /* ... */

A STATEMENT is an assignment, a while statement, ... .
An assignment is given by a VARIABLE and an EXPRESSION.
A while statement is given by an EXPRESSION and a STATEMENT.

* Constructors and Patterns

Elements of the abstract program may be denoted by terms. The term

        assignment(V, E)

may be a constructor that builds a statement S from a variable V
and an expression E. It may also be considered as a pattern: matching
a statement S against this pattern succeeds if S is an assignment,
thereafter V and E denote the constituents of that assignment.

* Attribution of concrete syntax grammars

The concrete syntax of a language is described by a context-free grammar.
For example, a concrete assignment statement is described by

      Stmt : Var ":=" Expr

The corresponding abstract syntax can be introduced by decorating this rule
with attributes:

      Stmt(-> assign(V, E)) : Var(-> V) ":=" Expr(-> E)

which means that if Var has the abstract representation V and Expr is
represented by E then the whole phrase has the abstract syntax
assign(V, E).

* Rule-based traversal of abstract syntax

Traversal of abstract syntax (for semantic analysis or code generation)
may be expressed by rules (similar to grammar rules). Very often these rules
follow the structure of the abstract syntax types: there is a rule for each
alternative, a rule processes the constituents of the alternative.
Rules are selected by pattern matching. For example, the rule

      StmntCode(assign(V, E)) : VarCode(V) ExprCode(E) STORE

expresses that the code for assign(V, E) is given by the code for V,
followed by the code for E, followed by the instruction STORE.

For simple source-to-source translation first-fit rule selection
is sufficient. For advanced code generation rules may be augmented with
cost values, and dynamic programming can be used for optimal rule selection.

A book presenting this approach is available online
(and also as PostScript file or HTML archive for downloading):

Friedrich Wilhelm Schroeer

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.