Re: Sytnax Tree Construction

blackmarlin@asean-mail.com (C)
9 Sep 2003 23:00:58 -0400

          From comp.compilers

Related articles
Sytnax Tree Construction chris@labino.net (2003-09-04)
Re: Sytnax Tree Construction blackmarlin@asean-mail.com (2003-09-09)
Re: Sytnax Tree Construction yves@news.be (Yves Deweerdt) (2003-09-09)
| List of all articles for this month |

From: blackmarlin@asean-mail.com (C)
Newsgroups: comp.compilers
Date: 9 Sep 2003 23:00:58 -0400
Organization: http://groups.google.com/
References: 03-09-024
Keywords: parse, analysis
Posted-Date: 09 Sep 2003 23:00:58 EDT

chris@labino.net (Chris) wrote in message news:03-09-024...


> Given that, im still slightly confused on the concept of a tree
> representation of a program.


As John said, there is no real concensus on how exactly a tree should
be constructed (unless you are targeting an existing backend such as
GCC), the basic rule of thumb is to write the simplest format for your
interpreter or compiler backend to read. For the code example you
gave I would generate a tree something like the following, others
will, no doubt, construct it slightly differently. (Tree shown on its
side.)


+-- (block)
    L-- (list)
    | +-- (assignment)
    | | L-- (identifier) : x
    | | R-- (addition)
    | | L-- (identifier) : x
    | | R-- (integer) : 2
    | |
    | +-- (block)
    | L-- (list)
    | +-- (condition)
    | | L-- (compare_equal)
    | | | L-- (identifier) : x
    | | | R-- (integer) : 12
    | | |
    | | R-- (list)
    | | +-- (print_fn)
    | | L-- (string) : "x=12"
    | |
    | +-- (condition)
    | | L-- (compare_equal)
    | | | L-- (identifier) : x
    | | | R-- (integer) : 14
    | | |
    | | R-- (list)
    | | +-- (print_fn)
    | | L-- (string) : "x=14"
    | |
    | +-- (list)
    | +-- (call_fn)
    | L-- (identifier) : function
    |
    R-- (repeat_condition)
        L-- (compare_not_equal)
            L-- (identifier) : x
            R-- (integer) : 20


This structure reuses several constructs, for example the (block) node
is used for both DO..LOOP and IF..ENDIF syntax. L/R indicate
left/right arm of a standard Node, + indicates an element of a List
node -- see below.


Note: This may well be overkill for BASIC, though it works quite
nicely for C/C++ and my pet language.


This sort of tree may be held in a simple structure consisting of a
linked list and a content node, such as ..


struct List { /* list of nodes */
    struct Node *content; /* node stored in list */
    struct List *next; /* next node on list */
};


struct Node { /* standard node */
    struct Node *left; /* content of this node */
    struct Node *right; /* body of this node */
    struct Node *root; /* to search back down tree */
    struct Data *value; /* data (if node is integer...) */
    struct List *list; /* list of nodes */
    int type; /* type code of node */
    int status; /* for generating unique labels */
};


or, if you are using an object oriented language you can have a
different object class for each node, tailored to both generate the
correct code and contain the right data, there is however a couple of
disadvantages with doing this (1) the -block- node can contain a
number of nodes (so is required to be a linked list or similar
construct) and (2) allocations are variable sized (so you cannot write
a highspeed custom allocator to replace individual malloc() or new
operations). The advantage is that code is all in one place,
therefore you do not get the ugly switch as shown below. Either way
such a structure is quite easy to construct with Bison and similarly
organised parsers.


The tree will then need to be checked. This should spot undeclared
variables, functions, &c. plus constructs the parser may miss. This
is quite simple for some languages BASIC being one as all variables
are global, though others are far more complex.


To generate the code you simply traverse the tree. (Which you could
have performed optimisations on beforehand.) For most of the nodes
you can generate code from a standard template. This could be done
using the function...


void traverseTree( Node * t ) {
    ASSERT( t != NULL );
    if( t->list ) traverseList( t->list );
    else if( isEquationType( t ) )
        buildEquation( t ); /* result in eax */
    else {
        generateEntryCodeUsingTemplate( t );
        if( t->left ) traverseTree( t->left );
        generateTestCodeUsingTemplate( t );
        if( t->right ) traverseTree( t->right );
        generateExitCodeUsingTemplate( t );
} }


void traverseList( List * l ) {
    ASSERT( l != NULL );
    do {
        ASSERT( l->content != NULL );
        traverseTree( l->content );
    } while( ( l = l->next ) != NULL );
}


void generateEntryCodeUsingTemplate( t ) {
    static int status = 0;
    ASSERT( t != NULL );
    switch( t->type ) {
    case node_block:
        printf( "begin_%u:\n", t->status = ++status );
        break; case node_condition:
    case node_repeat_condition:
        t->status = ++status;
        break;
    case /* add more here ... */




        break;
    default:
        ASSERT( FALSE ); /* oops */
} }


void generateTestCodeUsingTemplate( t ) {
    ASSERT( t != NULL );
    switch( t->type ) {
    case node_block:
        break;
    case node_condition:
        printf( "test eax, eax\njz next_%u\n", t->status );
        break;
    case node_repeat_condition:
    { struct Node *r = t;
        while( r->type != node_block ) r=r->root;
        printf( "test eax, eax\njnz begin_%u\n", r->status );
    } break;
    case /* more */




        break;
    default:
        ASSERT( FALSE ); /* oops */
} }


void generateExitCodeUsingTemplate( t ) {
    ASSERT( t != NULL );
    switch( t->type ) {
    case node_block:
        printf( "end_%u:\n", t->status );
        break;
    case node_condition:
    { struct Node *r = t;
        while( r->type != node_block ) r=r->root;
        printf( "jmp end_%u\nnext_%u\n", r->status, t->status );
    } break;
    case node_repeat_condition:
        break;
    case /* more... */






        break;
    default:
        ASSERT( FALSE ); /* oops */
} }


Note: This code only demonstrates the basic algorithm, it is both
incomplete and inefficient (in that the data structures could be made
much smaller and List merged into Node -- though that would make
things a little less clear). The IF/ELSIF/ELSE/ENDIF and DO/LOOP
WHILE constructs are implemented, though not much else. Implmenting C
style break/continue statements is trivial using this structure.


> Once the tree is built, is it neccessary to translate that
> representation into another intermediate representation in a
> non-optimizing compiler, or would directly generating code be ideal?


As demonstrated, the latter. Though a full optimising compiler would
both optimise the tree (constant propagation is quite easy with this
format and loop unrolling is not difficult) and then generate a series
of assembler instruction tupples and optimise them (normally using
peephole optimisation techniques). More advanced compilers will
reorder entire blocks of code, there are many book and web pages on
the subject -- I will not go any further down this route now (this
post is long enough).


> Thanks for any help!


Always welcome.


C
2003/9/5


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.