Re: Representing an AST

ok@cs.rmit.edu.au (Richard A. O'Keefe)
21 Jun 1996 17:09:41 -0400

          From comp.compilers

Related articles
Representing an AST rob@aimnet.com (1996-06-14)
Re: Representing an AST ok@cs.rmit.edu.au (1996-06-21)
| List of all articles for this month |

From: ok@cs.rmit.edu.au (Richard A. O'Keefe)
Newsgroups: comp.compilers
Date: 21 Jun 1996 17:09:41 -0400
Organization: Comp Sci, RMIT, Melbourne, Australia
References: 96-06-060
Keywords: analysis

rob@aimnet.com (Rob Duncan) writes:
>I'd be very grateful if someone could give me some advice on
>representing an abstract syntax tree for a fairly typical block
>structured programming language using Prolog terms.


>Is it better to represent each node of the AST with a separate fact,
>with arguments referring to labelled children?


Why on earth are you putting the AST in the data base at all?


You have a TREE, for heaven's sake, so why not represent it as a TREE?


>There will be thousands of nodes in the parse tree sometimes, and there
>are tens of different types of nodes. Performance is an issue.


There is a central rule of thumb for Prolog programming:
        when performance is an issue, DON'T change the data base,
        use terms.
- changing the data base can be *thousands* of times slower than building
    a term (which is very fast)
- terms are automatically garbage collected; facts in the data base have
    to be retracted explicitly. This means that terms fit very well with
    a backtracking parser, and facts in the data base are a very bad fit.
- facts in the data base tend to be much bigger than terms


>I'm sure there are standard techniques for doing this. Where can I look
>for references and examples?


How about David Warren's paper on compiling with Prolog?
Or how about Sterling and Shapiro, 2nd edition?


Let's take expressions as an example.
Every node in an expression has a type.
If you are keeping track of source positions, it also has a source position.


:- type expression_node -->
e(expression_node_variant, type_term, source_position).


:- type expression_node_variant -->
const(number)
        | id(atom)
        | unary(operator, expression_node)
        | binary(operator, expression_node, expression_node)
        | cond(expression_node, expression_node, expression_node).


Having that, we can state static conditions easily:


well_typed(e(Expr,Type,_), Context) :-
        well_typed_e(Expr, Type, Context).


well_typed_e(const(_), number, _).
well_typed_e(id(Ident), Type, Context) :-
        look_up_type(Context, Ident, Type).
well_typed_e(unary(Op,E), Type, Context) :-
        well_typed(E, T, Context),
        unary_op_type(Op, T, Type).
well_typed_e(binary(Op,E,F), Type, Context) :-
        well_typed(E, T, Context),
        well_typed(F, U, Context),
        binary_op_type(Op, T, U, Type).
well_typed_e(cond(C,E,F), Type, Context) :-
        well_typed(C, boolean, Context),
        well_typed(E, Type, Context),
        well_typed(F, Type, Context).


For example, the parser would convert (-x)/(x+1) to


e(binary(/,
e(unary(-,
e(id(x), T1, P1)), T2, P2),
e(binary(+,
e(id(x), T3, P3),
e(const(1), T4, P4)), T5, P5)), T6, P6)


where the P variables would actually have been filled in by the tokeniser.
Now if we type check this, the type check would not only succeed (showing
that the expression was well typed), but it would also fill in the type
variables.


--
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.