|Representing an AST email@example.com (1996-06-14)|
|Re: Representing an AST firstname.lastname@example.org (1996-06-21)|
|From:||email@example.com (Richard A. O'Keefe)|
|Date:||21 Jun 1996 17:09:41 -0400|
|Organization:||Comp Sci, RMIT, Melbourne, Australia|
firstname.lastname@example.org (Rob Duncan) writes:
>I'd be very grateful if someone could give me some advice on
>representing an abstract syntax tree for a fairly typical block
>structured programming language using Prolog terms.
>Is it better to represent each node of the AST with a separate fact,
>with arguments referring to labelled children?
Why on earth are you putting the AST in the data base at all?
You have a TREE, for heaven's sake, so why not represent it as a TREE?
>There will be thousands of nodes in the parse tree sometimes, and there
>are tens of different types of nodes. Performance is an issue.
There is a central rule of thumb for Prolog programming:
when performance is an issue, DON'T change the data base,
- changing the data base can be *thousands* of times slower than building
a term (which is very fast)
- terms are automatically garbage collected; facts in the data base have
to be retracted explicitly. This means that terms fit very well with
a backtracking parser, and facts in the data base are a very bad fit.
- facts in the data base tend to be much bigger than terms
>I'm sure there are standard techniques for doing this. Where can I look
>for references and examples?
How about David Warren's paper on compiling with Prolog?
Or how about Sterling and Shapiro, 2nd edition?
Let's take expressions as an example.
Every node in an expression has a type.
If you are keeping track of source positions, it also has a source position.
:- type expression_node -->
e(expression_node_variant, type_term, source_position).
:- type expression_node_variant -->
| unary(operator, expression_node)
| binary(operator, expression_node, expression_node)
| cond(expression_node, expression_node, expression_node).
Having that, we can state static conditions easily:
well_typed(e(Expr,Type,_), Context) :-
well_typed_e(Expr, Type, Context).
well_typed_e(const(_), number, _).
well_typed_e(id(Ident), Type, Context) :-
look_up_type(Context, Ident, Type).
well_typed_e(unary(Op,E), Type, Context) :-
well_typed(E, T, Context),
unary_op_type(Op, T, Type).
well_typed_e(binary(Op,E,F), Type, Context) :-
well_typed(E, T, Context),
well_typed(F, U, Context),
binary_op_type(Op, T, U, Type).
well_typed_e(cond(C,E,F), Type, Context) :-
well_typed(C, boolean, Context),
well_typed(E, Type, Context),
well_typed(F, Type, Context).
For example, the parser would convert (-x)/(x+1) to
e(id(x), T1, P1)), T2, P2),
e(id(x), T3, P3),
e(const(1), T4, P4)), T5, P5)), T6, P6)
where the P variables would actually have been filled in by the tokeniser.
Now if we type check this, the type check would not only succeed (showing
that the expression was well typed), but it would also fill in the type
Richard A. O'Keefe; http://www.cs.rmit.edu.au/~ok; RMIT Comp.Sci.
Return to the
Search the comp.compilers archives again.