A Note on Designing an Abstract Syntax

Stephen J Bevan <bevan@computer-science.manchester.ac.uk>
Fri, 30 Oct 1992 20:56:38 GMT

          From comp.compilers

Related articles
A Note on Designing an Abstract Syntax bevan@computer-science.manchester.ac.uk (Stephen J Bevan) (1992-10-30)
Re: A Note on Designing an Abstract Syntax bevan@computer-science.manchester.ac.uk (Stephen J Bevan) (1992-11-04)
| List of all articles for this month |

Newsgroups: comp.compilers
From: Stephen J Bevan <bevan@computer-science.manchester.ac.uk>
Organization: Compilers Central
Date: Fri, 30 Oct 1992 20:56:38 GMT
Keywords: design, syntax, bibliography

I think I remember a couple of messages recently that asked about abstract
syntax. I also think I saw a followup that explained what it was and
pointed at McCarthy's first papers. However, beyond defining what it is,
it is rare to see any discussion of it i.e. how to design an AS ... etc.,
I therefore thought the following might be of some interest. It is taken
from some notes I made about AS and briefly describes some of the problems
concerning the interplay with defining context conditions (static
semantics) and designing an AS.

The note is take from a LaTeX document which uses a VDM style file, but in
the following I've stripped out most of the LaTeX (I've just left
headings) and ASCIIfied the VDM symbols. Hopefully I've done this
consistently and used symbols that look something like the VDM originals.
If you have any problems following it, just mail me. (If you don't know
what VDM is, there is a reference for it at the end).


Proposal :-

Try and represent as many possible context conditions in the abstract
syntax [McCarthy63,McCarthy64] of the language as structural constraints.
The motivation being that structural constraints are sometimes: relatively
easy to check and easier for the average reader to read that constraints
expressed with equations. These are subjective arguments, but if they
were not at least partly true language definitions would not contain a BNF
description or its equivalent.

\section{Level 1 Abstract Syntax}

Consider the following AS definition for a Pascal "program" :-

program :: decls: Decl* stmts: Stmt*
Decl = VarDecl | ConstDecl | TypeDecl | ProcDecl
ConstDecl :: name: Id value : ConstExression
VarDecl :: name: Id type: Type
Type = Integer | Boolean | Char | Real | Array | Record | Ref
Array :: lb: ConstExpression ub: ConstExpression type: Type
Statement = IfStmt | AssignStmt | WhileStmt | RepeatStmt ...
AssignStmt :: name:Id value: Expression
ConstExpression = Expression
Expression = LiteralInteger | LiteralChar | ... | Ref

This almost exactly matches the CS as given by the BNF. Consequently the
mapping of the CS to the AS is trivial. However, an AS with this form
does nothing to enforce the context conditions that go along with the BNF
definition in the reference manual :-

\item all declarations must have a unique name.
\item the declaration of an entity must preceed the any reference to
            that entity (pointer types are a special case).
\item any expression used in the definition of array bounds or a
            "const" must be constant i.e. only contain literal values or
            references to "const" defined values.

The first condition can easily be defined directly on the structure as it
is defined :-

wf-Program(mk-Program(decls,stmts)) =
    forall a,b in decls
        name(a) = name(b) -> a = b

forall i in inds decls
    case decls(i) of
        mk-ConstDecl(n,v) ->
        mk-VarDecl(n,t) ->
            mk-Array(lb, ub, ...) ->
                dbu(lb,decls(1..i-1)) /\ dbu(ub,decls(1..i-1))

dbu : Expr x Decl* -> Bool
dbu(expr,decls) =
    case expr of
        mk-Ref(id) -> exists d in decls . name(d) = id
        ... -> true

Note the above isn't the usual method that is used to check define before
use; the more common method is to pass along one or more "environments"
whilst doing a depth first walk over the AST. Note also that the above
method will only work for top level declarations in a program.

The third condition can be defined as :-

    forall d in decls .
        case d of
            mk-ConstDecl(n,v) ->
            mk-VarDecl(n,t) ->
                case t of
                    mk-Array(lb, ub, ...) ->
                        is-constant(lb, decls) /\ is-constant(ub, decls)

is-constant(expr, decls) =
    case expr of
        mk-Ref(id) -> exists d in decls . name(d) = id /\ d in ConstDecl
        ... -> true

This examines each declaration to see if it is a constant or variable
declaration. In the case of the former, and the latter if it is defining
an array, a check is made to see if the appropriate expressions are
constant. An expression is constant if it contains only literal values
(denoted by the ... -> true section) or if it is a reference to a value
which was defined via "const".

\section{Another Level 2 Abstract Syntax}

In an attempt to enforce some of the conditions structurally, the
following AS could be used :-

program ::
    decls: Id -m-> Decl
    stmts: Stmt*

i.e. declarations are changed from a sequence to a mapping from the
declaration name to the declaration itself. Uniqueness is now guaranteed
by the definition of a "map" in VDM. However, the definition of whether
this :-

var a : int;
var a : real;

is regarded as an error, or is just ignored, depends on what mechanism is
used to build up the mapping. If "+" (map overwrite) is used, then this
would indicate that duplicate definitions could be ignored and the latter
would always be chosen. If "U" (map union) is used, then is is regarded
as invalid to add an element if it is already in the map, and this could
therefore be used if it is an error to have duplicate definitions.

The following shows how this would be done for Pascal where duplicate
values are not allowed :-

retr-Program(mk-C-Program(decls, ...)) =
    mk-A-Program(retr-Decls(decls, {}), ...)

retr-Decls :: C-Decl* x (Id -m-> A-Decl) -> (Id -m-> A-Decl)
retr-Decls(decls,result) =
    case decls of
        [] -> result
        (h:t) -> retr-Decls(t, result U {name(h) |--> h})

retr-Decls has been written in a tail recursive manner not for efficiency
(though it is always helpful) but because this more closely models the
order relationship i.e. declarations should be added in the order they are
defined in the original program.

The decision whether duplicates are allowed or not is controlled by one
symbol. This is rather subtle, and the implications probably would not be
immediately clear to the average reader. The description would therefore
have to be clearly annotated to explain the significance of the chosen

Alternately, the mapping could be packaged into an ADT with more familiar
names for the operations.

Although this simplifies checking the first rule, it does nothing for the
other two rules, and actually makes the second one impossible, as the
ordering relationship on the nodes has dissapeared. This restriction
would therefore have to be defined on the level 1 AS as before.

Checking the third restriction can be slightly simplified as declarations
are now indexed by their name :-

is-constant(expr, decls) =
    case expr of
        mk-Ref(id) -> decls(id) in ConstDecl

\section{Another Level 2 Abstract Syntax}

The previous AS simplified the uniquness condition, but actually made the
dbu condition impossible to enforce. Another variation that also has this
property is that used by Andrews&Henhapl [AndrewsHenhapl82] in their
definition of Pascal :-

program ::
    vars: Id -m-> Var
    consts: Id -m-> Const
    types: Id -m-> Type
    procs: Id -m-> Proc
    stmts: Stmt*

This partitions the declarations into separate maps in order to make it
easier to define conditions that are only concerned with declarations of a
certain class e.g. only constant declarations.

is-constant(expr, c-decls) =
    case expr of
        mk-Ref(id) -> id in c-decls


Each of the preceeding ASs provided different views on the underlying AS
in order to simplify one or more aspects of the context conditions. As
the conditions conditions vary in type, it is not suprising that there is
no single structure that is natural for all conditions.

One solution therefore is to provide multiple views of the data, each
tailored for a specific context condition. Whilst this could lead to a
combinatorial explosion, it is envisaged that only a relatively small
number of views would be necessary in practice as there would be
considerable overlap between similar conditions.

For each view, it is necessary to provide a mapping from the CS to that
view. This is often ommited, but it is a vital part of the definition.

Perhaps because this mapping is necessary for any AS other than the first
level one, ASes beyond the first level aren't used very often.

Requring an explicit mapping has the effect of making the definition much
larger than an AS(1) version. This can/could be offputting.


{ AndrewsHenhapl82
, author= {Derek Andrews and Wolfgang Henhapl}
, chapter= {7}
, pages= {175--252}
, title= {Pascal}
, crossref= {BjornerJones82}

{ BjornerJones82
, author= {Dines Bj{\o}rner and Cliff B. Jones}
, title= {Formal Specification \& Software Development}
, booktitle= {Formal Specification \& Software Development}
, publisher= {Prentice Hall International}
, year= {1982}
, series= {Computer Science}

{ Jones86
, author= {Cliff B. Jones}
, title= {Systematic Software Development Using VDM}
, publisher= {Prentice Hall International}
, year= {1986}
, series= {Computer Science}

{ McCarthy63
, author= {J. McCarthy}
, title= {Towards a mathematical science of computation}
, booktitle= {Proceedings of the IFIP Congress}
, year= {1963}
, pages= {21--28}
, publisher= {North-Holland}

{ McCarthy64
, author= {John McCarthy}
, title= {A formal description of a subset of {ALGOL}}
, crossref= {Steele64a}
, pages= {1--12}

{ Steele64a
, title= {Formal Language Description Languages for Computer Programming,
Proceedings of an IFIP Working Conference}
, year= {1964}
, editor= {Steele, Jr, T. B.}

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.