Re: Building a parse tree that reflects C Semantics

"Josef Grosch" <grosch@cocolab.de>
18 Oct 2002 23:24:02 -0400

          From comp.compilers

Related articles
Building a parse tree that reflects C Semantics vbdis@aol.com (VBDis) (2002-10-13)
Re: Building a parse tree that reflects C Semantics grosch@cocolab.de (Josef Grosch) (2002-10-18)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-10-20)
Re: Building a parse tree that reflects C Semantics whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-10-20)
Re: Building a parse tree that reflects C Semantics david.thompson1@worldnet.att.net (David Thompson) (2002-10-25)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-11-06)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-11-07)
| List of all articles for this month |

From: "Josef Grosch" <grosch@cocolab.de>
Newsgroups: comp.compilers
Date: 18 Oct 2002 23:24:02 -0400
Organization: CoCoLab, Achern, Germany
Keywords: parse, design
Posted-Date: 18 Oct 2002 23:24:02 EDT

> How to create an parse tree in general?


I would build an abstract syntax tree that reflects the semantics of a
language and not its concrete syntax. This might require some effort
for mapping the concrete syntax to the abstract syntax during tree
construction. However, usually that is profitable because it greatly
simplifies semantic analysis and code generation.


> How to interpret the type declaration syntax of C, in order to obtain
> general type descriptions for every declaration?


The concrete syntax of declarations in C and C++ is weird in my humble
opinion. A declaration for a variable should specify a name, a type,
and maybe an initial value as well as additional attributes. These
constituents are organized in a strange order in the concrete syntax
of C. Consider the following example:


      static unsigned int * a [5], b = 1;


A declaration in C begins with a specifier list (e. g. static unsigned
int). The specifier list describes the attributes (static) and the
"base type" (unsigned int) for a list of variables. The specifier list
is in general followed by a list of so-called declarators. A
declarator describes the name of a variable (a and b), its type (* [5]
meaning array of pointer), and optionally an initial value (1). In
general the name of the variable is found within the type
specification (* a [5]). However, the type specification at the
declarator is incomplete because the base type is missing. The base
type is given in the specifier list. In our example the complete type
of the variable a is "array of pointer to unsigned int".


If the abstract syntax would represent the concrete syntax directly
then the structure of the information would be weird. First, the name
of a variable would be found at the bottom of the type tree of a
declarator. Second, the base type would not be included in the type
tree of a declarator but it would be found in the specifier list mixed
with non-type specifiers (attributes).


For our C and C++ parsers we have designed a tree representation which
describes the name and the type of a variable separately and
completely. Also, the specifier list is converted into a bit vector
representation. For the above example a tree would be built as shown
by the following simplified XML notation:


<list_decl>
    specifier_l=<specifier_l> <s_static/> <s_unsigned/> <s_int/> </specifier_l>
    type=<simple_type specifiers='int unsigned'/>
    declaration_l=<statement_l>
    <var_decl specifiers='static int unsigned'>
        name=<name ident='a'/>
        type=<array> expression=<int_const value='5'/> </array>
            type=<pointer/>
                type=<simple_type specifiers='int unsigned'/>
    </var_decl>
    next=<init_decl specifiers='static int unsigned'>
        name=<name ident='b'/>
        type=<simple_type specifiers='int unsigned'/>
        expression=<int_const value='1'/>
    </init_decl>
    </statement_l>
</list_decl>


The node of type 'list_decl' is the head of a list of
declarations. Its child 'specifier_l' describes the specifier list as
given in the concrete syntax. This information is factored out for
all objects declared in the subtree rooted at the child
'declaration_l'. The child 'type' refers to a node from the class
'type' holding the type information contained in the specifier list.
In our example there is a node of type 'simple_type'. This node has an
attribute named 'specifiers'. This attribute holds a bit vector
representation of the specifiers describing types, e. g. 'unsigned
int'. The nodes of types 'var_decl' and 'init_decl' describe the
declared variables a and b. The subtrees name, type, and expression
represent the details according to an abstract syntax which reflects
the semantics.


More information about our abstract syntax for C++ can be found in the
document cpp.pdf that comes with the demo version of our C++ parser:


      http://www.cocolab.com -> Downloading -> cpp-win32.zip


> Where can I find sample C or C++ grammars (for download), with added
> actions for the semantical interpretation of the rules?


Our C grammar including a definition of an abstract syntax tree and
actions for tree construction can be downloaded from our web server:


      ftp://www.cocolab.com/products/cocktail/demo/examples.zip


Have a look into the directory cocktail/examples/c/c.
--
Dr. Josef Grosch
Email: grosch@cocolab.de


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.