Re: Building a parse tree that reflects C Semantics

"Ira Baxter" <idbaxter@semdesigns.com>
20 Oct 2002 22:51:08 -0400

          From comp.compilers

Related articles
Building a parse tree that reflects C Semantics vbdis@aol.com (VBDis) (2002-10-13)
Re: Building a parse tree that reflects C Semantics grosch@cocolab.de (Josef Grosch) (2002-10-18)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-10-20)
Re: Building a parse tree that reflects C Semantics whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-10-20)
Re: Building a parse tree that reflects C Semantics david.thompson1@worldnet.att.net (David Thompson) (2002-10-25)
Re: Building a parse tree that reflects C Semantics rbates@southwind.net (Rodney M. Bates) (2002-11-06)
Re: Building a parse tree that reflects C Semantics idbaxter@semdesigns.com (Ira Baxter) (2002-11-07)
| List of all articles for this month |

From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: 20 Oct 2002 22:51:08 -0400
Organization: Compilers Central
References: 02-10-058
Keywords: parse, optimize
Posted-Date: 20 Oct 2002 22:51:08 EDT

"Josef Grosch" <grosch@cocolab.de> wrote in message
> > How to create an parse tree in general?
>
> I would build an abstract syntax tree that reflects the semantics of a
> language and not its concrete syntax. This might require some effort
> for mapping the concrete syntax to the abstract syntax during tree
> construction. However, usually that is profitable because it greatly
> simplifies semantic analysis and code generation.
>
> > How to interpret the type declaration syntax of C, in order to obtain
> > general type descriptions for every declaration?
>
> The concrete syntax of declarations in C and C++ is weird in my humble
> opinion. [snip]


> If the abstract syntax would represent the concrete syntax directly
> then the structure of the information would be weird. First, the name
> of a variable would be found at the bottom of the type tree of a
> declarator. Second, the base type would not be included in the type
> tree of a declarator but it would be found in the specifier list mixed
> with non-type specifiers (attributes).


Another view is to build the parse tree that comes naturally as a
result of the grammar, and use attribute evaluation to collect pieces
from where they occur, and propagate them to the node representing the
declaration, where the declaration parts are processed and inserted in
the symbol table.


The argument in favor of a "semantically oriented" AST sounds great on
the surface, and I'm all in favor of it when you can make it work.
But all represesentations are good at one task, and not so good at
others. So when you choose a "semantically oriented" AST, you are in
essence choosing a particular set of tasks for which the
representation does well. The words "semantically oriented"
unfortunately do not generally tell you what that set of tasks is.


>For our C and C++ parsers we have designed a tree representation which
>describes the name and the type of a variable separately and
>completely. Also, the specified list is converted into a bit vector
>representation.


So, this representation is optimized for doing bit lookups for the
specifier list. I assume the bit vector representation is really
that, and not a set of tree nodes representing a bit vector.


We build tools that are intended to transform the program structure,
for arbitrary source languages. We wish to write transforms that can
be expressed in terms of the surface syntax of the language, e.g.,


            c = c + 1 ==> c++


To do this, we desire that language fragments (patterns) be mapped to
the same structures as entire programs;. So we've chosen a
representation which optimizes on matching trees. We too chose trees
to represent such structures, but not the "semantically oriented"
ones, especially not ones with specialized representations such as bit
vectors, because you can only produce such representations if you
parse "larger chunks" (i.e., then entire specifier list). If you
insist on this approach, it makes building either the fragment parser
considerably more difficult (it has to understand how to build
specialized representations, possibly in the face of not having enough
information, e.g., build a partial bit vector), or the matcher more
difficult (how do I match a fragment of a specifier list against a bit
vector representing the entire list), or both.


I don't doubt that you might be able combine these two techniques with
enough effort. We didn't have that much available energy :-{


Simple moral: how you represent your "tree" depends on what you want
to do.


FWIW, we don't build a pure concrete tree. For space reasons
(representing big programs) the parser compresses out
non-value-carrying terminal nodes and unary productions, and yes, that
makes the pattern match more complicated. I will say our system seems
pretty reasonable for carrying out transforms, so we got what we
wanted; YMMV.


--
Ira D. Baxter, Ph.D., CTO 512-250-1018
Semantic Designs, Inc. www.semdesigns.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.