Re: C Compiler in C++

"Rodney M. Bates" <>
13 May 2002 01:08:10 -0400

          From comp.compilers

Related articles
C Compiler in C++ (2002-05-08)
Re: C Compiler in C++ (2002-05-12)
Re: C Compiler in C++ (Diego Novillo) (2002-05-12)
Re: C Compiler in C++ (2002-05-12)
Re: C Compiler in C++ (2002-05-12)
Re: C Compiler in C++ (Rodney M. Bates) (2002-05-13)
Re: C Compiler in C++ (Lex Spoon) (2002-05-13)
Re: C Compiler in C++ (2002-05-17)
Re: C Compiler in C++ (2002-05-17)
Re: C Compiler in C++ (2002-05-17)
Re: C Compiler in C++ (2002-05-17)
Re: C Compiler in C++ (Joachim Durchholz) (2002-05-23)
[1 later articles]
| List of all articles for this month |

From: "Rodney M. Bates" <>
Newsgroups: comp.compilers
Date: 13 May 2002 01:08:10 -0400
Organization: EarthLink Inc. --
References: 02-05-039
Keywords: C, OOP, design
Posted-Date: 13 May 2002 01:08:10 EDT

Nemanja Stanarevic wrote:

> However, I have no idea how to go about representing C variable
> declarations in the parse tree. The problem is that C grammar itself
> is very loose on rules for variable declarations.

First, I strongly recommend an Abstract Syntax Tree (AST) instead
of a parse tree AKA derivation tree.

The most confusing thing about C declarations is that the type
constructors are inconsistent. All type constructors, viewed
semantically, construct a new type from other, already known types and
values. Syntactically, most type constructors have the already-known
things as subconstructs, i.e. they are below the type constructing
node, in whichever kind of tree you have.

The C declarators reverse this. They put an already-known type above
the type constructor and the name of the to-be-declared type or
variable below. Whether this is good or bad is bait for a major flame

But this inverted syntax cannot be applied consistently, because a
tree node, while having as many children as you like, can have only
one parent. So when you need to construct a new type out of more than
one already-known type or value, it won't work. The only type
constructor in C that uses this syntax style consistently is the
pointer declarator, which only needs one already-know construct,
namely, the referent type of the being-declared pointer type.

When it comes to arrays, the element type is above in the tree, but
the bounds are below. Similary, the result type of a function is
above, but its formal parameter types are below. The structs, unions,
and enums need multiple already-known constructs, but none of them is
distinguished, so they are all the normal way, i.e. below the
constructor, which is now called a type-specifier, not a declarator.

You can also think of all the storage class specifiers, type
qualifiers, etc. as following the normal syntax, i.e, they lie below
the new type constructor node, although this is not so significant,
because they are just single tokens, not subtrees of unbounded size.

In my experience, many C programmers do not really understand the
mixture of inverted and upright syntax for the type constructors and
avoid needing to by not writing very many instances of nested
constructors. That way, you can just treat each single type
constructor as linear, rather than tree structured, having the various
pieces in some order that you have memorized. I only came to really
understand them after having to build an AST for C declarations. When
doing that, you have to understand and handle the full generality of
all the ways they can be syntactically and semantically nested.

Which finally leads me to the point of this whole diatribe. I
designed an abstract tree with type constructor nodes for all of the
types, all of which had all of their constituents below them. During
parsing and AST construction, I reversed the sense of nesting for
pointer referents, array elements, and function results, while keeping
it the same as in the concrete syntax for every other relationship.
This was fairly confusing, and I spent a lot of time figuring it out,
but it simplified subsequent phases immensely.

P.S. I did not use a subclass hierarchy for the various kinds of AST
nodes, because I didn't have an object-oriented language. I used
variant records instead. But that is an orthoganal question to what I
have discussed here, namely the design of the abstract tree grammar.

(They also use already known values as well as types, e.g. for the
bounds of an array type, but to

Rodney M. Bates

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.