Re: OOP Parse tree generation?

Chris F Clark <>
3 Apr 2000 11:30:39 -0400

          From comp.compilers

Related articles
[3 earlier articles]
Re: OOP Parse tree generation? (Tom Payne) (2000-03-23)
Re: OOP Parse tree generation? (Nicolas Ojeda Bar) (2000-03-23)
Re: OOP Parse tree generation? (2000-03-23)
Re: OOP Parse tree generation? (George Nelan) (2000-03-23)
Re: OOP Parse tree generation? (Daniel C. Wang) (2000-03-25)
Re: OOP Parse tree generation? (Chris F Clark) (2000-04-03)
Re: OOP Parse tree generation? (Chris F Clark) (2000-04-03)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 3 Apr 2000 11:30:39 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 00-03-098
Keywords: OOP, parse

Nicolas wrote:
> Like Quinn, I also end up with a base class ...
> added subclases for the commodity of having
> constructors to fill in the proper union members, and destructors to
> allocate the proper union members. This left me with a good - sized
> base class, and lots (LOTS) of minimal classes with only constructors
> and destructors, with represented each posibility of the union. This
> of course does not avoid the class explosion.
> ... the base class will NOT have a union, and instead each union member
> will be contained in the subclass.
> ... if I want to access the particular data
> of a particular subclass, I have to typecast.

This is essentially one "correct" oop design. Essentially the base
class contains all the interface functions and possibly much of their
implementation. It makes the base class fairly "fat". It also
results in "many" minimal classes for each of the leaf types.

However, as George Nelan pointed out, the number of leaf classes is
linear in the number of alternative rules in your grammar, so while it
is "many", it exactly corresponds to the complexity of your grammar.
The Verilog project I am currently working on probably has 500 leaf
classes. However, most of the time I am only concerned with fewer
than a dozen for some particular aspect of the language. Further, you
will note that there are many tools such as Cocktail, JTB, JJTree,
SableCC, and Yacc++ that will create the leaf classes for you.

BTW, there is another set of tools (e.g. ANTLR and RDP) that make
trees where all the nodes are of the same class. It's another correct
OOP design, just reflecting a different set of tradeoffs.

There is only one thing missing from this design, intermediate base
classes that group similar leaf classes into groups. For example,
your grammar probably has several binary operators that all describe a
leaf class with two operands and the operator. It is quite useful to
make an intermediate base class for those classes. The intermediate
base class holds the two operand (pointers). The leaf class only
supplies the final operator that connects them.

If you make such intermediate classes, you will find that some of the
base class code moves into them making the base class a little
smaller, without moving it all the way to the leaf classes where the
number of copies explodes. (There are probably a dozen or two base
classes in the Verilog grammar I described above and when working on
one particular aspect of the language I generally have to only deal
with two or three of them at a time.)

The one thing about your implementation that worries me a bit is that
you mention typecasting to get at particular data. That suggests that
you may not have put some of the functions in the right spot. In
general, if your base class has the "right" functions (and they are
overridden by the "right" implementations in the derived classes), you
should rarely (almost never) need a typecast. Of the ten or so
typecasts in the code, half of them are simply to verify the tree is
correctly structured (i.e. the code does a dynamic cast to verify that
the operand is of an expected type, lest I have a bug that mangled my
tree). Another two or three were simply laziness on my part where I
could have factored the code differently and avoided the type cast.
Finally, there are two or three that actually are there to make the
code work.

Use two things to avoid typecasts.

- - Put virtual accessor functions in the base class.
e.g. get_left_operand() and set_left_operand(). Depending on
the circumstance the base class accessor either returns some
sort of special "invalid/innocuous" result (e.g. spelling()
returning a null string) or it asserts out.
    The various intermediate and leaf classes override the accessors
    as appropriate. (e.g. the binary expression base class implements
    get_left_operand() to return the correct operand).

- - Put the data pointers where they belong in the hierarchy and make
    them point to the correct base class (sometimes the root and sometimes
    an intermediate class).

The only place where this leaves one with typecasts is when a leaf
type has an operand that is more general in the base class where it is
defined. This is merely an artifact of C++ not allowing an overriding
function to return a more strict return type. (I.e. if one were using
ML or Sather, that might not be an issue.) Moreover, the typecast
only occurs when the semantics require some feature of the more strict

Of course, this can also be solved with additional accessor functions
(e.g. get_left_integer_operand() which returns null from the base
class where the operand cannot be guaranteed to be integral, but the
operand from the derived class where the guarantee can be made).

Note, the one thing which makes this style work is the willingness to
insert base classes into the hierarchy as needed. All but three or
four of the base classes are the result of refactoring the design as
the semantic implementation has proceeded. This should not be that
painful if you are using a tool to generate your leaf classes. It is
probably not even that painful if you have hand written classes. The
only classes that change are the ones getting a new base class.

Hope this helps,
- -Chris

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.