From: jdean@cs.washington.edu (Jeffrey Dean)
Newsgroups: comp.compilers
Date: 30 Jun 1996 16:40:24 -0400
Organization: Computer Science & Engineering, U of Washington, Seattle
References: 96-06-047 96-06-064 96-06-117 96-06-134
Keywords: OOP
Greg Morrisett (jgm@CS.Cornell.EDU) wrote:
: This is a fundamental problem with taking the OO-paradigm too far.
: What we need are languages that allow programmers to view the organization
: of their programs in different ways in order to support different kinds
: of modifications. Ideally, when adding a new kind of primitive to the
: AST, the system would present the OO version, so that all I have to
: add is one new definition, with appropriate methods. But when I go to
: add a new transformation or optimization, I'd rather see the "old-
: fashioned" view of my program where I code one big procedure that
: switches on the different kinds of AST objects. This way, I don't have
: to touch all of the classes to add new methods.
Languages that support multi-methods, such as Cecil or Dylan, allow
you to structure code in both of the ways that you describe. For the
past few years, we (the Cecil group) have been developing Vortex, a
compiler for optimizing object-oriented languages. Vortex is entirely
written in Cecil, and as such, we've been able to take advantage of
Cecil's language features to structure code both ways (i.e., in both
an object-centric and a function-centric style), depending on which
made sense for the particular operation we were defining. For example,
parts of our AST hierarchy are defined in an object-centric way, with
some of the methods relating to ASTs defined with the declarations of
the AST hierarchy. Here's some sample Cecil code showing this style
('@object_name' is Cecil syntax for specifying that a particular
method is specialized on that object: it is used during method lookup
to determine the most specific applicable method):
abstract object AST;
signature print_string(AST):string; -- All ASTs understand print_string
abstract object decl isa AST; -- Declarations are one kind of AST
template object method_decl isa decl;
field method_name(@method_decl):string;
... -- A bunch of other field and method declarations omitted
method print_string(self@method_decl):string {
  "method " || self.method_name }  -- || is string concat operator
template object class_decl isa decl;
field class_name(@class_decl):string;
... -- A bunch of other field and method declarations omitted
method print_string(self@class_decl):string {
  "class " || self.class_name }
On the other hand, as you say, it sometimes makes sense to group all
the methods of a generic function together. Most of our optimization
passes are structured this way, as are some of the operations on our
intermediate representation. For example, we
have some code to compute the uses of each node in our intermediate
representation hierarchy. In this case, we grouped all the
definitions of uses methods together in a single source module:
method uses(t@rtl_assign):indexed[rtl_operand] { [t.value] }
method uses(t@rtl_unop):indexed[rtl_operand] { [t.arg] }
method uses(t@rtl_binop):indexed[rtl_operand] { [t.arg1, t.arg2] }
method uses(t@rtl_conversion):indexed[rtl_operand] { [t.value] }
method uses(t@rtl_nop):indexed[rtl_operand] { [] }
method uses(t@rtl_load):indexed[rtl_operand] { [t.base, t.offset] }
method uses(t@rtl_simple_store):indexed[rtl_operand] {
  [t.base, t.offset, t.value] }
method uses(t@rtl_vector_creation):indexed[rtl_operand] { [t.length] }
...
: Another problem with the OO-approach is fixing the order of traversal
: in the Visitor pattern. In-order? Post-order? Pre-order? Visit
: right children first or left children? Depends a lot on the optimization
: or transformation you're performing. Perhaps this is the reason why
: attribute grammars and other forms of declarative tree/graph-rewriting
: seem attractive to compiler writers.
Writing in a language that supports closures (like Cecil or ML) allows
you to write user-defined control structures to handle these different
kinds of traversals. We have several different kinds of traversal
methods defined on our AST hierarchy, such as in_order, post_order,
etc., each of which takes a closure specifying what to do at each node
in the AST. Having these traversal routines lets you easily separate
out the code for the "work you want to do at each kind of node" from
the mechanics of walking the tree in the appropriate order.
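Here's a rough sketch of the idea in OCaml rather than Cecil, with a
deliberately tiny, made-up ast type (our real hierarchy is much
richer, and these aren't the Vortex routines):

  (* A toy AST type, for illustration only. *)
  type ast =
    | Leaf of string
    | Node of string * ast list

  (* Apply the closure f at every node, children first. *)
  let rec post_order (f : ast -> unit) (t : ast) : unit =
    (match t with
     | Leaf _ -> ()
     | Node (_, children) -> List.iter (post_order f) children);
    f t

  (* Apply f at every node before visiting its children. *)
  let rec pre_order (f : ast -> unit) (t : ast) : unit =
    f t;
    match t with
    | Leaf _ -> ()
    | Node (_, children) -> List.iter (pre_order f) children

A caller passes in only the per-node work, e.g.
post_order (fun n -> check n) tree for some per-node action check,
and never has to restate the walking mechanics.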
Most of this thread has been devoted to a discussion of how to
structure AST hierarchies. However, in an optimizing compiler, most
of the code deals with the intermediate representation and with
optimization passes over it. Anyone have any novel
structuring techniques for these parts of compilers? I know that
we've constantly redesigned and tweaked our intermediate
representation. We use a fairly standard control flow graph
representation, but one nice aspect of our system is that we've
designed an interface that makes it fairly simple for clients to write
new iterative dataflow passes (essentially it's a fancy user-defined
control structure, taking a closure that specifies what actions to
perform at each node of the CFG). The traversal implementation takes
care of traversing the CFG in the proper order, calling the
appropriate merge functions when control paths flow together, and
deciding when a fixed point has been reached on loops.

I've seen other
compilers that had similarly structured traversal interfaces, but
what's unusual about ours is that the optimization passes can request
that the graph be modified in certain ways, even as the dataflow
information is being computed. For example, if the client is doing
constant folding and determines that the node 'x := y + 12' should be
replaced with 'x := 17' (because y is 5), then it requests that the +
node be replaced with the new assignment node. The key is that
clients never modify the graph directly: they only request that the
traversal interface make a modification to the graph. This lets the
interface give the client the illusion that the graph has been
modified, even if the change is tentative because it was made inside a
loop that hasn't yet reached a fixed point. Essentially, inside loops
the changes are batched up by the traversal implementation and are
either applied (if the last pass over the loop reached a fixed
point), or discarded (if it didn't). This eliminates the more
traditional two-phase process of collecting dataflow information in
one pass and then making a second pass over the graph to apply
modifications based on that information.

The
interface also makes it fairly easy to compose separate optimization
passes to run in parallel, giving each pass access to the other's
information. In some cases, running two passes in parallel yields
better results than applying the two passes repeatedly in sequence.
In any event, we've found that this traversal
implementation makes it really easy to write a new optimization pass.
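To give a feel for the shape of this interface, here's a heavily
simplified sketch, again in OCaml. All the names are made up, and
real dataflow analysis keeps per-node facts plus merge functions,
which the sketch collapses into a single fact threaded through a
loop's nodes; what it's meant to show is the request-queueing
discipline:

  (* Hypothetical CFG node, reduced to a mutable instruction string. *)
  type node = { id : int; mutable insn : string }

  (* The only kind of modification request in this sketch. *)
  type request = Replace of node * string

  (* Iterate the client over a loop's nodes until the dataflow fact
     stops changing. The client never touches the graph: it queues
     requests, which are applied on the converging pass and silently
     discarded on every earlier, tentative pass. Termination assumes
     the facts form a finite-height lattice. *)
  let solve_loop (nodes : node list)
                 (equal : 'fact -> 'fact -> bool)
                 (init : 'fact)
                 (client : node -> 'fact -> (request -> unit) -> 'fact)
               : 'fact =
    let pending = ref [] in
    let enqueue r = pending := r :: !pending in
    let rec iterate fact =
      pending := [];                    (* each pass starts tentative *)
      let fact' =
        List.fold_left (fun acc n -> client n acc enqueue) fact nodes in
      if equal fact fact' then begin
        (* Fixed point: make the batched changes real. *)
        List.iter (fun (Replace (n, s)) -> n.insn <- s)
                  (List.rev !pending);
        fact'
      end else
        iterate fact'                   (* no fixed point yet: retry *)
    in
    iterate init

A constant-folding client that learns y is 5 at a node n holding
"x := y + 12" would call enqueue (Replace (n, "x := 17")) instead of
editing n itself, so the driver stays free to throw the change away
if the loop's facts haven't converged.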
I'd love to hear about other people's techniques for structuring
optimizers or intermediate representations to allow them to be easily
extended, refined, etc.
More information about Cecil or the Vortex compiler can be obtained
from the Cecil WWW page:
http://www.cs.washington.edu/research/projects/cecil/www/cecil-home.html
-- Jeff
------------------------------------------------------------------------------
Jeffrey Dean (jdean@cs.washington.edu) Graduate Student, Cecil Project
Dept. of Computer Science & Engineering University of Washington
http://www.cs.washington.edu/homes/jdean/index.html