# Re: Interprocedural optimization and code reuse

## rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)Wed, 3 Jul 91 15:16:10 GMT

From comp.compilers

Related articles
Interprocedural optimization and code reuse ssr@stokes.princeton.edu (1991-06-25)
Re: Interprocedural optimization and code reuse pardo@smelt.cs.washington.edu (1991-07-02)
Re: Interprocedural optimization and code reuse rfrench@neon.Stanford.EDU (1991-07-02)
Re: Interprocedural optimization and code reuse tseng@rice.edu (1991-07-03)
Re: Interprocedural optimization and code reuse rbe@yrloc.ipsa.reuter.COM (1991-07-03)
Re: Interprocedural optimization and code reuse pardo@sturgeon.cs.washington.edu (1991-07-03)
| List of all articles for this month |

 Newsgroups: comp.compilers From: rbe@yrloc.ipsa.reuter.COM (Robert Bernecky) Keywords: optimize, design Organization: Snake Island Research Inc, Toronto References: 91-07-007 Date: Wed, 3 Jul 91 15:16:10 GMT

In article 91-07-007 ssr@stokes.princeton.edu (Steve S. Roy) writes:
>
> It would be really nice if I could write one general purpose matrix
>multiply routine like (schematicly and in FORTRAN notation):
>and use it everywhere. It is simple, it is correct (though incomplete of
>course), it works for any size arrays, and simple modifications will allow

Well, if you use APL or J ( a new ascii-based dialect of APL), you could
write something like: b +.* c

In APL, the dot is a conjunction which performs matrix product. It is
more general than Fortran's MATMUL/DOTPRODUCT in the following senses:

a. The sum and product (+ and *) operations are explicitly specified.
This means that you can do different inner product, depending on your
needs:
^.= and reduce of equals: Associative memory search.
or.^ (no "or" symbol here, but you get the idea "or point and".
This computes graph connectivity for Boolean graphs.
Do it in a loop and you get transitive closure.

b. dot product is defined on arrays of ANY rank, not just rank 1 or 2.

c. Because, as you pointed out, you specify "what you want done",
rather than "how to do it" (cache problems, etc), you leave
the work of making it run fast to the compiler/interpreter writer,
as it properly should be.
This results in code which is portable AND which is more likely to
be correct than if an application writer tried to exploit a particular
architecture's features.

d. It is compact, and therefore visibly correct, unlike the Fortran loops,
in spite of the "enddo"s.

e. It can indeed run fast:
+.* on floating arrays (APL's simplest case, Fortran's best)
ran 3.5 times as fast as IBM's Fortran compiler when we implemented
new inner product algorithms in SHARP APL.

or.and ran 1000 times the speed of +.* (roughly- probably more than
that), because we exploited the minimal parallelism available in
the /370 (32-bit boolean ops). This sort of optimization is
worthwhile when you are writing an interpreter or compiler, but
is not likely to be done by an applications writer until forced
to do so by performance problems. And, it comes for free to the user!

See my "Fortran 90 Arrays" paper in the February 1991 SIGPLAN Notices.

Bob

Robert Bernecky rbe@yrloc.ipsa.reuter.com bernecky@itrchq.itrc.on.ca
Snake Island Research Inc (416) 368-6944 FAX: (416) 360-4694
18 Fifth Street, Ward's Island
Toronto, Ontario M5J 2B9