Re: Interprocedural optimization and code reuse

rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)
Wed, 3 Jul 91 15:16:10 GMT

From comp.compilers

Related articles
Interprocedural optimization and code reuse ssr@stokes.princeton.edu (1991-06-25)
Re: Interprocedural optimization and code reuse pardo@smelt.cs.washington.edu (1991-07-02)
Re: Interprocedural optimization and code reuse rfrench@neon.Stanford.EDU (1991-07-02)
Re: Interprocedural optimization and code reuse tseng@rice.edu (1991-07-03)
*Re: Interprocedural optimization and code reuse rbe@yrloc.ipsa.reuter.COM* (1991-07-03)**
Re: Interprocedural optimization and code reuse pardo@sturgeon.cs.washington.edu (1991-07-03)

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)
Keywords:	optimize, design
Organization:	Snake Island Research Inc, Toronto
References:	91-07-007
Date:	Wed, 3 Jul 91 15:16:10 GMT

In article 91-07-007 ssr@stokes.princeton.edu (Steve S. Roy) writes:
>
> It would be really nice if I could write one general purpose matrix
>multiply routine like (schematicly and in FORTRAN notation):
>and use it everywhere. It is simple, it is correct (though incomplete of
>course), it works for any size arrays, and simple modifications will allow

Well, if you use APL or J ( a new ascii-based dialect of APL), you could
write something like: b +.* c

In APL, the dot is a conjunction which performs matrix product. It is
more general than Fortran's MATMUL/DOTPRODUCT in the following senses:

a. The sum and product (+ and *) operations are explicitly specified.
      This means that you can do different inner product, depending on your
      needs:
              +.* traditional fizziks matrix product.
              ^.= and reduce of equals: Associative memory search.
              or.^ (no "or" symbol here, but you get the idea "or point and".
                          This computes graph connectivity for Boolean graphs.
                          Do it in a loop and you get transitive closure.

b. dot product is defined on arrays of ANY rank, not just rank 1 or 2.

c. Because, as you pointed out, you specify "what you want done",
      rather than "how to do it" (cache problems, etc), you leave
      the work of making it run fast to the compiler/interpreter writer,
      as it properly should be.
      This results in code which is portable AND which is more likely to
      be correct than if an application writer tried to exploit a particular
      architecture's features.

d. It is compact, and therefore visibly correct, unlike the Fortran loops,
      in spite of the "enddo"s.

e. It can indeed run fast:
            +.* on floating arrays (APL's simplest case, Fortran's best)
                ran 3.5 times as fast as IBM's Fortran compiler when we implemented
                new inner product algorithms in SHARP APL.

          or.and ran 1000 times the speed of +.* (roughly- probably more than
          that), because we exploited the minimal parallelism available in
          the /370 (32-bit boolean ops). This sort of optimization is
          worthwhile when you are writing an interpreter or compiler, but
          is not likely to be done by an applications writer until forced
          to do so by performance problems. And, it comes for free to the user!

See my "Fortran 90 Arrays" paper in the February 1991 SIGPLAN Notices.

Bob

Robert Bernecky rbe@yrloc.ipsa.reuter.com bernecky@itrchq.itrc.on.ca
Snake Island Research Inc (416) 368-6944 FAX: (416) 360-4694
18 Fifth Street, Ward's Island
Toronto, Ontario M5J 2B9
Canada
[Assuming the IBM Fortran compiler you're referring to is some descendant of
Fortran H, I'd be interested to hear how you did 3.5 times better. -John]
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Interprocedural optimization and code reuse

rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)Wed, 3 Jul 91 15:16:10 GMT

rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)
Wed, 3 Jul 91 15:16:10 GMT