|Interprocedural optimization and code reuse firstname.lastname@example.org (1991-06-25)|
|Re: intraprocedure analysis email@example.com (1991-07-02)|
|Re: Interprocedural optimization and code reuse firstname.lastname@example.org (1991-07-02)|
|Re: Interprocedural optimization and code reuse rfrench@neon.Stanford.EDU (1991-07-02)|
|Re: Interprocedural optimization and code reuse email@example.com (1991-07-03)|
|Re: Interprocedural optimization and code reuse firstname.lastname@example.org.COM (1991-07-03)|
|Re: Interprocedural optimization and code reuse email@example.com (1991-07-03)|
|From:||firstname.lastname@example.org (Steve S. Roy)|
|Date:||25 Jun 91 23:51:42 GMT|
I have some questions/comments for people more in the know than I am
about the current state of the art in optimization. It regards writing
generic code and having it translated into efficient machine code on a
variety of different machines.
It would be really nice if I could write one general purpose matrix
multiply routine like (schematicly and in FORTRAN notation):
c(i,j) = c(i,j) + a(i,k)*b(k,j)
and use it everywhere. It is simple, it is correct (though incomplete of
course), it works for any size arrays, and simple modifications will allow
non-unit skips. And with many compilers available today it will be dog
slow. I'm talking here largely about workstation compilers, and though
this example is in FORTRAN notation this effect is just as prevalent in C.
I have experience with MIPS, Sparc, and i860 processors and they all have
large rewards (huge for the i860) for tuning based on cache use and
The tuning involves things like the size of cache and the relative
size of each of the arrays, the pipeline structure of the processor, etc.
If the arrays will all fit in cache then one way of coding this is best.
If one will and the other won't then a second is best. If you know that
'c(i,j)' starts out in cache then a third is best. And of course
everything changes when you go to a vector machine. That means that the
correct routine will be different in different parts of the program or on
different destination computers. It is obviously bad for my productivity
to have to worry about such things; I want the compiler to worry about it.
Now, I picked matrix multiply because it is a conceptually simple
operation, easy to explain and discuss. I don't mean to get into the
issues of FORTRAN 90 with builtin array types, I mean to discuss the
general issue of producing different object code for the same source code
depending on context, not just plugging in canned routines based on a
couple of flags.
It seems to me that this is a very important issue for the languages
that explicitly try to increase code reusability and maintain high
performance. As long as there is strong incentive for people to worry
about the details of each machine, they will produce code that depends on
it and is therefore not portable.
My question is how much work is going into interprocedural
optimization, and I'm looking for people's estimates of the potential for
this line of research, and how good interprocedural optimization would
change how people write code. My own feeling -- from writing a bunch of
i860 mathematical libraries -- is that there's a lot of potential here and
it would drasticly improve my productivity in writing scientific codes
that are performance limited, but it may be a lot of work.
Program of Applied and Computational Mathematics
Princeton NJ, 08544
Return to the
Search the comp.compilers archives again.