|Re: Interprocedural optimization and code reuse rfrench@neon.Stanford.EDU (1991-07-02)|
|From:||rfrench@neon.Stanford.EDU (Robert S. French)|
|Organization:||Computer Science Department, Stanford University, Ca , USA|
|Date:||Tue, 2 Jul 1991 21:56:46 GMT|
In article 91-07-007 firstname.lastname@example.org (Steve S. Roy) writes:
> It would be really nice if I could write one general purpose matrix
>multiply routine like (schematically and in FORTRAN notation):
>and use it everywhere.
> The tuning involves things like the size of cache and the relative
>size of each of the arrays, the pipeline structure of the processor, etc.
>If the arrays will all fit in cache then one way of coding this is best.
>If one will and the other won't then a second is best. If you know that
>'c(i,j)' starts out in cache then a third is best.
This doesn't directly answer your question about code reuse, but
there is a body of literature on "loop blocking" for locality and
cache reuse. There's no need to rewrite the code for each case;
choosing the "blocking factor" to suit the cache characteristics and
array sizes is sufficient. The choice can be made efficiently at
runtime. I recommend reading the paper:
Monica Lam & Michael Wolf, "A Data Locality Optimizing Algorithm",
There are a couple of other papers written by Michael Wolf, et al.,
but I can't find where they've been published offhand. I think "The
Cache Performance and Optimizations of Blocked Algorithms" was in this
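
As a rough illustration of the loop blocking described above, here is a
minimal sketch in C. It is not the routine from the original post and is
not taken from the papers cited. It assumes square, row-major matrices;
the names matmul_blocked and pick_block_size, and the "one block in half
the cache" heuristic, are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clip the last, partial block. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/*
 * c += a * b for n-by-n row-major matrices, processed in bs-by-bs blocks.
 * bs is an ordinary runtime argument, so one routine serves any cache size.
 */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);
    for (size_t kk = 0; kk < n; kk += bs) {
        size_t kmax = min_size(kk + bs, n);
        for (size_t jj = 0; jj < n; jj += bs) {
            size_t jmax = min_size(jj + bs, n);
            /* The block b[kk..kmax)[jj..jmax) is reused for every row i. */
            for (size_t i = 0; i < n; i++) {
                for (size_t k = kk; k < kmax; k++) {
                    double aik = a[i * n + k];
                    for (size_t j = jj; j < jmax; j++)
                        c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
    }
}

/*
 * Hypothetical heuristic (an assumption, not from the post): the largest bs
 * such that one bs-by-bs block of doubles fits in half the cache.
 */
size_t pick_block_size(size_t cache_bytes)
{
    size_t bs = 1;
    while ((bs + 1) * (bs + 1) * sizeof(double) <= cache_bytes / 2)
        bs++;
    return bs;
}
```

A caller would compute bs once, for example with
pick_block_size(64 * 1024) for a 64 KB cache, and pass it to
matmul_blocked. Only the argument changes between machines, not the code.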