From: firstname.lastname@example.org (Sanjay Krishnamurthy)
Date: Fri, 29 Jan 1993 00:17:16 GMT
Here are some references relevant to the topic. Perhaps the
most relevant one is:
"Vector Register Allocation" -Randy Allen and Ken Kennedy
Rice University Tech. Report COMP TR86-45.
It appeared in a recent issue of IEEE Trans. on Computers. It describes
in great detail loop transformations for enhancing vector register reuse.
The basic idea behind vector register allocation, that of exploiting true
and input dependences with constant thresholds, is also presented. All the
examples that various folks have posted on the net these past few days
can be handled by these compiler techniques. But the scope of vector
register allocation is a single loop nest, and in linear algebra codes one
often needs to consider both control flow and allocation across multiple
nests. One can loosely view a single loop nest as a basic block, so
the report gives compiler techniques for register allocation within basic
blocks. To go beyond basic blocks, one needs to extend dataflow
techniques to handle array sections (as simple sections maybe...)
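To make the reuse idea concrete, here is a minimal sketch in C. It is not taken from the report. It assumes a hypothetical vector register length VL of 64 elements, and the array vreg merely stands in for a vector register. The loop nest adds m columns of b into a, so a carries a true dependence with threshold 1 on the column loop; strip-mining and interchanging lets each strip of a be loaded once and stored once instead of once per column.

```c
#include <stddef.h>

#define VL 64  /* assumed vector register length, in elements */

/* Naive form: every element of a[] is loaded and stored once per column j. */
void accumulate_naive(double *a, const double *b, size_t n, size_t m)
{
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            a[i] += b[j * n + i];
}

/* Strip-mined and interchanged form: one strip of a[] stays in vreg
   (a stand-in for a vector register) across the whole j loop. */
void accumulate_reuse(double *a, const double *b, size_t n, size_t m)
{
    for (size_t s = 0; s < n; s += VL) {
        size_t len = (n - s < VL) ? n - s : VL;
        double vreg[VL];

        for (size_t i = 0; i < len; i++)      /* one vector load of the strip */
            vreg[i] = a[s + i];

        for (size_t j = 0; j < m; j++)        /* reuse the strip for every column */
            for (size_t i = 0; i < len; i++)
                vreg[i] += b[j * n + s + i];

        for (size_t i = 0; i < len; i++)      /* one vector store of the strip */
            a[s + i] = vreg[i];
    }
}
```

Both functions add the columns to each element in the same order, so they produce identical results; only the number of loads and stores of a differs.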
The problem gets more interesting when scheduling issues are also
considered, as in:
"Vector Register Design for Polycyclic Vector Scheduling"
-William Mangione-Smith, Santosh G. Abraham and Edward
Davidson, ASPLOS-IV, 1991.
or in: "Compile-Time Optimization of Memory and Register Usage on the
Cray-2" -C. Eisenbeis, W. Jalby and A. Lichnewsky, Second Workshop on
Languages and Compilers, Urbana-Illinois, 1989.
I think a more comprehensive version appeared in the Journal of
Both papers deal with software pipelining issues and their impact on
vector register usage. But one could also use a variation of traditional
unrolling to enhance scheduling opportunities. For instance,
A(1:N) = B(1:N) + C(1:N)
could be turned into
A(1:N/2) = B(1:N/2) + C(1:N/2)
A(N/2+1:N) = B(N/2+1:N) + C(N/2+1:N)
so that the vector loads/stores for one half could be overlapped with the
vector operations of the other. So, the familiar scalar problem of deciding
the profitability of unrolling versus software pipelining exists in vector
code too.
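The splitting above can be written out as a small C sketch. The function names add_whole and add_split are made up for illustration, and the code assumes a does not overlap b or c. Plain C loops stand in for the vector statements; the sketch shows only that the two half-length statements compute the same result as the original, not the overlap itself, which is up to the scheduler.

```c
#include <stddef.h>

/* Original form: A(1:N) = B(1:N) + C(1:N) as one vector statement. */
void add_whole(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Split form: two independent half-length statements. Since neither
   half depends on the other, a scheduler is free to overlap the loads
   and stores of one half with the adds of the other. */
void add_split(double *a, const double *b, const double *c, size_t n)
{
    size_t h = n / 2;

    for (size_t i = 0; i < h; i++)   /* A(1:N/2) = B(1:N/2) + C(1:N/2) */
        a[i] = b[i] + c[i];

    for (size_t i = h; i < n; i++)   /* A(N/2+1:N) = B(N/2+1:N) + C(N/2+1:N) */
        a[i] = b[i] + c[i];
}
```

When n is odd, the second loop simply covers the one extra element.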
-Sanjay M. Krishnamurthy
RISC Compiler Group
Cray Research Inc.