Re: Re How many vector registers are useful?

idacrd!desj@uunet.UU.NET (David desJardins)
Sun, 31 Jan 1993 05:15:43 GMT

          From comp.compilers

Related articles
How many vector registers are useful? kirchner@uklira.informatik.uni-kl.de (1993-01-25)
Re How many vector registers are useful? ssimmons@convex.com (1993-01-26)
Re: Re How many vector registers are useful? billms@corp.corp.mot.com (1993-01-28)
Re: Re How many vector registers are useful? idacrd!desj@uunet.UU.NET (1993-01-31)
Re: Re How many vector registers are useful? billms@corp.mot.com (1993-02-02)

Newsgroups: comp.compilers
From: idacrd!desj@uunet.UU.NET (David desJardins)
Keywords: vector, architecture
Organization: IDA Center for Communications Research, Princeton
References: 93-01-174 93-01-211
Date: Sun, 31 Jan 1993 05:15:43 GMT

Bill Mangione-Smith <billms@corp.corp.mot.com> writes:
> Santosh Abraham, Ed Davidson, and I had a paper two asplos's ago that
> looked at the minimal number of vector registers required for specific
> codes. [.... W]e decided to focus on the minimal number of registers
> required to achieve optimal performance.


I haven't looked at your paper, but I think that you have to be very
careful in using the word "optimal" here. I have written a fair number of
assembly-language routines for vector machines, and it is very often the
case that the number of vector registers needed for "nearly optimal" code
is substantially less than that needed for "perfectly optimal" code.


In my experience, what often happens is that you can get code that is
"nearly optimal" in the sense that the loop executes in the correct number
of chimes, but takes a few more ticks than strictly necessary because
the usage of the vector registers is not perfectly synchronized. A vector
functional unit might have to wait a few ticks for its input, for
example, because the latency of the unit feeding it is greater than its
own latency. These few ticks might add only a few percent to the
execution time of the loop, but it might take as much as double the number
of vector registers to eliminate them.
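
To put rough numbers on the chimes-versus-ticks distinction above, here is a
toy cost model. Everything in it is an assumption made for illustration and
does not describe any real machine: the vector length of 64, the latencies of
9 and 6 ticks, the function names, and the rule that the stall equals the
latency difference are all hypothetical.

```python
# Toy model only: all names and numbers are illustrative assumptions,
# not measurements of any real vector machine.

def stall_ticks(feeder_latency, own_latency):
    """Ticks a unit waits for its input (assumed: the latency difference)."""
    return max(0, feeder_latency - own_latency)


def loop_ticks(chimes, vector_length, stalls=0):
    """Ticks for one pass of the loop: one chime per vector_length ticks,
    plus any stall ticks caused by imperfect register synchronization."""
    return chimes * vector_length + stalls


def overhead_percent(chimes, vector_length, stalls):
    """Extra time of the stalled loop relative to the stall-free loop."""
    ideal = loop_ticks(chimes, vector_length)
    actual = loop_ticks(chimes, vector_length, stalls)
    return 100.0 * (actual - ideal) / ideal


if __name__ == "__main__":
    stalls = stall_ticks(feeder_latency=9, own_latency=6)  # 3 ticks
    print(loop_ticks(3, 64))                # 192 ticks, stall-free
    print(loop_ticks(3, 64, stalls))        # 195 ticks, same 3 chimes
    print(overhead_percent(3, 64, stalls))  # 1.5625 percent
```

Under these assumptions both versions of the loop take three chimes, and the
stalled one is under two percent slower. The model deliberately says nothing
about how many extra registers it would take to hide the stall.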


Perhaps you were looking at some sort of "ideal" vector machine? Assuming
things like constant latencies in the functional units would certainly
simplify a truly optimal analysis while probably producing nearly
equivalent results for practical purposes.


David desJardins
--

