Re: How many vector registers are useful?

kurz@math.uni-frankfurt.de (Volker Kurz)
Mon, 1 Feb 1993 15:00:03 GMT

          From comp.compilers


Newsgroups: comp.sys.super,comp.arch,comp.compilers
From: kurz@math.uni-frankfurt.de (Volker Kurz)
Followup-To: comp.sys.super
Keywords: architecture, performance
Organization: University of Frankfurt/Main, Dept. of Mathematics
References: 93-01-174
Date: Mon, 1 Feb 1993 15:00:03 GMT

kirchner@uklira.informatik.uni-kl.de (Reinhard Kirchner) writes:
> [is a large vector] register file useful at all ?


Definitely yes.


> A register has an optimizing effect only when the value in it can be used
> several times, at least twice, ...
>
> But how is this on vector machines ? The register creates a speedup only
> when it can hold an entire vector, which can be used again later. This
> requires a register long enough to do so. That means vectors of e.g. a
> length of 5000 can not be held anyway, every machine must load, process,
> and store it in pieces, and only a lot of memory bandwidth helps.


Every vector instruction incurs its own startup period. So if you have to
cut your original vector(s) into pieces that fit into a vector register,
it helps to need fewer pieces. That is the advantage of configuring a
few very long registers.
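A rough cost model makes this trade-off visible. The Python sketch below assumes that every piece pays the same fixed startup latency and then delivers one result per cycle; the function name `strip_mined_cycles`, the startup value of 50 cycles and the register lengths tried are hypothetical, not figures for any particular Fujitsu or Cray model.

```python
def strip_mined_cycles(n, vl, startup, per_element=1):
    """Estimate cycles for one vector operation over n elements.

    The vector is cut into ceil(n / vl) pieces of at most vl elements
    (vl = vector register length); every piece pays the full startup
    latency. Returns (pieces, cycles). Cost model is an assumption.
    """
    pieces = (n + vl - 1) // vl          # integer ceiling division
    return pieces, pieces * startup + n * per_element


if __name__ == "__main__":
    # Hypothetical startup latency of 50 cycles, vector of 5000 elements.
    for vl in (64, 512, 2048):
        print(vl, strip_mined_cycles(5000, vl, startup=50))
```

For n = 5000 this model gives 79 pieces and 8950 cycles at register length 64, 10 pieces and 5500 cycles at length 512, and 3 pieces and 5150 cycles at length 2048. The per-element work is the same in every case, so only the startup term shrinks as the registers get longer.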


> When configured as a few long vectors the Fujitsu vector registers may
> help, but then comes the second question: Are there any statistics on the
> reusing of vectors? I know about such things for scalar registers, where
> people found that 32 is plenty enough, and only 8 help a lot. But in these
> cases registers are used for loop indexes, addresses etc., which can not
> be compared to the use of vector registers.
>
> So: what can be gained with such a big vector register file ? Or is it
> only of limited help ? Can the register file be traded against bandwidth to
> load and store from memory ?


Yes, it can, and this may be the main reason why Fujitsu gave us such a
large register file.


If you configure more but shorter registers, then you have enough space to
keep intermediate results. This may be the most important advantage of a
large register file: avoiding memory traffic altogether.


By keeping intermediate results in vector registers, you increase the
computational intensity, which is defined as


number of arithmetic operations
-------------------------------
number of (main-)memory references
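To make the effect of keeping intermediates in registers concrete, here is a small Python sketch that counts one hypothetical kernel, d = (a + b) * c, per vector element. The trace encoding and the name `intensity` are illustrative assumptions, not any compiler's internal representation.

```python
from fractions import Fraction


def intensity(trace):
    """Arithmetic operations per main-memory reference for a trace of
    'load', 'store' and 'op' steps, counted per vector element."""
    ops = sum(1 for step in trace if step == "op")
    refs = sum(1 for step in trace if step in ("load", "store"))
    return Fraction(ops, refs)


# d = (a + b) * c with the temporary t = a + b written back to memory.
SPILLED = ["load", "load", "op", "store",   # t = a + b, store t
           "load", "load", "op", "store"]   # reload t, load c, d = t * c
# Same computation with t kept in a vector register.
IN_REGISTER = ["load", "load", "op",        # t = a + b stays in a register
               "load", "op", "store"]       # load c, d = t * c, store d

if __name__ == "__main__":
    print(intensity(SPILLED), intensity(IN_REGISTER))
```

Avoiding the store and reload of t raises the intensity from 1/3 to 1/2 without changing the arithmetic at all.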


Computational intensity has to be seen together with the number of data
paths (the maximum number of memory references per pipe per cycle), which
is 3 for a Cray Y-MP, 2 for a VP1xxx (as you have in Kaiserslautern) and,
alas, only 1 for a VP2xxx. As a rule of thumb, a good estimate for an
upper bound on the speed of an arithmetic operation is


min{computational intensity * data paths, 1} * peak performance
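The rule of thumb fits in a one-line function. A minimal Python sketch; the name `speed_bound` is hypothetical, `peak` defaults to 1 so that the result reads as a fraction of peak performance, and the data-path counts are the ones quoted in this post.

```python
from fractions import Fraction


def speed_bound(intensity, data_paths, peak=1):
    """Upper bound on sustained speed: min(intensity * data_paths, 1) * peak.

    intensity  -- arithmetic operations per main-memory reference
    data_paths -- memory references per pipe per cycle
    peak       -- peak performance (1 means: result is a fraction of peak)
    """
    return min(intensity * data_paths, 1) * peak


VECTOR_ADD = Fraction(1, 3)   # c = a + b: 1 add per 2 loads + 1 store
DATA_PATHS = {"Cray Y-MP": 3, "VP1xxx": 2, "VP2xxx": 1}

if __name__ == "__main__":
    for machine, paths in DATA_PATHS.items():
        print(machine, speed_bound(VECTOR_ADD, paths))
```

For the vector add this gives 1, 2/3 and 1/3 of peak respectively, matching the Y-MP and VP2xxx cases discussed next.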


A simple vector add (c = a + b: one addition per two loads and one store)
has a computational intensity of 1/3, so it requires 3 data paths for full
speed. This is the case on a Y-MP, at least in theory; in practice you
cannot get the full speed because of memory conflicts with other
processors. On a VP2xxx, however, you get only roughly 1/3 of peak
performance. On the latter machine, increasing computational intensity
has a dramatic impact on the sustained speed. In many cases (among these
is matrix multiplication) you can increase computational intensity by
unrolling outer loops. This is where a large number of vector registers
is very useful.
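Here is a sketch of how unrolling an outer loop raises the intensity, using a formulation chosen for illustration: C = A*B in column (SAXPY) form, where each vector statement C(:,j) = C(:,j) + A(:,k)*B(k,j) would otherwise load and store C(:,j) once per k. Unrolling the k loop by a hypothetical depth u loads and stores C(:,j) once per group of u updates; the u columns of A plus the accumulator occupy u + 1 vector registers. Plain Python lists stand in for vector registers, and the function name and counting convention (vector-element references only, the scalar B(k,j) ignored) are assumptions.

```python
from fractions import Fraction


def matmul_k_unrolled(A, B, u):
    """C = A*B in column (SAXPY) form with the k loop unrolled by u.

    A is n x p, B is p x m, both given as lists of rows. Each Python list
    built below stands in for one vector register of length n.
    Returns (C, intensity) with intensity = flops / vector memory
    references, counted per element; scalar loads of B[k][j] are ignored.
    """
    n, p, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    flops = refs = 0
    for j in range(m):
        for k0 in range(0, p, u):
            ks = range(k0, min(k0 + u, p))
            # Load up to u columns of A into separate "vector registers".
            a_regs = [[A[i][k] for i in range(n)] for k in ks]
            refs += n * len(a_regs)
            # Load column j of C once for the whole unrolled group.
            acc = [C[i][j] for i in range(n)]
            refs += n
            for a_col, k in zip(a_regs, ks):
                b = B[k][j]
                for i in range(n):
                    acc[i] += a_col[i] * b   # one multiply + one add
                flops += 2 * n
            # Store column j of C once per group.
            for i in range(n):
                C[i][j] = acc[i]
            refs += n
    return C, Fraction(flops, refs)


if __name__ == "__main__":
    A = [[i + k for k in range(8)] for i in range(3)]    # 3 x 8
    B = [[k - j for j in range(2)] for k in range(8)]    # 8 x 2
    for u in (1, 2, 4, 8):
        print(u, matmul_k_unrolled(A, B, u)[1])
```

With p = 8 the counted intensity is 2/3 for u = 1, 1 for u = 2, 4/3 for u = 4 and 8/5 for u = 8, i.e. 2u/(u+2) when u divides p. By the rule of thumb above, a one-data-path machine goes from at most 2/3 of peak to the full bound at u = 2, and every increase in u costs another vector register.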


You can exploit this on your own machine fairly easily by using the
routines from level 2 BLAS and level 3 BLAS. To the best of my knowledge,
Kaiserslautern uses the routines that were optimized at the University of
Karlsruhe as part of the ODIN project.
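One way to see why level 2 and especially level 3 BLAS help is to count the same ratio for one representative routine per level. The sketch below uses idealized counts for square operands of size n (AXPY, GEMV, GEMM), assuming that every operand crosses the memory interface exactly once; real implementations only approach this by blocking for the register file, and these figures are not measurements of the ODIN routines. The function name `blas_intensity` is hypothetical.

```python
from fractions import Fraction


def blas_intensity(level, n):
    """Idealized flops / memory references for one BLAS call of size n,
    assuming each operand is loaded (and each result stored) exactly once."""
    if level == 1:      # AXPY: y = a*x + y
        flops, refs = 2 * n, 3 * n              # load x, load y, store y
    elif level == 2:    # GEMV: y = A*x + y
        flops, refs = 2 * n * n, n * n + 3 * n  # load A, x, y; store y
    elif level == 3:    # GEMM: C = A*B + C
        flops, refs = 2 * n ** 3, 4 * n * n     # load A, B, C; store C
    else:
        raise ValueError("level must be 1, 2 or 3")
    return Fraction(flops, refs)


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, blas_intensity(level, 100))
```

Under these assumptions level 1 stays at 2/3 regardless of n, level 2 approaches 2, and level 3 grows like n/2, so the level 3 routines leave the most room for reuse in vector registers.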


Hope this helps,
Volker Kurz


--
Dr. Volker Kurz *** J. W. Goethe-Universitaet
kurz@math.uni-frankfurt.de *** Fachbereich Mathematik
--

