Subject: Vector registers
From: moliere!pmk@uunet.UU.NET (Peter Michael Klausler)
Date: Sat, 30 Jan 1993 04:41:55 GMT
Although lots of folks know about this, in the interest of completeness in
the discussion about the best number of vector registers I'd like to point
out the capability of "tailgating" found on a couple of machines I've had
the pleasure of working with.
Tailgating permits results to flow into a vector register while data are
concurrently flowing out. E.g.,
    V0 <- V1+V2
    V1 <- V3*V4    ! no need to wait for the read of V1 to finish
can run in parallel, subject to initial availability of V1 through V4. This
capability increases the number of registers available to a scheduler, for
the operand registers coming into their last (or only) uses in a chime can
also serve as result registers in that same chime.
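To make the scheduling effect concrete, here is a minimal sketch of a greedy, in-order chime packer. The model is an assumption for illustration only, not a description of any particular machine: there is no chaining (a result cannot be read in the chime that produces it), a register is written at most once per chime, and functional-unit conflicts are ignored. The name `pack_into_chimes` is hypothetical.

```python
def pack_into_chimes(instrs, tailgating):
    """Greedily pack vector instructions, in order, into chimes.

    instrs: list of (dest, src_a, src_b) register names.
    Assumed model: no chaining, one write per register per chime,
    unlimited functional units.
    """
    chimes = [[]]
    for dest, *srcs in instrs:
        current = chimes[-1]
        written = {d for d, *_ in current}
        read = {s for _, *ss in current for s in ss}
        raw = any(s in written for s in srcs)   # would need chaining
        waw = dest in written                   # second write to same register
        war = dest in read and not tailgating   # tailgating removes this stall
        if raw or waw or war:
            chimes.append([])                   # start a new chime
        chimes[-1].append((dest, *srcs))
    return chimes


# The two-instruction example from the text.
program = [("V0", "V1", "V2"), ("V1", "V3", "V4")]
with_tailgating = pack_into_chimes(program, tailgating=True)
without_tailgating = pack_into_chimes(program, tailgating=False)
```

Under these assumptions the two instructions share one chime when tailgating is available, and the write to V1 is pushed to a second chime when it is not.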
In kernels with no reused vector registers, then, tailgating means that
you can get by with about as many vector registers as you have functional
units. This capability is useful for machines without more flexible (but
expensive) chaining mechanisms, for it permits construction of a very
efficient polycyclic schedule for the kernel.
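A rough way to check the register-count claim is to count peak live vector values per chime. The sketch below assumes an idealized steady-state kernel in which every functional unit defines one value per chime and each value is read exactly once, in the following chime. With tailgating, a register being read for the last time can also receive a new result in that same chime, so it is treated as free for allocation. The function names and the kernel shape are hypothetical.

```python
def registers_needed(values, tailgating):
    """Peak number of registers occupied in any chime.

    values: list of (def_chime, last_use_chime) pairs, one per vector value.
    With tailgating, a register in its last-use chime may also be a result
    register in that chime, so the value no longer counts against the total.
    """
    last = max(use for _, use in values)
    peak = 0
    for chime in range(last + 1):
        if tailgating:
            live = sum(d <= chime < u for d, u in values)
        else:
            live = sum(d <= chime <= u for d, u in values)
        peak = max(peak, live)
    return peak


def steady_state_kernel(units, chimes):
    """Assumed kernel: each unit defines one value per chime, used once in the next."""
    return [(c, c + 1) for c in range(chimes) for _ in range(units)]


kernel = steady_state_kernel(units=4, chimes=10)
need_with = registers_needed(kernel, tailgating=True)
need_without = registers_needed(kernel, tailgating=False)
```

In this idealized model, four functional units need four registers with tailgating and eight without. Real kernels with longer-lived values or reused operands will need more.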
(In the absence of tailgating, aggressive schedulers can run out of result
registers and must delay instructions to later chimes. Polycyclic
scheduling techniques that construct a regular pattern of operations, such
as modulo reservation table software pipelining, can run into trouble. An
irregular polycyclic method seems to fare a little better under register
pressure in my experience.)