From: johnhull2008 <johnhull2008@gmail.com>
Newsgroups: comp.compilers
Date: Thu, 11 Sep 2008 13:57:10 -0700 (PDT)
Organization: Compilers Central
References: 08-08-072 08-08-086 08-08-092
Keywords: code, optimize
Posted-Date: 13 Sep 2008 11:56:23 EDT
On Aug 28, 1:51 pm, Tim Frink <plfr...@yahoo.de> wrote:
> On Thu, 28 Aug 2008 11:49:49 +0530, Jatin Bhateja wrote:
> > It's a combination of loop unrolling and instruction scheduling. e.g
>
> Thank you, but I know how software pipelining is working. :-)
>
> My question was if also a significant performance increase for RISC
> architectures with a restricted number of functional units can be
> expected when software pipelining is applied.
>
> And if profiling might be exploited here.
I believe software pipelining can improve performance even when there
are only a few functional units.
For example, suppose the following code is run on a machine with a
single FU. It just increments each element of a 10-element array.
        ldc   r0, 0
L1:
    0.  load  r1, 0(r0)    # latency 2
    1.  add   r1, r1, 1    # latency 1
    2.  store 0(r0), r1    # latency 1
    3.  add   r0, r0, 1    # latency 1
    4.  cmp   r0, 10       # latency 1
    5.  blt   L1           # latency 1
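Without SW pipelining, a single iteration takes 7 cycles: six
instructions plus one stall cycle while op 1 waits for the load
(latency 2).
With SW pipelining, the branch of the previous iteration fills that
stall slot, so a single iteration takes 6 cycles on average in the
steady state. In the schedule below, opN_i means instruction N of
iteration i.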
## prologue
op0_0
noop
op1_0
op2_0
op3_0
op4_0
######## 1st iteration of steady state
op0_1
op5_0
op1_1
op2_1
op3_1
op4_1
######## 2nd iteration of steady state
op0_2
op5_1
op1_2
op2_2
op3_2
op4_2
########
.............
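The cycle counts can be checked with a small simulation. This is only a
sketch: it assumes an in-order, single-issue machine where an
instruction stalls until its source registers are ready, and it models
the condition code set by cmp as a made-up register called "flags". The
names OPS and cycles are mine, and Python is used only for brevity.

```python
# op number -> (mnemonic, destination, sources, latency)
# "flags" is a hypothetical condition register written by cmp.
OPS = {
    0: ("load",  "r1",    ("r0",),      2),
    1: ("add",   "r1",    ("r1",),      1),
    2: ("store", None,    ("r0", "r1"), 1),
    3: ("add",   "r0",    ("r0",),      1),
    4: ("cmp",   "flags", ("r0",),      1),
    5: ("blt",   None,    ("flags",),   1),
}

def cycles(schedule):
    """Cycles needed to issue `schedule` (a list of op numbers) in order,
    one op per cycle, stalling while a source register is not ready."""
    ready = {}   # register -> first cycle at which its value is available
    clock = 0    # next free issue slot
    for op in schedule:
        _, dest, srcs, latency = OPS[op]
        issue = max([clock] + [ready.get(s, 0) for s in srcs])
        if dest is not None:
            ready[dest] = issue + latency
        clock = issue + 1
    return clock

N = 10
plain = [0, 1, 2, 3, 4, 5] * N
# prologue, N-1 steady-state iterations, then the last branch
pipelined = [0, 1, 2, 3, 4] + [0, 5, 1, 2, 3, 4] * (N - 1) + [5]
print(cycles(plain), cycles(pipelined))  # 70 61
```

Under these assumptions the 10 iterations take 70 cycles without
pipelining and 61 with it (6 for the prologue, 9 x 6 for the steady
state, 1 for the final branch). The noop in the prologue shows up here
as a stall rather than as an explicit instruction.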
How much a SW pipelined loop gains over a non-SW-pipelined loop depends
on several factors, including the number of resources (FUs), whether
the loop contains recurrences, the number of iterations, and register
pressure.
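The first two factors can be made concrete with the usual lower bound
on the initiation interval (the number of cycles between the starts of
successive iterations) used in modulo scheduling: MII = max(ResMII,
RecMII). The following is a rough sketch, assuming every instruction
can run on any FU and occupies it for one cycle. The function names are
mine, and each recurrence is described by hand as a (total latency,
iteration distance) pair.

```python
from math import ceil

def res_mii(num_ops, num_fus):
    """Resource bound: the FUs must issue num_ops ops per iteration."""
    return ceil(num_ops / num_fus)

def rec_mii(recurrences):
    """Recurrence bound: each dependence cycle with total latency `lat`
    spanning `dist` iterations needs at least ceil(lat / dist) cycles."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=0)

def mii(num_ops, num_fus, recurrences):
    """Lower bound on the initiation interval."""
    return max(res_mii(num_ops, num_fus), rec_mii(recurrences))

# The loop above: 6 ops, and as far as I can see the only recurrence is
# the r0 increment (latency 1, carried over 1 iteration).
print(mii(6, 1, [(1, 1)]))  # 6
print(mii(6, 2, [(1, 1)]))  # 3
# A hypothetical loop with a latency-4 recurrence does not get below 4,
# no matter how many FUs there are.
print(mii(6, 6, [(4, 1)]))  # 4
```

With a single FU the bound is 6, which is what the schedule above
achieves. More FUs lower the resource bound, but a long recurrence
keeps the loop from going any faster.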