Re: Software Pipelining

johnhull2008 <johnhull2008@gmail.com>
Thu, 11 Sep 2008 13:57:10 -0700 (PDT)

          From comp.compilers

From: johnhull2008 <johnhull2008@gmail.com>
Newsgroups: comp.compilers
Date: Thu, 11 Sep 2008 13:57:10 -0700 (PDT)
Organization: Compilers Central
References: 08-08-072 08-08-086 08-08-092
Keywords: code, optimize
Posted-Date: 13 Sep 2008 11:56:23 EDT

On Aug 28, 1:51 pm, Tim Frink <plfr...@yahoo.de> wrote:
> On Thu, 28 Aug 2008 11:49:49 +0530, Jatin Bhateja wrote:
> > It's a combination of loop unrolling and instruction scheduling. e.g
>
> Thank you, but I know how software pipelining is working. :-)
>
> My question was if also a significant performance increase for RISC
> architectures with a restricted number of functional units can be
> expected when software pipelining is applied.
>
> And if profiling might be exploited here.


I believe software pipelining can improve performance even when there
are only a few functional units (FUs).
For example, suppose the following code runs on a machine with a
single FU.
It simply increments each element of a 10-element array.


ldc r0, 0
L1:
0. load r1, 0(r0) # latency 2
1. add r1, r1, 1 # latency 1
2. store 0(r0), r1 # latency 1
3. add r0, r0, 1 # latency 1
4. cmp r0, 10 # latency 1
5. blt L1 # latency 1


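Without software pipelining, a single iteration takes 7 cycles: the
load has a latency of 2, so the dependent add waits one extra cycle,
and the other five operations take one cycle each.
With software pipelining, a single iteration takes 6 cycles on
average. In the schedule below, opN_i denotes operation N of
iteration i; the load of the next iteration is issued ahead of the
branch of the current one, which fills the load's delay slot.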


## prologue
op0_0
noop
op1_0
op2_0
op3_0
op4_0
######## 1st iteration of steady state
op0_1
op5_0
op1_1
op2_1
op3_1
op4_1
######## 2nd iteration of steady state
op0_2
op5_1
op1_2
op2_2
op3_2
op4_2
########
.............
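
The cycle counts above can be checked with a small simulation. This is
only a sketch under assumptions that are not in the post: an in-order,
single-issue machine that stalls until an operation's source registers
are ready, a hypothetical "flags" register linking cmp to blt, and an
instruction stream that follows the listing exactly (the initial ldc
and any extra load a real kernel might issue on its last pass are
left out). The names OPS, run, plain and pipelined are made up for
this illustration.

```python
# Each operation: (source registers, destination register, latency),
# numbered as in the loop above.
OPS = {
    0: (("r0",), "r1", 2),        # load  r1, 0(r0)
    1: (("r1",), "r1", 1),        # add   r1, r1, 1
    2: (("r0", "r1"), None, 1),   # store 0(r0), r1
    3: (("r0",), "r0", 1),        # add   r0, r0, 1
    4: (("r0",), "flags", 1),     # cmp   r0, 10 ("flags" is an assumed name)
    5: (("flags",), None, 1),     # blt   L1
}

def run(stream):
    """Total cycles on an assumed in-order, single-issue machine."""
    ready = {}    # register -> first cycle in which its value can be read
    cycle = 0     # next free issue slot
    finish = 0    # cycle in which the last result becomes available
    for op in stream:
        srcs, dst, lat = OPS[op]
        # Issue no earlier than the next slot and no earlier than the operands.
        issue = max([cycle] + [ready.get(s, 0) for s in srcs])
        if dst is not None:
            ready[dst] = issue + lat
        finish = max(finish, issue + lat)
        cycle = issue + 1
    return finish

def plain(n):
    """n iterations in original program order."""
    return [op for _ in range(n) for op in (0, 1, 2, 3, 4, 5)]

def pipelined(n):
    """Prologue, n-1 steady-state iterations, then the final branch."""
    stream = [0, 1, 2, 3, 4]              # the noop shows up as a stall
    for _ in range(n - 1):
        stream += [0, 5, 1, 2, 3, 4]      # next load issued before the branch
    return stream + [5]

n = 10
print(run(plain(n)), run(pipelined(n)))   # prints "70 61"
```

Under this model the plain loop costs 7n cycles and the pipelined
stream 6n + 1, so the two are equal for n = 1 and the gain only
appears as the trip count grows.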


How much a software-pipelined loop gains over a non-pipelined one
depends on several factors, including the number of resources (FUs),
whether the loop contains recurrences, the number of iterations, and
register pressure.
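
The first two factors are often summarized by the usual lower bound on
the initiation interval (cycles between the starts of successive
iterations) from the modulo scheduling literature: the larger of a
resource bound and a recurrence bound. The sketch below is not from
the post; it assumes every FU can execute any operation, one issue
slot per operation, and, for the example loop, counts only the
recurrence on r0 (latency 1, distance 1 iteration). The function
names are made up for this illustration.

```python
from math import ceil

def res_mii(num_ops, num_fus):
    """Resource bound: each operation needs one issue slot on some FU."""
    return ceil(num_ops / num_fus)

def rec_mii(recurrences):
    """Recurrence bound over (latency around the cycle, iteration distance)."""
    return max((ceil(lat / dist) for lat, dist in recurrences), default=0)

def mii(num_ops, num_fus, recurrences):
    """Lower bound on the initiation interval."""
    return max(res_mii(num_ops, num_fus), rec_mii(recurrences))

loop_recurrences = [(1, 1)]              # add r0, r0, 1 feeds itself
print(mii(6, 1, loop_recurrences))       # one FU: 6, resource-bound
print(mii(6, 2, loop_recurrences))       # two FUs: 3
print(mii(6, 6, [(4, 1)]))               # hypothetical long recurrence: 4
```

With a single FU the bound is 6 cycles, which matches the steady state
of the schedule above; adding FUs lowers it only until a recurrence
becomes the limit.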


