Re:Software Pipelining

Jatin Bhateja <Jatin_Bhateja@mentor.com>
Thu, 28 Aug 2008 11:49:49 +0530

          From comp.compilers

Related articles
Software Pipelining plfriko@yahoo.de (Tim Frink) (2008-08-26)
Re:Software Pipelining Jatin_Bhateja@mentor.com (Jatin Bhateja) (2008-08-28)
Re:Software Pipelining plfriko@yahoo.de (Tim Frink) (2008-08-28)
Re: Software Pipelining pertti.kellomaki@tut.fi (Pertti Kellomaki) (2008-08-29)
Re: Software Pipelining mr.neeraj@gmail.com (Neeraj Goel) (2008-09-02)
Re: Software Pipelining sidtouati@inria.fr (Touati Sid) (2008-09-08)
Re: Software Pipelining kamalpr@hp.com (kamal) (2008-09-10)
Re: Software Pipelining johnhull2008@gmail.com (johnhull2008) (2008-09-11)
[7 later articles]
| List of all articles for this month |

From: Jatin Bhateja <Jatin_Bhateja@mentor.com>
Newsgroups: comp.compilers
Date: Thu, 28 Aug 2008 11:49:49 +0530
Organization: Compilers Central
References: 08-08-072
Keywords: optimize
Posted-Date: 28 Aug 2008 10:11:15 EDT

It's a combination of loop unrolling and instruction scheduling. e.g


forloop (...) {
        a[i] = b[i]; // stmt 1
        c[i] = a[i]; // stmt 2
        d[i] = c[i]; // stmt 3
}




there is a clear true dependence b/w stmt 1 , 2 and 3 which constrains
their execution order. If each statement takes one unit time then
each iteration will take 3 unit time, so if there are 2 iterations
then loop execution will take 6 unit time.


But if we unroll the loop and the reorder the instruction execution
order in such a way that all the non-dependent statements can execute
simultaneously then we can save on execution time. i.e.


1. Unrolling step .


a[0] = b[0]
c[0] = a[0]
d[0] = c[0]
a[1] = b[1]
c[1] = a[1]
d[1] = c[1]




2. After Instruction scheduling step.


a[0] = b[0]
a[1] = b[1]


c[0] = a[0]
c[1] = a[1]


d[0] = c[0]
d[1] = c[1]


Thus instructions in each of above pair of instruction can run
parallely. Thus if the architecture has multiple functional units then
we can run the above loop in 3 units times. Also we can club the
instruction present in each group in a long VLIW format if the
processor is VLIW, this will reduce the initial loop into few VLIW
instructions.


Optimization is called pipelining because we are able to overlap the
execution of multiple instruction after optimizations.


Thanks
- Jatin Bhateja


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.