Re: Loop Unroll

rkrayhawk@aol.com (RKRayhawk)
21 May 2001 02:07:10 -0400

          From comp.compilers

Related articles
Loop Unroll lxh@arch.cs.pku.edu.cn (Alkaid) (2001-05-18)
Re: Loop Unroll mike@dimmick.demon.co.uk (Mike Dimmick) (2001-05-21)
Re: Loop Unroll gvmt@bgl.vsnl.net.in (Venkatesha Murthy G.) (2001-05-21)
Re: Loop Unroll samiam@cisco.com (Scott Moore) (2001-05-21)
Re: Loop Unroll rkrayhawk@aol.com (2001-05-21)
Re: Loop Unroll christian.bau@isltd.insignia.com (Christian Bau) (2001-05-22)
| List of all articles for this month |
From: rkrayhawk@aol.com (RKRayhawk)
Newsgroups: comp.compilers
Date: 21 May 2001 02:07:10 -0400
Organization: AOL http://www.aol.com
References: 01-05-046
Keywords: optimize
Posted-Date: 21 May 2001 02:07:10 EDT

Alkaid, lxh@arch.cs.pku.edu.cn
on 18 May 2001 23:51:16 -0400
posted
<<


When we do ILP optimization, loop unrolling is a kind of method.
But there is a question: if loop time is a big prime number, how to unroll
the loop?
e.g. when loop time is 100, I can unroll the loop 4 times to get a new one
which has 25 iterations. What shall I do if the loop time is 101?


>>


Two Part Response, Part 1
Assume you are dealing with a situation where you have a fixed count for the
loop, a count known at compile time, please consider these comments.


First, I would say that if you are smart enough to set up an efficient
algorithm to detect the primeness of your "big prime number", the rest
of the challenge is a piece of cake.


In general, a non prime is only a step away. That is, the next number
up or down is a non prime. (Yup, two and three are close but we are
talking 'big', so I will stipulate that all relevant cases are atleast
larger than three).


So, if you can detect that you have a loop control variable that is
large and prime, then all you need do is transform it to a
do-once-and-then-loop. In simple code, code replication may be
feasible, in complex code rendering the loop body as an invocation of
a returning subroutine will allow you to simply invoke it once before
entry as a loop. Naturally some code generation schemes will not
facilitate that so you may need to consider some way to get into the
loop body and get out once before looping.


This notion is a lot like loop unfolding itself, just one more image
of the ply (fold) for the odd number occurence. "One more image"
actually means "subtract one from prime to get a non prime", so you
have to add an execution to make up for it.


Let's say you unfolded to two. If you need to subtract one from the
prime loop control count then this code image needs a jump over the
first ply. ((deep tangent: if you really are dealing with a two-ply
code image extremely smart manipulation of the induction variables may
allow you to use a jump past the first one (which is a jump to the
second one) as a method of "adding one to the prime, and then
subtracting it back out"))


An unfolded image that has four plies, allows start positions that are
distinct from the 0th: the 1st, the 2nd and the 3rd. Obviously, in
some zones of the integer domain, not all of those are non prime. So
you need unfold logic that takes into account whether there is to be a
'priming' leap into the midst of the plies (pardon the pun, I see by
the return address that I am crossing language and cultural
boundaries, my light-heartedness is intended to relax the issue).


Stated another way, it is probably not that tough to find a way to
establish control valiables and to competently manage all associated
induction variables to allow you to leap into the middle of a prime
number occurence unfolded loop image. But your design must centralize
the information that allows your algorithm to know how to de-prime the
control count and remain flexible in the number of plies to project in
your unfolded image. That is, fi your optimizer is to be flexible in
the number of plies in the unfolded image, it must be flexible and
competent in it's leap in point.


If you are actually dealing with a static situation, then the unfold
image really, all things considered needs something like a preceding
jump-into instruction, the address of which is to be resolved later.




Two Part Response, Part 2
I will bet that you did not intend to say that you have a fixed loop
control count known at compile time! That is a guess. Yet this other
alternative is important for you to generalize your knowledge, if not
your current algorithm.


If you must determine the entry point within the unfolded image at
execution time. Then there are two basic possibilities. First, the
unfolded image code that is generated can have tests between each ply
that ask, is it time to leave yet. (That will chewup CPU, which is
exactly not the idea in optimization). The other possibility is to
make the genrated pre-entry code calculate where to enter (adjusting
parallel induction vaiables also), and then leap in to that
dynamically calculated position.


Thus the pre-entry code must detect
  1) is it prime value in the induction variable?
  2) what value must I adjust by, considering the wonderful code the the
compiler generated (that is, current-initial-control-count mod
compiler-unfold-image-ply-count)?
  3) how to deal with parallel induction variables?
  4) calculate priming leap
  5) leap


Hope you find that useful.


Robert Rayhawk
rayhawk@alum.calberkeley.org



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.