Re: reducible loops (Preston Briggs)
Thu, 27 Feb 1992 07:28:55 GMT

          From comp.compilers

Related articles
reducible loops (1992-02-24)
Re: reducible loops (Raul Deluth Miller-Rockwell) (1992-02-25)
Re: reducible loops (1992-02-26)
Re: reducible loops (1992-02-26)
Re: reducible loops (1992-02-26)
Re: reducible loops (1992-02-27)
Re: reducible loops (1992-02-28)
Re: reducible loops idacrd!desj@uunet.UU.NET (1992-03-09)
Re: reducible loops (1992-03-10)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Preston Briggs)
Keywords: optimize
Organization: Rice University, Houston
References: <9202220021.AA28153@ucbvax.Berkeley.EDU> 92-02-125
Date: Thu, 27 Feb 1992 07:28:55 GMT

In article 92-02-125 (Andy Glew) writes:

>Duff's device is a technique for unrolling loops with minimal overhead.
> [ buggy version omitted -John]

All the stars are missing!
Here's a less tricky way to accomplish the same effect

        j = i / 8;
        switch( i % 8 ) {
case 7:
         *destptr++ = *srcptr++;
         case 6:
         *destptr++ = *srcptr++;
         case 5:
         *destptr++ = *srcptr++;
         case 4:
         *destptr++ = *srcptr++;
         case 3:
         *destptr++ = *srcptr++;
         case 2:
         *destptr++ = *srcptr++;
         case 1:
         *destptr++ = *srcptr++;
         case 0:
         *destptr++ = *srcptr++;
        while (--j >= 0) {
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;
         *destptr++ = *srcptr++;

Same overhead, but better optimization of the loop. Even allowing the
overhead is probably fine, especially if we consider the code growth and
switch overhead.

for (j=0; j<i%8; j++)
dest[j] = src[j];
while (j < i) {
dest[j+0] = src[j+0];
dest[j+1] = src[j+1];
dest[j+2] = src[j+2];
dest[j+3] = src[j+3];
dest[j+4] = src[j+4];
dest[j+5] = src[j+5];
dest[j+6] = src[j+6];
dest[j+7] = src[j+7];
j += 8;

Probably runs at least as fast, and the optimizer will do nice things to
the loop (assuming there's something to do). In particular, it's clear
that the 2nd assignment (dest[j+1] = src[j+1]) always follows the first,
allowing (sometimes) common subexpression elimination, value numbering,
strength reduction, and so forth.

In the example above, it ought to be able to manage each iteration with a
single increment and test. All the array accesses should be doable with
constant indices off the single index. Alternatively, if the machine
supports free auto-increment, the optimizer could use it, as suggested by
Andy's code.

In the original Duff'd code, each assignment may be reached from outside
the loop or from the previous assignment. This choice (at every
statement) eliminates many possibilities for optimization.

Of course, with this simple a loop body, there's little to gain from
optimization after unrolling. More interesting loops have more to gain
(or lose, if Duff's device is used).

Preston Briggs

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.