Related articles |
---|
Re: reducible loops glew@pdx007.intel.com (1992-02-26) |
Duff's device glew@pdx007.intel.com (1992-02-27) |
Newsgroups: | comp.compilers |
From: | glew@pdx007.intel.com (Andy Glew) |
Keywords: | optimize, history |
Organization: | Intel Corp, Hillsboro, Oregon |
References: | <9202220021.AA28153@ucbvax.Berkeley.EDU> 92-02-125 |
Date: | Thu, 27 Feb 1992 18:55:07 GMT |
Alan Watson <alan@uwast.astro.wisc.edu> corrects me about Duff's devive.
Since I've already received several requests for a reference to Duff's
device, I provide it here.
I believe your post on Duff's device is in error, and I enclose the
original and one of Duff's comments on its use. Note that Duff's
device is not a block-to-block copy, but a block-to-location copy. I
have not submitted a correction to comp.compilers.
(1) Thanks for the original. I'll keep it in my files. I rather thought
that I had the code a bit wrong.
(2) Maybe Tom Duff only originated this as a block-to-location copy, but
I've used it for block to block copies, and to implement fast checksums,
and I've seen other people use it for vector operations. I don't think
it's unreasonable to give Tom Duff credit for any use of the switch into
loop body construct.
>From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
Received: by ucbvax.ARPA (4.16/4.13)
Received: by dagobah.LFL (4.6/4.6b)
Date: Thu, 10 Nov 83 17:57:56 PST
From: ucbvax!dagobah!td (Tom Duff)
Message-Id: <8311110157.AA01034@dagobah.LFL>
To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr,
ucbvax!research!rob
Consider the following routine, abstracted from code which copies an
array of shorts into the Programmed IO data register of an Evans &
Sutherland Picture System II:
send(to, from, count)
register short *to, *from;
register count;
{
do
*to = *from++;
while(--count>0);
}
(Obviously, this fails if the count is zero.)
The VAX C compiler compiles the loop into 2 instructions (a movw and
a sobleq, I think.) As it turns out, this loop was the bottleneck in
a real-time animation playback program which ran too slowly by about 50%.
The standard way to get more speed out of something like this is to unwind
the loop a few times, decreasing the number of sobleqs. When you do that,
you wind up with a leftover partial loop. I usually handle this in C with
a switch that indexes a list of copies of the original loop body. Of
course, if I were writing assembly language code, I'd just jump into the
middle of the unwound loop to deal with the leftovers. Thinking about this
yesterday, the following implementation occurred to me:
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do{ *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}while(--n>0);
}
}
Disgusting, no? But it compiles and runs just fine. I feel a combination
of pride and revulsion at this discovery. If no one's thought of it before,
I think I'll name it after myself.
It amazes me that after 10 years of writing C there are still little corners
that I haven't explored fully. (Actually, I have another revolting way to
use switches to implement interrupt driven state machines but it's too
horrid to go into.)
Many people (even bwk?) have said that the worst feature of C is that
switches don't break automatically before each case label. This code forms
some sort of argument in that debate, but I'm not sure whether it's for or
against.
yrs trly
Tom
--------------------------------------------------------------------------
From: td@alice.UUCP (Tom Duff)
Date: 29 Aug 88 20:33:51 GMT
[...]
Transformations like this can only be justified by measuring the
resulting code. Be careful when you use this thing that you don't
unwind the loop so much that you overflow your machine's instruction
cache. Don't try to be smarter than an over-clever C compiler that
recognizes loops that implement block move or block clear and compiles
them into machine idioms.
[...]
--
Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Pkwy,
Hillsboro, Oregon 97124-6497
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.