Re: Does Duff Device break C compilers

msb@sq.com (Mark Brader)
10 May 1996 01:36:06 -0400

          From comp.compilers

Related articles
Does Duff Device break C compilers Lambert.Lum@eng.efi.com (Lambert Lum) (1996-05-06)
Re: Does Duff Device break C compilers preston@tera.com (1996-05-07)
Re: Does Duff Device break C compilers krste@ICSI.Berkeley.EDU (1996-05-08)
Re: Does Duff Device break C compilers jeremy@floyd.sw.oz.au (1996-05-10)
Re: Does Duff Device break C compilers rankin@eql.caltech.edu (1996-05-10)
Re: Does Duff Device break C compilers msb@sq.com (1996-05-10)
Re: Does Duff Device break C compilers preston@tera.com (1996-05-13)
Re: Does Duff Device break C compilers baynes@ukpsshp1.serigate.philips.nl (1996-05-19)

From: msb@sq.com (Mark Brader)
Newsgroups: comp.compilers
Date: 10 May 1996 01:36:06 -0400
Organization: SoftQuad Inc., Toronto, Canada
References: 96-05-050 96-05-054 96-05-057
Keywords: C, optimize

> > On the other hand, wouldn't be surprised if the resulting code wasn't
> > that great. Why? It makes an irreducible loop ...


Yep. Tom Duff was hand-optimizing for a particular machine and a particular
compiler when he originally wrote it, one where it did work.


> > Better is to rewrite it, like this
> >
> > /* no assumptions made about count */
> > for (n = 0; n < count; n++)
> >     to[n] = from[n];
> >
> > which is correct, obvious, maintainable, parallelizable, vectorizable,
> > software pipelinable, and in all other ways superior.


In all other ways superior except that it *does the wrong thing*!
The correct translation is


for (n = 0; n < count; n++)
    *to = from[n];


because "to" was pointing to an output register, specifically "the
Programmed IO data register of an Evans & Sutherland Picture System II".
Note that this also means that the loop really has to be executed serially!
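For readers who have not seen the construct under discussion, here is a
rough sketch of the two forms side by side. The function names, the use
of "short" for the element type, the size_t counts, and the "volatile"
qualifier on the register pointer are all assumptions made for
illustration and are not taken from Tom's code; the unrolled version
follows the commonly cited shape of the device and, like it, assumes
count > 0.

```c
#include <assert.h>
#include <stddef.h>

/* Plain serial loop: every element is written to the SAME location,
   as it would be for a memory-mapped output register. */
void send_simple(volatile short *to, const short *from, size_t count)
{
    for (size_t n = 0; n < count; n++)
        *to = from[n];
}

/* Eight-way unrolled loop in the style of Duff's Device.
   The switch jumps into the middle of the do-while to handle the
   count % 8 leftover writes first; every later pass does all eight. */
void send_duff(volatile short *to, const short *from, size_t count)
{
    assert(count > 0);              /* the device is not valid for count == 0 */
    size_t n = (count + 7) / 8;     /* number of passes through the loop body */

    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

In both versions "to" is never incremented, and the "volatile" in this
sketch tells a modern compiler that each store must actually be performed,
in order.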


> Better yet...
>
> memcpy(from, to, count*(sizeof *from));


Worse yet -- it cannot be trivially corrected as the "for" version can.
If we were talking about a block copy, THEN memcpy() or memmove() would
of course generally be best.
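For the block-copy case only, a minimal sketch might look like the
following. The wrapper name copy_block and the "short" element type are
hypothetical; the one thing to watch is that the standard memcpy() takes
the destination first and the source second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy count elements between two ordinary, non-overlapping arrays.
   Not suitable for the output-register case: memcpy() writes to
   count consecutive destination locations, not to one fixed address. */
void copy_block(short *to, const short *from, size_t count)
{
    assert(to != NULL && from != NULL);
    memcpy(to, from, count * sizeof *from);   /* destination, source, bytes */
}
```

If the two regions might overlap, memmove() takes the same arguments in
the same order.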


> [Oh, right. Duh. It was still a great hack on the PDP-11. -John]


Tom was working on a VAX, actually. The *to = *from++ compiled to
a single instruction.


For greater clarity: Tom wrote in 1988 "I do not claim to have invented
loop unrolling, merely this particular expression of it in C." And it
was in 1983 that he did it.


--
Mark Brader
msb@sq.com
SoftQuad Inc., Toronto


My text in this article is in the public domain.

