|It's 1997. Do you know where your scheduler is? email@example.com (D. J. Bernstein) (1997-12-19)|
|Re: It's 1997. Do you know where your scheduler is? firstname.lastname@example.org (David Chase) (1997-12-23)|
|Re: It's 1997. Do you know where your scheduler is? email@example.com (D. J. Bernstein) (1997-12-29)|
|Re: It's 1997. Do you know where your scheduler is? firstname.lastname@example.org (David Greene) (1998-01-04)|
|From:||"D. J. Bernstein" <email@example.com>|
|Date:||19 Dec 1997 00:19:02 -0500|
Surfing the web one day. See ``Fastest Fourier Transform in the
West,'' benchmarked as being faster than all other publicly available
Sit down and toss together a new FFT library. 39900 Pentium cycles for
a problem where FFTW takes 60000. Not bad, but not optimal.
Glance at the asm output produced by gcc -O1. It's horrendous. The gcc
scheduler obviously doesn't know diddly about the Pentium.
Try gcc -O3. The code slows down to 40400 Pentium cycles. Wonderful.
Back to gcc -O1. Shuffle instructions by hand, pipelining
appropriately. 25000 cycles. Not bad, but still not optimal.
Inspect asm output again. gcc mostly obeys the manual scheduling but
insists on loading some values at exactly the wrong times. Wonderful.
Pause to contemplate the market for assemblers. Would casm be a good
name for an assembler with C-like syntax? Bouncing between UNIX x86
syntax and Intel x86 syntax is a pain.
Try gcc -O3 again. The code slows down again. Wonderful.
Try gcc -O0, many combinations of options, other compilers. No luck.
Tweak the code, find some unnecessary loads that confuse the optimizer
enough to make it accidentally speed the code up a bit. Heh, heh, heh.
Make code available: http://pobox.com/~djb/djbfft.html.
Pause to wonder whether compiler writers will care about a 1.6x
speedup in one of the world's most important computational problems.
``These days, scheduling is realistically best left to the compiler.''
P.S. FFTW is at http://theory.lcs.mit.edu/~fftw.
Return to the
Search the comp.compilers archives again.