Related articles |
---|
Problems with Hardware, Languages, and Compilers hrubin@stat.purdue.edu (1997-03-07) |
[++] Re: Problems with Hardware, Languages, and Compilers Terje.Mathisen@hda.hydro.com (Terje Mathisen) (1997-03-09) |
Re: [++] Re: Problems with Hardware, Languages, and Compilers jhi@alpha.hut.fi (Jarkko Hietaniemi) (1997-03-13) |
Re: [++] Re: Problems with Hardware, Languages, and Compilers Terje.Mathisen@hda.hydro.com (Terje Mathisen) (1997-03-16) |
From: | Terje Mathisen <Terje.Mathisen@hda.hydro.com> |
Newsgroups: | comp.compilers,comp.lang.misc,comp.arch.arithmetic |
Date: | 9 Mar 1997 11:30:27 -0500 |
Organization: | Hydro |
References: | 97-03-037 |
Keywords: | arithmetic, design |
Herman Rubin wrote:
>
> I read an article in _High-Speed Computing_ recently, and this showed
> up a problem involving languages and compilers, and incidentally
> hardware arithmetic. I do not remember the exact title of the paper,
> but the topics discussed were division and square root on the i860.
>
> Hardware on the "cutting edge" can use a table lookup to start
> reciprocal or reciprocal square root, and iterative methods using only
> a few multiplications and additions to get accurate values for these.
> The same cannot be done for square root.
This is very similar to what I did for a Swedish researcher last fall;
his molecular simulation program was (according to him) speed-limited
by sqrt(x).
When checking his source, it turned out the real limitation was
1/sqrt(x), which I then figured out a way to calculate as you show
above:
Using a lookup table for sqrt(1/x), with x scaled by its exponent into
the range 0.5 to 2.0, I got an initial guess, which I then used to move
the argument into a range very close to 1.0.
A few Chebyshev steps later we had an approximate sqrt(1/x) that was
accurate enough to give exactly the same 10-digit (decimal) results in
his trial runs.
By running two or three of these calculations in parallel, we got rid
of nearly all the fp pipeline stalls, resulting in final speedups for
his simulations between 30% and 100%.
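
The interleaving idea can be sketched as follows. This is an illustration
under assumptions, not the original code: the function name
rsqrt_refine_pairs, its parameters, and the use of Newton-Raphson steps
are all hypothetical. The caller supplies initial guesses in y (for
instance from a table lookup), and the loop refines two independent
values side by side so that one chain's multiplies can be issued while
the other's are still in flight.

```c
#include <stddef.h>

/* Refine initial guesses y[i] ~ 1/sqrt(x[i]) in place, two elements
   at a time. 'steps' is the number of Newton-Raphson steps per element. */
static void rsqrt_refine_pairs(const double *x, double *y,
                               size_t n, int steps)
{
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        double xa = x[i],     ya = y[i];
        double xb = x[i + 1], yb = y[i + 1];

        for (int s = 0; s < steps; s++) {
            /* The a and b chains never depend on each other, so the
               operations below can overlap in the fp pipeline. */
            double ta = xa * ya;
            double tb = xb * yb;
            ta = 1.5 - 0.5 * ta * ya;
            tb = 1.5 - 0.5 * tb * yb;
            ya *= ta;
            yb *= tb;
        }

        y[i]     = ya;
        y[i + 1] = yb;
    }

    if (i < n) {                    /* odd leftover element */
        double xa = x[i], ya = y[i];
        for (int s = 0; s < steps; s++)
            ya *= 1.5 - 0.5 * xa * ya * ya;
        y[i] = ya;
    }
}
```

Whether writing the interleaving out by hand helps will depend on the
compiler and the processor; a compiler that already schedules across
loop iterations may gain little from it.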
It was quite easy to do this in portable C, so I cannot really see
the need to modify hardware, languages, or compilers to realize
these kinds of speedups.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
--