Re: floating point accuracy (Hans Aberg)
24 Jan 2002 13:54:47 -0500

          From comp.compilers

Related articles
re: Compiler bugs (David Chase) (2002-01-03)
Re: Compiler bugs (Christian Bau) (2002-01-05)
Re: Compiler bugs (David Chase) (2002-01-14)
Re: floating point accuracy, was Compiler bugs (Christian Bau) (2002-01-17)
Re: floating point accuracy, was Compiler bugs (David Chase) (2002-01-18)
Re: floating point accuracy (2002-01-24)

From: (Hans Aberg)
Newsgroups: comp.compilers
Date: 24 Jan 2002 13:54:47 -0500
Organization: Mathematics
References: 02-01-015 02-01-029 02-01-054 02-01-069 02-01-087
Keywords: arithmetic, theory
Posted-Date: 24 Jan 2002 13:54:47 EST

David Chase <> wrote:
>> You can look at rounding errors in two ways: Instead of producing f (x)
>> you produce f (x) + eps, and you want eps small. Or instead of producing
>> f (x) you produce f (x + delta), and you want delta small.

One can analyze these errors using differentials, assuming that f is
"well behaved" (suitably differentiable):

From this I arrive at two approximate number types: a fixed-point type,
which approximates differentials df, and a floating-point type, which
approximates logarithmic differentials d log f := df/f. (I have not seen
any implementation of the fixed-point type.)
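To make the two views concrete, here is a small Python sketch. It is an illustration added here, not code from the post, and it uses the assumed example f(x) = x**3 at x = 2 with an assumed input error dx = 1e-6. It compares the first-order estimates df ~ f'(x) dx (fixed-point, absolute) and d log f ~ (x f'(x)/f(x)) d log x (floating-point, relative) against the actual change in f.

```python
import math


# Assumed example function (not from the post): f(x) = x**3, f'(x) = 3x**2.
def f(x: float) -> float:
    return x ** 3


def f_prime(x: float) -> float:
    return 3 * x ** 2


x = 2.0
dx = 1e-6  # assumed small absolute error in x

# Fixed-point view: absolute error, df ~ f'(x) dx.
df_estimate = f_prime(x) * dx
df_actual = f(x + dx) - f(x)

# Floating-point view: relative error, d log f = df/f ~ (x f'(x)/f(x)) d log x.
dlog_x = dx / x
dlog_f_estimate = (x * f_prime(x) / f(x)) * dlog_x
dlog_f_actual = df_actual / f(x)

print(f"df:      estimate {df_estimate:.6e}, actual {df_actual:.6e}")
print(f"d log f: estimate {dlog_f_estimate:.6e}, actual {dlog_f_actual:.6e}")
```

For x**3 the factor x f'(x)/f(x) equals 3 at every x, so the relative error roughly triples, while the absolute error is scaled by 3x**2 and therefore depends on where f is evaluated.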

When using the floating-point type, the dx (= your epsilon) does not
become apparent, because it is tacitly taken to be the smallest nonzero
relative error. But it becomes apparent when writing a floating-point
library in which the mantissa can vary in accuracy (such as GNU GMP).
Then I think it would be prudent to let one specify the accuracy of x,
say in number of bits, and to compute the output with respect to that
accuracy. The result will depend on the function f, and on whether
absolute (df, fixed-point) or relative (d log f, floating-point) accuracy
is sought. Thus, the algorithms would have to come with an accuracy
analysis.
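A minimal sketch of such an accuracy analysis, assuming first-order error propagation only: given the accuracy of x in bits, estimate the accuracy of f(x) from the derivative. The helper names relative_bits_out and absolute_bits_out are hypothetical, and the derivative is passed in by hand; a real multiprecision library would need rigorous bounds rather than this estimate.

```python
import math
from typing import Callable


def relative_bits_out(f: Callable[[float], float],
                      f_prime: Callable[[float], float],
                      x: float, bits_in: float) -> float:
    """First-order estimate of the relative accuracy (in bits) of f(x)
    when x is known to bits_in bits of relative accuracy.

    From d log f = (x f'(x) / f(x)) d log x, the relative error is scaled by
    kappa = |x f'(x) / f(x)|, so bits_out = bits_in - log2(kappa).
    """
    kappa = abs(x * f_prime(x) / f(x))
    return bits_in - math.log2(kappa)


def absolute_bits_out(f_prime: Callable[[float], float],
                      x: float, bits_in: float) -> float:
    """Fixed-point analogue: df = f'(x) dx scales the absolute error by |f'(x)|."""
    return bits_in - math.log2(abs(f_prime(x)))


# Usage with a 53-bit input (roughly IEEE double precision):
print(relative_bits_out(math.sqrt, lambda t: 0.5 / math.sqrt(t), 4.0, 53))  # 54.0: sqrt gains a bit
print(relative_bits_out(lambda t: t ** 3, lambda t: 3 * t ** 2, 2.0, 53))   # about 51.4
print(relative_bits_out(math.log, lambda t: 1 / t, 1.001, 53))              # about 43: log near 1
print(absolute_bits_out(lambda t: 2 * t, 4.0, 20))                          # 17.0 for x**2 at 4
```

The same input accuracy can thus yield very different output accuracies depending on f and on the point of evaluation, which is why the analysis has to accompany each algorithm.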

An example: if f depends on the variables x_1, ..., x_n, then
df = (∂f/∂x_1) dx_1 + ... + (∂f/∂x_n) dx_n. So, for example,
d log(x*y) = d log x + d log y. Since we want to bound the absolute values
of the relative errors, the relative error of x*y is bounded (to first
order) by that of x plus that of y. In terms of precision (the inverse of
the error), one then knows that the smaller of the relative precisions of
x and y will suffice to hold the precision of x*y (up to about one bit).
This gives a relatively simple rule for a multiprecision library to
handle. Had we used fixed-point numbers, the same rule would instead apply
to the absolute precision of x + y, since d(x + y) = dx + dy.
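The multiplication rule above, and its fixed-point counterpart for addition, can be checked with exact rational arithmetic. The following Python sketch is an illustration added here, not code from the post; the helper names product_rel_bits and sum_abs_bits and the sample values are hypothetical. A precision of p bits means a relative (or, for fixed point, absolute) error of at most 2**-p.

```python
from fractions import Fraction


def product_rel_bits(p_x: int, p_y: int) -> int:
    """Bits of relative precision guaranteed for x*y, to first order.

    If |rel. error of x| <= 2**-p_x and |rel. error of y| <= 2**-p_y, then
    d log(x*y) = d log x + d log y gives
    |rel. error of x*y| <= 2**-p_x + 2**-p_y <= 2**(1 - min(p_x, p_y)),
    ignoring the second-order cross term 2**-(p_x + p_y).
    """
    return min(p_x, p_y) - 1


def sum_abs_bits(q_x: int, q_y: int) -> int:
    """Fixed-point analogue: d(x + y) = dx + dy, so absolute errors add."""
    return min(q_x, q_y) - 1


# Exact check with rationals; the values 3 and 5 and the precisions are arbitrary.
x_true, y_true = Fraction(3), Fraction(5)
p_x, p_y = 10, 20
x_approx = x_true * (1 + Fraction(1, 2 ** p_x))  # worst-case relative error in x
y_approx = y_true * (1 + Fraction(1, 2 ** p_y))  # worst-case relative error in y
rel_err = abs(x_approx * y_approx - x_true * y_true) / (x_true * y_true)
print(rel_err <= Fraction(1, 2 ** product_rel_bits(p_x, p_y)))  # True

# Fixed-point sum: the worst case is when both absolute errors have the same sign.
q_x, q_y = 8, 12
abs_err = Fraction(1, 2 ** q_x) + Fraction(1, 2 ** q_y)
print(abs_err <= Fraction(1, 2 ** sum_abs_bits(q_x, q_y)))  # True
```

The one-bit loss comes from the factor of 2 in 2**-p_x + 2**-p_y <= 2 * 2**-min(p_x, p_y); storing the result with more than min(p_x, p_y) bits would only add digits that carry no information.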

So my idea is that one should work through the other functions in the
same manner, for use in a multiprecision floating-point or fixed-point
library.
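Working a few more functions through in the same manner gives the following first-order rules. This is a derivation added here for illustration, not part of the post; \delta denotes relative error:

```latex
\begin{aligned}
d\log\frac{x}{y} &= d\log x - d\log y,
  &\quad \left|\delta\!\left(\tfrac{x}{y}\right)\right| &\le |\delta x| + |\delta y| \\
d\log x^{n} &= n\, d\log x,
  &\quad |\delta(x^{n})| &\le |n|\,|\delta x| \\
d\log\sqrt{x} &= \tfrac{1}{2}\, d\log x,
  &\quad |\delta(\sqrt{x})| &\le \tfrac{1}{2}\,|\delta x| \\
d\log e^{x} &= dx = x\, d\log x,
  &\quad |\delta(e^{x})| &\le |x|\,|\delta x| \\
d\log(\log x) &= \frac{d\log x}{\log x},
  &\quad |\delta(\log x)| &\le \frac{|\delta x|}{|\log x|}
\end{aligned}
```

The fixed-point counterparts follow from d(x - y) = dx - dy and d(cx) = c dx: absolute errors add under subtraction and scale by |c| under multiplication by a constant c. Note that exp loses relative accuracy for large |x| and log loses it near x = 1.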

As for computer languages, I think that CPUs are now fast enough for
languages to support multiprecision floats (and perhaps fixed-point
numbers, too). If people start adding such types to languages, one may
eventually see hardware support for them, so that they are no longer
slow, just as happened with fixed-precision FPUs.

    Hans Aberg * Anti-spam: remove "remove." from email address.
                                    * Email: Hans Aberg <>
                                    * Home Page: <>
                                    * AMS member listing: <>
