From: Peter-Lawrence.Montgomery@cwi.nl (Peter L. Montgomery)
Newsgroups: comp.compilers
Date: 27 Oct 1999 14:06:41 -0400
Organization: CWI, Amsterdam
References: 99-10-017 99-10-093
Keywords: arithmetic, performance
>Yes, but unfortunately not all divisions by a constant can be treated
>like that... A problem occurs when the error after multiplication
>could be greater than the pre-computed value you are multiplying with,
>forcing a correction step.
Torbjörn Granlund and I published
`Division by Invariant Integers using Multiplication'
in the proceedings of PLDI 1994 (Programming Language Design
and Implementation). See SIGPLAN Notices, June, 1994.
Yes, unsigned division by 3 needs a 33-bit constant
c3 = (2^34 + 2)/3 = 0x155555556 (hexadecimal). If 0 <= x < 2^32, then
FLOOR(x/3) = FLOOR(c3 * x / 2^34)
In this case the multiplier c3 is even.
Reduce the fraction c3 / 2^34 to (c3/2) / 2^33.
The revised multiplier c3/2 = 0xAAAAAAAB fits in 32 bits.
Use the upper half of the unsigned integer multiply,
which I'll call MULUH, to multiply by (c3/2) / 2^32.
Shift right to divide by 2 again:
FLOOR(x/3) = MULUH(c3/2, x) >> 1
This is two instructions once the constant c3/2 is in a register.
There are no branches.
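As a rough illustration of the two-instruction sequence above, here is a minimal C sketch. The names muluh and div3 are made up for this example, and the 64-bit multiply is only one portable way to get the upper half of the product; on a machine with a high-multiply instruction a compiler may emit that instead.

```c
#include <stdint.h>

/* MULUH: upper 32 bits of the 64-bit product of two unsigned 32-bit values.
   (Hypothetical helper; written with a 64-bit multiply for portability.) */
static uint32_t muluh(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* FLOOR(x/3) for 0 <= x < 2^32.
   0xAAAAAAAB is c3/2, where c3 = (2^34 + 2)/3. */
static uint32_t div3(uint32_t x)
{
    return muluh(UINT32_C(0xAAAAAAAB), x) >> 1;
}
```

Comparing div3(x) against x / 3 over a range of inputs, including 0 and 2^32 - 1, is an easy sanity check.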
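When the 33-bit multiplier is odd, the code is
longer. (For 3 itself the multiplier is even, so the formulas below reuse
the names c3 and 2^34 only to show the pattern.) We first rewrite
FLOOR(c3 * x / 2^34) = FLOOR(FLOOR(c3 * x / 2^32) / 4)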
The product FLOOR(c3 * x / 2^32) can be 2^32 or more, so it need not fit in 32 bits. We replace it by
x + FLOOR((c3 - 2^32) * x / 2^32)
The constant c3 - 2^32 fits in 32 bits. Hence
FLOOR(x/3) = FLOOR((x + t1) / 4)
where t1 = MULUH(c3 - 2^32, x). To avoid overflow in the
(x + t1) / 4 computation, observe 0 <= t1 <= x. Hence
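(x + t1)/4 = ((x - t1) + 2*t1) / 4
= ((x - t1)/2 + t1)/2
with each division rounded down. The intermediate values x - t1 and
(x - t1)/2 + t1 both stay within 32 bits.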
Besides the MULUH, we need two shifts, an addition, and a subtraction.
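The post does not name a divisor whose 33-bit multiplier is odd. As an assumption for illustration, the C sketch below uses divisor 7, whose multiplier c7 = (2^35 + 3)/7 = 0x124924925 is odd. Because the scaling is 2^35 here rather than 2^34, the last shift is by 2 instead of 1; otherwise the steps follow the (x - t1)/2 + t1 rearrangement. The names muluh and div7 are hypothetical.

```c
#include <stdint.h>

/* MULUH: upper 32 bits of the 64-bit product of two unsigned 32-bit values.
   (Hypothetical helper; written with a 64-bit multiply for portability.) */
static uint32_t muluh(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* FLOOR(x/7) for 0 <= x < 2^32.
   c7 = (2^35 + 3)/7 = 0x124924925 is an odd 33-bit multiplier,
   so only c7 - 2^32 = 0x24924925 is passed to MULUH. */
static uint32_t div7(uint32_t x)
{
    uint32_t t1 = muluh(UINT32_C(0x24924925), x);  /* 0 <= t1 <= x */

    /* FLOOR((x + t1) / 8) without forming x + t1, which could overflow:
       halve the difference, add t1 back, then shift by the remaining 2. */
    return (((x - t1) >> 1) + t1) >> 2;
}
```

This uses one MULUH, two shifts, an addition, and a subtraction, which matches the operation count given above.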
The paper covers more topics, such as two forms of signed division
(round towards zero, round towards -infinity).
It is available in ftp.cwi.nl:/pub/pmontgom/divcnst.ps{a4,l}.gz .
--
Peter-Lawrence.Montgomery@cwi.nl Home: San Rafael, California
Microsoft Research and CWI