# Re: Modulo optimizations

## Peter-Lawrence.Montgomery@cwi.nl (Peter L. Montgomery)27 Oct 1999 14:06:41 -0400

From comp.compilers

Related articles
[2 earlier articles]
Re: Modulo optimizations torbenm@diku.dk (1999-10-13)
Re: Modulo optimizations chase@world.std.com (David Chase) (1999-10-13)
Re: Modulo optimizations ger@informatik.uni-bremen.de (George Russell) (1999-10-14)
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-14)
Re:Modulo optimizations Wilco.Dijkstra@arm.com (Wilco Dijkstra) (1999-10-19)
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-21)
Re: Modulo optimizations Peter-Lawrence.Montgomery@cwi.nl (1999-10-27)
| List of all articles for this month |

 From: Peter-Lawrence.Montgomery@cwi.nl (Peter L. Montgomery) Newsgroups: comp.compilers Date: 27 Oct 1999 14:06:41 -0400 Organization: CWI, Amsterdam References: 99-10-017 99-10-093 Keywords: arithmetic, performance

>Yes, but unfortunately not all divisions by a constant can be treated
>like that... A problem occurs when the error after multiplication
>could be greater than the pre-computed value you are multiplying with,
>forcing a correction step.

Torbj\"orn Granlund and I published
`Division by Invariant Integers using Multiplication'
in the 1994 PLDI (Proceedings of Programming Languages Design
and Implementation). See SIGPLAN Notices, June, 1994.

Yes, unsigned division by 3 needs a 33-bit constant
c3 = (2^34 + 2)/3 = 155555556x. If 0 <= x < 2^32, then

FLOOR(x/3) = FLOOR(c3 * x / 2^34)

In this case the multiplier c3 is even.
Reduce the fraction c3 / 2^34 to (c3/2) / 2^33.
The revised multiplier c3/2 fits in 32 bits.
Use the upper half of the unsigned integer multiply,
which I'll call MULUH, to multiply by (c3/2) / 2^32.
Shift right to divide by 2 again:

FLOOR(x/3) = MULUH(c3/2, x) >> 1

This is two instructions once the constant c3/2 is in a register.
There are no branches.

When the 33-bit multiplier is odd, the code is
longer. We first replace

FLOOR(c3 * x / 2^34) = FLOOR(FLOOR(c3 * x / 2^32) / 4)

The product FLOOR(c3 * x / 2^32) can exceed 2^32. We replace it by

x + FLOOR((c3 - 2^32) * x / 2^32)

The constant c3 - 2^32 fits in 32 bits. Hence

FLOOR(x/3) = FLOOR((x + t1) / 4)

where t1 = MULUH(c3 - 2^32, x). To avoid overflow in the
(x + t1) / 4 computation, observe 0 <= t1 < x. Hence

(x + t1)/4 = ((x - t1) + 2*t1) / 4
= ((x - t1)/2 + t1)/2

Besides the MULUH, we need two shifts, an addition, and a subtraction.

The paper covers more topics, such as two forms of signed division
(round towards zero, round towards -infinity).
It is available in ftp.cwi.nl:/pub/pmontgom/divcnst.ps{a4,l}.gz .
--
Peter-Lawrence.Montgomery@cwi.nl Home: San Rafael, California
Microsoft Research and CWI

Post a followup to this message