Re: Modulo optimizations

Peter-Lawrence.Montgomery@cwi.nl (Peter L. Montgomery)
27 Oct 1999 14:06:41 -0400

          From comp.compilers

Related articles
[2 earlier articles]
Re: Modulo optimizations torbenm@diku.dk (1999-10-13)
Re: Modulo optimizations chase@world.std.com (David Chase) (1999-10-13)
Re: Modulo optimizations ger@informatik.uni-bremen.de (George Russell) (1999-10-14)
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-14)
Re:Modulo optimizations Wilco.Dijkstra@arm.com (Wilco Dijkstra) (1999-10-19)
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-21)
Re: Modulo optimizations Peter-Lawrence.Montgomery@cwi.nl (1999-10-27)
| List of all articles for this month |

From: Peter-Lawrence.Montgomery@cwi.nl (Peter L. Montgomery)
Newsgroups: comp.compilers
Date: 27 Oct 1999 14:06:41 -0400
Organization: CWI, Amsterdam
References: 99-10-017 99-10-093
Keywords: arithmetic, performance

>Yes, but unfortunately not all divisions by a constant can be treated
>like that... A problem occurs when the error after multiplication
>could be greater than the pre-computed value you are multiplying with,
>forcing a correction step.




              Torbj\"orn Granlund and I published
`Division by Invariant Integers using Multiplication'
in the 1994 PLDI (Proceedings of Programming Languages Design
and Implementation). See SIGPLAN Notices, June, 1994.


          Yes, unsigned division by 3 needs a 33-bit constant
c3 = (2^34 + 2)/3 = 155555556x. If 0 <= x < 2^32, then


                FLOOR(x/3) = FLOOR(c3 * x / 2^34)


        In this case the multiplier c3 is even.
Reduce the fraction c3 / 2^34 to (c3/2) / 2^33.
The revised multiplier c3/2 fits in 32 bits.
Use the upper half of the unsigned integer multiply,
which I'll call MULUH, to multiply by (c3/2) / 2^32.
Shift right to divide by 2 again:


                FLOOR(x/3) = MULUH(c3/2, x) >> 1


This is two instructions once the constant c3/2 is in a register.
There are no branches.


          When the 33-bit multiplier is odd, the code is
longer. We first replace


                        FLOOR(c3 * x / 2^34) = FLOOR(FLOOR(c3 * x / 2^32) / 4)


The product FLOOR(c3 * x / 2^32) can exceed 2^32. We replace it by


                        x + FLOOR((c3 - 2^32) * x / 2^32)


The constant c3 - 2^32 fits in 32 bits. Hence


                        FLOOR(x/3) = FLOOR((x + t1) / 4)


where t1 = MULUH(c3 - 2^32, x). To avoid overflow in the
(x + t1) / 4 computation, observe 0 <= t1 < x. Hence


                        (x + t1)/4 = ((x - t1) + 2*t1) / 4
                                              = ((x - t1)/2 + t1)/2


Besides the MULUH, we need two shifts, an addition, and a subtraction.


        The paper covers more topics, such as two forms of signed division
(round towards zero, round towards -infinity).
It is available in ftp.cwi.nl:/pub/pmontgom/divcnst.ps{a4,l}.gz .
--
                Peter-Lawrence.Montgomery@cwi.nl Home: San Rafael, California
                Microsoft Research and CWI


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.