Related articles |
---|
Modulo optimizations gmarshal@globalnet.co.uk (Graham Marshall) (1999-10-04) |
Re: Modulo optimizations jgk@jgk.org (Joe Keane) (1999-10-11) |
Re: Modulo optimizations torbenm@diku.dk (1999-10-13) |
Re: Modulo optimizations chase@world.std.com (David Chase) (1999-10-13) |
Re: Modulo optimizations ger@informatik.uni-bremen.de (George Russell) (1999-10-14) |
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-14) |
Re:Modulo optimizations Wilco.Dijkstra@arm.com (Wilco Dijkstra) (1999-10-19) |
Re: Modulo optimizations harley@corton.inria.fr (Robert Harley) (1999-10-21) |
Re: Modulo optimizations Peter-Lawrence.Montgomery@cwi.nl (1999-10-27) |
From: | torbenm@diku.dk (Torben AEgidius Mogensen) |
Newsgroups: | comp.compilers |
Date: | 13 Oct 1999 00:57:42 -0400 |
Organization: | Department of Computer Science, U of Copenhagen |
References: | 99-10-017 |
Keywords: | arithmetic, performance |
That modulo with powers of two can be done for unsigned numbers by
bitwise AND is a standard trick and done by many compilers. For signed
numbers, the problem is that most languages say that modulos of
negative numbers is negative, for eaxmple (-7)%3 == -1, so you need a
test for these. Hence, it is not a clear win for machines that have
division hardware.
For machines without division hardware, you can also do fast(ish)
modulo (2^n-1), using a method similar to the way we find modulo 3 or
9 of decimal numbers by summing digits. This also is for unsigned
numbers only. For x%3, you add all bit-pairs and reduce this sum again
in the same way until you are down to two bits. You can do some of the
summing in parallel, i.e.:
a = x & 0x33333333; /* even two-bit groups */
b = x & 0xcccccccc; /* odd two-bit groups */
sum = a + (b >> 2); /* sum 0-6 in 8 groups */
sum = sum + (sum >> 2); /* sum 0-3 in 8 groups */
sum = sum & 0x33333333; /* clear garbage bits */
sum = sum + (sum >> 4); /* sum 0-6 in 4 groups */
sum = sum + (sum >> 2); /* sum 0-3 in 4 groups */
sum = sum & 0x33333333; /* clear garbage bits */
sum = sum + (sum >> 8); /* sum 0-6 in 2 groups */
sum = sum + (sum >> 2); /* sum 0-3 in 2 groups */
sum = sum & 0x33333333; /* clear garbage bits */
sum = sum + (sum >> 16); /* sum 0-6 in 1 group */
sum = sum + (sum >> 2); /* sum 0-3 in 1 group */
sum = sum & 0x3; /* clear garbage bits */
After this (if I haven't made any mistakes along the way), sum
contains x%3. On a processor that can do scaled addition (e.g., ARM),
each lina above can be done in one instruction (assuming the constants
used are set up in registers beforehand). On ARM7/StrongARM, this will
take 14 cycles which, while not lightning-fast, is a good deal faster
than full division by software. The same trick can be done for other
(2^n-1) and (mostly) in fewer instructions since there are fewer
groups to add.
Torben Mogensen (torbenm@diku.dk)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.