Related articles |
---|
Is there a way of generating code for x/const without division? jeremy@sw.oz.au (1992-04-02) |
Re: Is there a way of generating code for x/const without division? suitti@ima.isc.com (1992-04-03) |
Re: Is there a way of generating code for x/const without division? moss@cs.umass.edu (1992-04-03) |
Re: Is there a way of generating code for x/const without division? markt@harlqn.co.uk (1992-04-08) |
Re: Is there a way of generating code for x/const without division? haddad@pa.dec.com (1992-04-07) |
Newsgroups: | comp.compilers,comp.programming,comp.lang.c |
From: | markt@harlqn.co.uk (Mark Tillotson) |
Keywords: | arithmetic, optimize |
Organization: | Harlequin Limited, Cambridge, England |
References: | 92-04-018 |
Date: | Wed, 8 Apr 1992 11:53:34 GMT |
jeremy@sw.oz.au (Jeremy Fitzhardinge) writes:
> I've been playing with a simple code generator for RISC CPUs. One of the
> simple peephole optimisations it does is to replace multiplies by
> constants with a series of shift/adds rather than using the general
> multiplication.
>
> I'm wondering whether a similar thing can be done for divide and modulo,
> such that they are implemented by the code generator as a sequence of
> simple instructions that is more efficient (in time) than the more general
> way of performing divide/modulo.
Well, for powers of two it is easy, a single shift, but there is also a
method for modulo by numbers which are close to a power of two, such as
3,5,7,9,15,17. It only really pays off when division is expensive on the
processor (such as SPARC integer divide), or for the larger examples (such
as 0x200001)
The basic idea is to split to number up into groups of n bits, where the
desired modulus is close to 2^n, and then sum the sets of bit
appropriately so that the modulo of the sum is equal to the desired
result. This combination is done by series of shifts and adds/subtracts.
Consider B = 2^7, then we can take a number A = c3*B^3 + c2*B^2 ... + c0
and note that the modulo of this by B-1 is the sum of the modulos of each
section, (modulo B-1), and that B^n modulo B-1 is always 1. Thus we want
(c3+c2+c1+c0) modulo B-1, which is easy to calculate.
For a modulo of B+1, then B^n modulo B+1 is (-1)^n, so alternate
additions/subtractions are required.
Using a divide/conquer method to combine the chunks of bits enable an
instruction sequence of size log2 m, where m is the number of chunks in
the word, to combine the sets of bits. Beware of overflow between
neighbouring chunks (there is always a guard chunk though).
Example in C:
long module_by_127 (a)
long a,;
{
long tt ;
tt = (a & 0xF01FC07F) + ((a >> 7) & 0xF01FC07F) ;
tt += (tt >> 14) & 0x3fff ;
tt += (tt >> 28) ;
/* now repeat to compress the 9 bit result into 7 bits again */
tt = (tt & 0x7F) + ((tt >> 7) & 0x7F) ;
if (tt >= 0x7F)
{ tt -= 0x7F ; }
return (tt) ;
}
/* this is about 20 RISC instructions, as opposed to a loop of about 4
instructions that gets executed 32 times */
Of course the fastest way to divide or modulo on such a RISC as the
SPARC is to use double floats....
--
M. Tillotson Harlequin Ltd.
markt@uk.co.harlqn Barrington Hall,
44 223 872522 Barrington, Cambridge CB2 5RG
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.