Related articles |
---|
Optimal code sequences for signed int div/rem by powers of 2 S_JUFFA@IRAV1.ira.uka.de (1993-04-21) |
Re: Optimal code sequences for signed int div/rem by powers of 2 markt@harlqn.co.uk (1993-04-23) |
Newsgroups: | comp.arch,comp.compilers |
From: | S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa) |
Keywords: | arithmetic, optimize |
Organization: | University of Karlsruhe, FRG |
Date: | Wed, 21 Apr 1993 13:49:45 GMT |
This article is somewhat of a followup to the recent discussion about
integer division here comp.arch. I am also cross-posting to comp.compilers
because this stuff may be relevant for people involved with compiler code
generation.
Many modern RISC CPUs do not have an integer division instruction (SPARC
V7, Alpha), have only a division step instruction (AMD 29K), or have a
rather slow HW-division (microSPARC). Even if there is a division
instruction, it may fail to produce a remainder (SPARC V8).
Therefore, people are looking for alternatives to using division. Fast
alternatives are especially feasible if the divisor is known at compile
time (e.g. multiplication by reciprocal of divisor). Quite a few posts in
the recent discussion were involved with speeding up division by powers of
two known at compile time. This is really easy when dealing with unsigned
integers (-> shift right), but requires some correction steps for signed
integers.
Checking compiler output for 80x86 produced by Microsoft C 7.0 and for
SPARC by gcc 2.2.3 and the Solaris cc compiler, I find they don't use all
the optimizations possible for signed integers when the divisor is a power
of two. Especially, when the divisor is a negated power of two (i.e.
-2^i), all of these compilers will resort to divide and remainder
instructions or subroutines, although the shifting and masking approach
used for unsigned integers could be used with a few correction steps. The
Microsoft C 7.0 compiler doesn't optimize any division/remainder by 2^i or
-2^i for signed integers.
I wonder what the fastest instruction sequences for signed integer
division by +/- 2^n are for different machines. I know from the discussion
there are even machines that have signed integer division by powers of two
in HW (i960).
I am including the routines I came up with for 80x86 and SPARC for doing
signed integer division/remainder by +/- 2^i.
Norbert Juffa (s_juffa@iravcl.ira.uka.de)
CODE FOR Intel 80x86 (can easily be changed to 32-bit version for >= 386)
=========================================================================
/ 2: CMP AX, 8000h ; CY = 1, if dividend >= 0
SBB AX, -1 ; inc AX, if dividend < 0
SAR AX, 1 ; now do right shift
/ 2^n: CWD ; DX = FFFFh if dividend < 0
AND DX, 2^n-1 ; mask correction
ADD AX, DX ; apply correction if necessary
SAR AX, n ; now do right shift
/ -2: CMP AX, 8000h ; CY = 1, if dividend >= 0
SBB AX, -1 ; inc AX, if dividend < 0
SAR AX, 1 ; now do right shift
NEG AX ; use (x div -2) = - (x div 2)
/ -2^n: CWD ; DX = FFFFh if dividend < 0
AND DX, 2^n-1 ; mask correction
ADD AX, DX ; apply correction if necessary
SAR AX, n ; now do right shift
NEG AX ; use (x div -2^n) = - (x div 2^n)
% 2, % -2: CWD ; generate flag, FFFFh id divd < 0, else 0
AND AX, 1 ; mask out remainder
XOR AX, DX ; negate
SUB AX, DX ; remainder if dividend < 0
% 2^n, % -2^n: CWD ; generate mask, FFFFh if divd < 0, else 0
AND DX, 2^n-1 ; mask correction
ADD AX, DX ; apply pre-correction if necessary
AND AX, 2^n-1 ; mask out remainder
SUB AX, DX ; apply post-correction if necessary
CODE FOR SPARC
==============
Dividing a signed integer by a positive power of two (1<<n) known at
compile time.
1) for i/2: addcc %o1,%o1,%g0 ! carry if %o1 < 0
addx %o1,%g0,%o1 ! inc %o1, if %o1 < 0
sra %o1,1,%o1 ! do the shift
Although this code sequence uses three instructions, just like what is
produced by gcc now, it has the advantage that it doesn't need an
additional register. It has the disadvantage of destroying the
condition codes.
2) for i/(1<<n): sra %o1,31,%o2 ! %o2 = 0xffffffff if %o1 < 0
srl %o2,32-n,%o2 ! (1<<n)-1, if %o1<0, else 0
add %o1,%o2,%o1 ! apply correction
sra %o1,n,%o1 ! do the shift
The advantage of this sequence is that it doesn't use a branch and has
no problem with n>=13 since it doesn't use immediate constants for the
correction step, also it doesn't destroy the condition codes.
Dividing a signed integer by a negated positive power of two (-(1<<n))
should make use of the identity x / (-(1<<n)) = -(x / (1<<n)) and then
apply the code given above. Currently, gcc 2.2.3 generates calls to .div
1) for i/-2: addcc %o1,%o1,%g0 ! carry if %o1 < 0
addx %o1,%g0,%o1 ! inc %o1, if %o1 < 0
sra %o1,1,%o1 ! i / 2
neg %o1 ! -(i / 2)
2) for i/(-(1<<n)):sra %o1,31,%o2 ! %o2 = 0xffffffff if %o1 < 0
srl %o2,32-n,%o2 ! (1<<n)-1, if %o1<0, else 0
add %o1,%o2,%o1 ! apply correction
sra %o1,n,%o1 ! i / (1<<n)
neg %o1 ! -(i / (1<<n))
Computing the remainder (% operator) of a division of a signed integer by
a positive power of two (1<<n) known at compile time. Currently, gcc uses
calls to .rem.
1) for i%2, i%(/-2):
sra %o1,31,%o2 ! 0xFFFFFFFF, if %o < 0, else 0
and %o1,1,%o1 ! mask out remainder
xor %o1,%o2,%o1 ! negate remainder if quotient
sub %o1,%o2,%o1 ! negative (sign(quot)=sign(rem)!)
2) for i%(1<<n), i%(-(1<<n)):
sub %g0,1,%o2 ! 0xffffffff
srl %o2,32-n,%o2 ! (1<<n)-1
sra %o1,31,%o3 ! 0xffffffff, if %o1 < 0, else 0
and %o3,%o2,%o3 ! (1<<n)-1, if %o1 < 0, else 0
add %o1,%o3,%o1 ! apply correction if necessary
and %o1,%o2,%o1 ! mask out remainder bits
sub %o1,%o3,%o1 ! apply correction if necessary
This instruction sequence is branch free, doesn't destroy the condition
codes and has no problem with n>=13, since it doesn't use immediate
constants in the correction step.
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.