Related articles |
---|
multiplication by constant - quick comparison of algorithms wu@sequent.com (Youfeng Wu) (1992-10-21) |
Re: multiplication by constant - quick comparison of algorithms preston@helena.cs.rice.edu (1992-10-21) |
Re: multiplication by constant - quick comparison of algorithms bgb@iexist.att.com (1992-10-25) |
Re: multiplication by constant - quick comparison of algorithms chased@rbbb.Eng.Sun.COM (1992-10-27) |
Re: multiplication by constant - quick comparison of algorithms joe@babel.ho.att.com (1992-10-28) |
Newsgroups: | comp.compilers |
From: | chased@rbbb.Eng.Sun.COM (David Chase) |
Organization: | Sun Microsystems, Mt. View, Ca. |
Date: | Tue, 27 Oct 1992 20:46:18 GMT |
Keywords: | arithmetic, optimize |
References: | 92-10-079 92-10-095 |
wu@sequent.com (Youfeng Wu):
> For all numbers in [2, 10000], there is a total number of 64612 one's in
> their binary represenations, and the following compares the total number
> of add/sub/shift instructions used to perform all of the multiplications
bgb@iexist.att.com (Brian G Beuning) writes:
>Isn't it a little dangerous to use sub? Won't it overflow sometimes when
>the multiply would not.
It all depends on the machine. Some machines (SPARC and RS6000 and ARM,
e.g.) have non cc-setting variants of their arithmetic instructions. Even
if an overflow condition code is set, you're free to ignore it. The only
place this is a problem is when "overflow" causes a trap.
The miracle of two's complement arithmetic lets you come up with some
entertaining sequences for some multiplications. It depends a lot on your
heuristics for guiding the search, too. For instance, my multiply
converter persists in choosing to take a first step of
106 = 1 - (-105)
when processing 106*x. I think the reason is that -105 has fewer
transitions between 1 and 0 in it than 105, and thus it is preferred as a
first step.
One reason for avoiding some of the more bizarre sequences is that they
lead the search algorithm far afield, and make it take too long. It's
also very tricky to get just the right search algorithm when exploiting
these tricky overflow properties, because you need to watch out for both
signed and unsigned products when factoring. For example, one algorithm
that I tinkered with years past at another job would reduce (-180229)*x in
3 steps:
Rb = Ra + (Ra << 16); /* 65537 <- 1, step 4, shift 16 */
Rb = Ra + (Rb << 2); /* 262149 <- 65537, step 4, shift 2 */
Rb = (Rb << 14) - Rb; /* -180229 <- 262149, step 2, shift 14 */
but only found 4 steps for 180229*x
Rb = Ra + (Ra << 2); /* 5 <- 1, step 4, shift 2 */
Rb = Ra + (Rb << 1); /* 11 <- 5, step 4, shift 1 */
Rb = Ra + (Rb << 12); /* 45057 <- 11, step 4, shift 12 */
Rb = Ra + (Rb << 2); /* 180229 <- 45057, step 4, shift 2 */
missing the obvious reversal of the last step in the negative
multiplication. (The number of cases where this happens is not large.)
Note the factorizations required to do the steps:
-180229 tc= 4294787067 = 262149 * 16383 = 0xfffd3ffb
(use an unsigned factorization)
180229 = 262149 * (-16383) = 0x0002c005
(overflow city - the sign bits are gone)
In this case, when searching factors of 180229, it makes sense to search
for factors of -180229, and a factor is of the form E1 - E2, then that
means that E2 - E1 can be used instead.
This makes the search even more expensive, since you are considering more
cases at each step, and each step is more expensive. In practice, I don't
think it is worth it in this case (that number didn't drop out of the air
-- I constructed it so that -180229 was in fact equal to (65537 * 4 + 1) *
16383, and also negative.)
A more interesting negative number (for students of such trivia) is
-2147270651. It reduces in three steps, with no negatable steps involved:
I = -2147270651 (10000000000000110100000000000101)
Rb = Ra + (Ra << 15); /* 32769 <- 1, step 4, shift 15 */
Rb = Ra + (Rb << 2); /* 131077 <- 32769, step 4, shift 2 */
Rb = (Rb << 14) + Rb; /* -2147270651 <- 131077, step 1, shift 14 */
I have not yet discovered a 3-step sequence for 2147270651. Again, this
number was found by construction: (32769 * 4 + 1) * 16385.
Disclaimer: Note well that these sequences may only work on 32-bit two's
complement machines.
David Chase
Sun
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.