HAKMEM #169

chase@centerline.com (David Chase)
Thu, 6 Jul 1995 16:04:49 GMT

From comp.compilers

Related articles
integer mod with constant modulus. rocky@panix.com (R. Bernstein) (1995-06-24)
Re: integer mod with constant modulus. mernst@theory.lcs.mit.edu (1995-06-29)
*HAKMEM #169 chase@centerline.com* (1995-07-06)**
Re: HAKMEM #169 hbaker@netcom.com (1995-07-11)
Re: HAKMEM #169 Tommy.Thorn@irisa.fr (1995-07-15)

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	chase@centerline.com (David Chase)
Keywords:	arithmetic, comment
Organization:	CenterLine Software
References:	95-06-047 95-07-028
Date:	Thu, 6 Jul 1995 16:04:49 GMT

mernst@theory.lcs.mit.edu (Michael Ernst) writes:
> This is a really neat technique which was described in HAKMEM and is used
> in the X11 sources to count the number of bits in a word (which is actually
> a modulus operation). It doesn't require any branches. On some
> architectures the final modulus may be expensive; it can be replaced by a
> few more shifts, masks, and adds in the obvious way.

Actually, I'd be interested if you could name an architecture for
which the final modulus is not expensive. If I just code up
population count in the ordinary way (see below), I find it is faster
on all the different machine-compiler combiations that I have the
patience to try.

> /*
> * This is HAKMEM 169, as used in X11 sources.
> */
> static int
> t0_hackmemmod(register unsigned long n)
> {
> register unsigned long tmp;
>
> tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
> return ((tmp + (tmp >> 3)) & 030707070707) % 63;
> }

                                                                              time in seconds
                                                      hakmem hakmem'*4 naive compiler & flags
HP-UX A.09.05 C 9000/710 19.6 8.3 6.9 c89 +O3
IBM Power something 8.2 5.1 4.0 xlc -O3
MicroSparc (50 Mhz) 18.6 10.8 8.8 acc 3.0.1 -fast -cg92 *2
MicroSparc (50 Mhz) 30.7 9.5 9.0 acc 3.0.1 -fast -cg89 *3
MicroSparc (50 Mhz) 29.9 8.9 8.0 gcc 2.5.8 -O3

SuperSparc (50 Mhz) 8.5 5.7 5.2 cc 3.0.1 -fast -xcg92
SuperSparc (50 Mhz) 21.1 5.7 5.4 cc 3.0.1 -fast -xcg89

Sparc ELC (Cypress, 33 Mhz) *1 16.6 15.6 cc 3.0.1 -fast -xcg92
Sparc ELC (Cypress, 33 Mhz) 52.6 16.7 15.6 cc 3.0.1 -fast -xcg89
Sparc ELC (Cypress, 33 Mhz) 47.7 10.8 8.3 gcc 2.6.3 -O3
Sparc ELC (Cypress, 33 Mhz) 49.5 13.4 12.3 cc 3.0.1 -fast -xO4 -xcg89

*1 I went to the bathroom and made tea, came back, and it still wasn't done.
      The missing instruction is emulated by a trap to the OS.

*2 "-cg89" indicates scheduling and instruction selection tuned to run well
      on the SparcStation 2. In particular, no hardware division.

*3 "-cg92" indicates sched. and inst. selection tuned to run well on the
      SuperSparc and MicroSparc chips. In particular, support for division
      in hardware.

*2 & *3 -- as a former Sun employee and compiler hacker, I am very familiar
with the compiler flags and versions. I am not so familiar with the ins and
outs of IBM and HP compilers. I also tried various GCC versions to test the
hypothesis that X11 and GCC were tuned for one another -- that is, GCC might
be loaded with optimizations tuned for code like what is in hakmem (and, in
fact, gcc optimizes the mask construction better than cc).

*4 For hakmem-prime, I performed the mod-63 operation with hand-written shifts,
      masks, and additions. I may not have written it optimally, but it appears
      to be correct (I tested it separately on a wide range of inputs, and of
      course I worked it out first carefully by hand).

So, what do we learn from this:

- if this is really used in X11, it is a mistake. Why make use of obscure,
    slow code when "straightforward" code is ALWAYS faster? (Yet more fodder
    for the X11-haters.)

- it looks like it really would make sense to perform some constant remainders
    with a sequence of shifts, masks, and additions. Hakmem-prime consistently
    beat hakmem by a wide margin, and the only difference between the two was the
    optimized constant-remainder.

- be wary of untested "great hacks", even when they come from a renowned
    Institute of Technology.

Here's the test code:
--------------------------------------------------
#include <stdio.h>

#ifdef NAIVE
int
popcount(unsigned long n)
{
    unsigned long x, y;
    x = n & 0x55555555;
    y = (n >> 1) & 0x55555555;
    n = x + y;
    x = n & 0x33333333;
    y = (n >> 2) & 0x33333333;
    n = x + y;
    x = n & 0x0f0f0f0f;
    y = (n >> 4) & 0x0f0f0f0f;
    n = x + y;
    x = n & 0x00ff00ff;
    y = (n >> 8) & 0x00ff00ff;
    n = x + y;
    x = n & 0x0000ffff;
    y = (n >> 16);
    n = x + y;
    return n;
}
#endif
#ifdef HAKMEM
int
popcount(unsigned long n)
{
    unsigned long tmp;

    tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
    return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}
#endif
#ifdef HAKMEM_P
int
popcount(unsigned long n)
{
    unsigned long tmp, x, y;

    tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
    tmp = ((tmp + (tmp >> 3)) & 030707070707);

    /* Now take remainder mod 63 */
    x = tmp & 07700770077;
    y = (tmp >> 6) & 07700770077;
    tmp = x + y; /* tmp contains 3 sums */
    tmp = ((tmp >> 12) + (tmp >> 24) + tmp) & 0777;
    tmp = (tmp >> 6) + (tmp & 077);
    tmp = tmp + 1;
    tmp = (tmp >> 6) + (tmp & 077) - 1;
    return tmp;
}
#endif
main() {
    int i;
    int j = 0;

    for (i = 0; i < 10000000; i++) {
        j = j + popcount(i);
    }
    printf("j = %d\n", j);
}
--------------------------------------------------

speaking for myself,

David Chase
[I'm pretty sure it was a win on the PDP-6. -John]
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

HAKMEM #169

chase@centerline.com (David Chase)Thu, 6 Jul 1995 16:04:49 GMT

chase@centerline.com (David Chase)
Thu, 6 Jul 1995 16:04:49 GMT