Your very own 32-way SIMD machine

October 15th, 2007 tonyg

What’s a good way of counting the number of bits set in a word? The obvious answer, adding the low bit to an accumulator, shifting right, and repeating, is O(n) in the number of bits in the word. This is a sequential approach - and we can do better, complexity-wise, by using a parallel algorithm. Let’s assume we are using 32-bit words, and that Xn is just such a 32-bit word:

X0 = input word
X1 = (X0 & 0x55555555) + ((X0 >>  1) & 0x55555555)
X2 = (X1 & 0x33333333) + ((X1 >>  2) & 0x33333333)
X3 = (X2 & 0x0F0F0F0F) + ((X2 >>  4) & 0x0F0F0F0F)
X4 = (X3 & 0x00FF00FF) + ((X3 >>  8) & 0x00FF00FF)
X5 = (X4 & 0x0000FFFF) + ((X4 >> 16) & 0x0000FFFF)
total number of set bits = X5

This algorithm is O(log2 n) in the number of bits in a word.

Every ordinary N-bit-word based sequential machine is a disguised N-way, 1-bit SIMD machine with a slightly odd instruction set. Lots more on data-parallel algorithms here.

What about finding which is the highest bit set in a word?

X0 = input word
X1 = X0 or (X0 >> 1)
X2 = X1 or (X1 >> 2)
X3 = X2 or (X2 >> 4)
X4 = X3 or (X3 >> 8)
X5 = X4 or (X4 >> 16)

… and feed X5 through the parallel counter-of-set-bits algorithm above. The resulting number is the index of the highest set bit in the original word, starting from zero.

Entry Filed under: Technology, Water cooler, Programming

7 Comments Add your own

  • 1. Pete Kirkham  |  October 15th, 2007 at 12:14 pm

    The link to the pdf from the LtU node is broken, and I can’t find the paper on citeseer. It is available to ACM members from http://portal.acm.org/citation.cfm?id=7903

  • 2. tonyg  |  October 15th, 2007 at 12:54 pm

    Try http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf, or Google’s HTML rendering of the PDF.

  • 3. Pete Kirkham  |  October 15th, 2007 at 1:12 pm

    You also might like Bit Twiddling Hacks, which includes the above and more.

  • 4. thomas figg  |  October 15th, 2007 at 1:14 pm

    The population count from the “Aggregate magic page” is a bit smaller:

    from: http://aggregate.org/MAGIC/

    unsigned int ones32(register unsigned int x)
    {
    /* 32-bit recursive reduction using SWAR…
    but first step is mapping 2-bit values
    into sum of 2 1-bit values in sneaky way
    */
    x -= ((x >> 1) & 0×55555555);
    x = (((x >> 2) & 0×33333333) + (x & 0×33333333));
    x = (((x >> 4) + x) & 0×0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return(x & 0×0000003f);
    }

    And their most significant 1 bit is too:

    unsigned int
    msb32(register unsigned int x)
    {
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    return(x & ~(x >> 1));
    }2

  • 5. niklas  |  October 15th, 2007 at 7:54 pm

    For x86 processors finding the first and last bit is best done with the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) instructions. They are quite fast on Intel (2 µops), but takes 20+ cycles on AMD, still it has to be a lot faster than the C routine.

  • 6. Jason  |  October 15th, 2007 at 11:29 pm

    Tony, that’s both completely messed in the head and cool.

  • 7. Sebastian  |  November 11th, 2007 at 10:06 am

    Hi Tony,

    There is another way to find the highest bit set: convert the number to a float and inspect the exponent.

    This is folklore, but explicitly mentioned in footnote 3 on p. 2 of van Emde Boas tree data structure

    Cheers,

    Sebastian.

Leave a Comment

Required

Required, hidden

Some HTML allowed:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>

Trackback this post  |  Subscribe to the comments via RSS Feed

Calendar

October 2007
M T W T F S S
« Sep   Nov »
1234567
891011121314
15161718192021
22232425262728
293031  

Most Recent Posts