Your very own 32-way SIMD machine
What’s a good way of counting the number of bits set in a word? The obvious answer, adding the low bit to an accumulator, shifting right, and repeating, is O(n) in the number of bits in the word. This is a sequential approach – and we can do better, complexity-wise, by using a parallel algorithm. Let’s assume we are using 32-bit words, and that Xn is just such a 32-bit word:
X0 = input word X1 = (X0 & 0x55555555) + ((X0 >> 1) & 0x55555555) X2 = (X1 & 0x33333333) + ((X1 >> 2) & 0x33333333) X3 = (X2 & 0x0F0F0F0F) + ((X2 >> 4) & 0x0F0F0F0F) X4 = (X3 & 0x00FF00FF) + ((X3 >> 8) & 0x00FF00FF) X5 = (X4 & 0x0000FFFF) + ((X4 >> 16) & 0x0000FFFF) total number of set bits = X5
This algorithm is O(log2 n) in the number of bits in a word.
Every ordinary N-bit-word based sequential machine is a disguised N-way, 1-bit SIMD machine with a slightly odd instruction set. Lots more on data-parallel algorithms here.
What about finding which is the highest bit set in a word?
X0 = input word X1 = X0 or (X0 >> 1) X2 = X1 or (X1 >> 2) X3 = X2 or (X2 >> 4) X4 = X3 or (X3 >> 8) X5 = X4 or (X4 >> 16)
… and feed X5 through the parallel counter-of-set-bits algorithm above. The resulting number is the index of the highest set bit in the original word, starting from zero.