<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.12-alpha" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Your very own 32-way SIMD machine</title>
	<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine</link>
	<description>What happens at LShift</description>
	<pubDate>Fri, 21 Nov 2008 22:12:08 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.12-alpha</generator>

	<item>
		<title>by: Sebastian</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-70818</link>
		<pubDate>Sun, 11 Nov 2007 09:06:22 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-70818</guid>
					<description>&lt;p&gt;Hi Tony, &lt;/p&gt;

&lt;p&gt;There is another way to find the highest bit set: convert the number to a float and inspect the exponent.&lt;/p&gt;

&lt;p&gt;This is folklore, but explicitly mentioned in footnote 3 on p. 2 of &lt;a href="http://www.mpi-inf.mpg.de/~kettner/pub/veb_tree_alenex_04.pdf" rel="nofollow"&gt;van Emde Boas tree data structure&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Sebastian.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Hi Tony, </p>
<p>There is another way to find the highest bit set: convert the number to a float and inspect the exponent.</p>
<p>This is folklore, but explicitly mentioned in footnote 3 on p. 2 of <a href="http://www.mpi-inf.mpg.de/~kettner/pub/veb_tree_alenex_04.pdf" rel="nofollow">van Emde Boas tree data structure</a></p>
<p>Cheers,</p>
<p>Sebastian.</p>
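<p>A minimal sketch of the exponent trick described above, assuming IEEE 754 binary64 doubles. The function name is hypothetical. It uses a double rather than a 32-bit float because a float has only a 24-bit significand, so large inputs such as 0xFFFFFFFF would round up and report the wrong bit.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Index (0..31) of the highest set bit of x, or -1 if x is zero.
   Assumption: double is IEEE 754 binary64 (1 sign bit, 11 exponent bits
   biased by 1023, 52 stored significand bits). */
static int highest_bit_via_double(uint32_t x)
{
    double d;
    uint64_t bits;

    assert(sizeof(double) == sizeof(uint64_t));
    if (x == 0)
        return -1;

    d = (double)x;                    /* exact: every 32-bit value fits in 53 bits */
    memcpy(&bits, &d, sizeof bits);   /* read the representation without aliasing issues */

    /* The biased exponent sits in bits 52..62; removing the bias gives floor(log2(x)). */
    return (int)((bits >> 52) & 0x7ff) - 1023;
}
```

<p>For example, 0x28 is 1.25 * 2^5, so the function returns 5.</p>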
]]></content:encoded>
				</item>
	<item>
		<title>by: Jason</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68172</link>
		<pubDate>Mon, 15 Oct 2007 22:29:25 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68172</guid>
					<description>&lt;p&gt;Tony, that's both completely messed in the head and cool.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Tony, that&#8217;s both completely messed in the head and cool.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: niklas</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68164</link>
		<pubDate>Mon, 15 Oct 2007 18:54:43 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68164</guid>
					<description>&lt;p&gt;For x86 processors, finding the first and last set bit is best done with the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) instructions. They are quite fast on Intel (2 µops) but take 20+ cycles on AMD; even so, they should be a lot faster than the C routine.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>For x86 processors, finding the first and last set bit is best done with the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) instructions. They are quite fast on Intel (2 µops) but take 20+ cycles on AMD; even so, they should be a lot faster than the C routine.</p>
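<p>One way to reach these instructions from C without inline assembly, as a sketch: the GCC and Clang builtins __builtin_ctz and __builtin_clz. This assumes one of those compilers; the wrapper names are hypothetical, and whether BSF/BSR or a newer instruction is actually emitted depends on the compiler and target flags. Both builtins are undefined for a zero argument, hence the asserts.</p>

```c
#include <assert.h>
#include <limits.h>

/* Index of the lowest set bit (what BSF computes); x must be non-zero. */
static int first_set_bit(unsigned int x)
{
    assert(x != 0);
    return __builtin_ctz(x);   /* count trailing zero bits */
}

/* Index of the highest set bit (what BSR computes); x must be non-zero. */
static int last_set_bit(unsigned int x)
{
    assert(x != 0);
    /* __builtin_clz counts leading zeros, so subtract from the top bit index. */
    return (int)(sizeof x * CHAR_BIT) - 1 - __builtin_clz(x);
}
```

<p>For 0x28 (binary 101000) the lowest set bit is at index 3 and the highest at index 5.</p>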
]]></content:encoded>
				</item>
	<item>
		<title>by: thomas figg</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68151</link>
		<pubDate>Mon, 15 Oct 2007 12:14:16 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68151</guid>
					<description>&lt;p&gt;The population count from the "Aggregate magic page" is a bit smaller:&lt;/p&gt;

&lt;p&gt;from: http://aggregate.org/MAGIC/&lt;/p&gt;

&lt;p&gt;unsigned int ones32(register unsigned int x)
{
        /* 32-bit recursive reduction using SWAR...
       but first step is mapping 2-bit values
       into sum of 2 1-bit values in sneaky way
    */
        x -= ((x &#62;&#62; 1) &#38; 0x55555555);
        x = (((x &#62;&#62; 2) &#38; 0x33333333) + (x &#38; 0x33333333));
        x = (((x &#62;&#62; 4) + x) &#38; 0x0f0f0f0f);
        x += (x &#62;&#62; 8);
        x += (x &#62;&#62; 16);
        return(x &#38; 0x0000003f);
}&lt;/p&gt;

&lt;p&gt;Their most-significant-1-bit routine is smaller too:&lt;/p&gt;

&lt;p&gt;unsigned int
msb32(register unsigned int x)
{
        x &#124;= (x &#62;&#62; 1);
        x &#124;= (x &#62;&#62; 2);
        x &#124;= (x &#62;&#62; 4);
        x &#124;= (x &#62;&#62; 8);
        x &#124;= (x &#62;&#62; 16);
        return(x &#38; ~(x &#62;&#62; 1));
}&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The population count from the &#8220;Aggregate magic page&#8221; is a bit smaller:</p>
<p>from: http://aggregate.org/MAGIC/</p>
<p>unsigned int ones32(register unsigned int x)<br />
{<br />
        /* 32-bit recursive reduction using SWAR&#8230;<br />
       but first step is mapping 2-bit values<br />
       into sum of 2 1-bit values in sneaky way<br />
    */<br />
        x -= ((x &gt;&gt; 1) &amp; 0x55555555);<br />
        x = (((x &gt;&gt; 2) &amp; 0x33333333) + (x &amp; 0x33333333));<br />
        x = (((x &gt;&gt; 4) + x) &amp; 0x0f0f0f0f);<br />
        x += (x &gt;&gt; 8);<br />
        x += (x &gt;&gt; 16);<br />
        return(x &amp; 0x0000003f);<br />
}</p>
<p>Their most-significant-1-bit routine is smaller too:</p>
<p>unsigned int<br />
msb32(register unsigned int x)<br />
{<br />
        x |= (x &gt;&gt; 1);<br />
        x |= (x &gt;&gt; 2);<br />
        x |= (x &gt;&gt; 4);<br />
        x |= (x &gt;&gt; 8);<br />
        x |= (x &gt;&gt; 16);<br />
        return(x &amp; ~(x &gt;&gt; 1));<br />
}</p>
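<p>A small sanity check for the two routines, as a sketch. It restates them without the register keyword and compares each against a deliberately naive loop. The _naive functions and check_bit_tricks are hypothetical helpers, and unsigned int is assumed to be 32 bits wide. Note that msb32 returns the value of the top bit as a mask, not its index.</p>

```c
#include <assert.h>

/* Population count via SWAR reduction (from the Aggregate page quoted above). */
static unsigned int ones32(unsigned int x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return (x & 0x0000003f);
}

/* Most significant 1 bit: smear the top bit downwards, then keep only the top. */
static unsigned int msb32(unsigned int x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    return (x & ~(x >> 1));
}

/* Naive reference: count set bits one at a time. */
static unsigned int ones32_naive(unsigned int x)
{
    unsigned int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Naive reference: clear the lowest set bit until at most one bit remains. */
static unsigned int msb32_naive(unsigned int x)
{
    while (x & (x - 1u))
        x &= x - 1u;
    return x;
}

/* Abort if either bit trick disagrees with its naive counterpart. */
static void check_bit_tricks(unsigned int x)
{
    assert(ones32(x) == ones32_naive(x));
    assert(msb32(x) == msb32_naive(x));
}
```

<p>For instance, ones32(0xF0F0) is 8 and msb32(0x28) is 0x20.</p>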
]]></content:encoded>
				</item>
	<item>
		<title>by: Pete Kirkham</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68148</link>
		<pubDate>Mon, 15 Oct 2007 12:12:04 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68148</guid>
					<description>&lt;p&gt;You also might like &lt;a href="http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel" rel="nofollow"&gt;Bit Twiddling Hacks&lt;/a&gt;, which includes the above and more.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>You also might like <a href="http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel" rel="nofollow">Bit Twiddling Hacks</a>, which includes the above and more.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: tonyg</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68145</link>
		<pubDate>Mon, 15 Oct 2007 11:54:43 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68145</guid>
					<description>&lt;p&gt;Try &lt;a href="http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf" rel="nofollow"&gt;http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf&lt;/a&gt;, or &lt;a href="http://www.google.com/search?q=cache:olPZtuHoqQMJ:cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf+hillis+steele+data-parallel&#038;hl=en&#038;ct=clnk&#038;cd=2&#038;client=iceweasel-a" rel="nofollow"&gt;Google's HTML rendering of the PDF&lt;/a&gt;.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Try <a href="http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf" rel="nofollow">http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf</a>, or <a href="http://www.google.com/search?q=cache:olPZtuHoqQMJ:cva.stanford.edu/classes/cs99s/papers/hillis-steele-data-parallel-algorithms.pdf+hillis+steele+data-parallel&#038;hl=en&#038;ct=clnk&#038;cd=2&#038;client=iceweasel-a" rel="nofollow">Google&#8217;s HTML rendering of the PDF</a>.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Pete Kirkham</title>
		<link>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68142</link>
		<pubDate>Mon, 15 Oct 2007 11:14:17 +0000</pubDate>
		<guid>http://www.lshift.net/blog/2007/10/15/your-very-own-32-way-simd-machine#comment-68142</guid>
					<description>&lt;p&gt;The link to the PDF from the LtU node is broken, and I can't find the paper on CiteSeer. It is available to ACM members from http://portal.acm.org/citation.cfm?id=7903&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>The link to the PDF from the LtU node is broken, and I can&#8217;t find the paper on CiteSeer. It is available to ACM members from http://portal.acm.org/citation.cfm?id=7903</p>
]]></content:encoded>
				</item>
</channel>
</rss>
