<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>LShift Ltd.</title>
	<atom:link href="http://www.lshift.net/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.lshift.net/blog</link>
	<description>What happens at LShift</description>
	<pubDate>Wed, 01 Feb 2012 00:14:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Being Shifty with Minecraft — Blue Sky Thinking</title>
		<link>http://www.lshift.net/blog/2012/02/01/being-shifty-with-minecraft-%e2%80%94-blue-sky-thinking</link>
		<comments>http://www.lshift.net/blog/2012/02/01/being-shifty-with-minecraft-%e2%80%94-blue-sky-thinking#comments</comments>
		<pubDate>Wed, 01 Feb 2012 00:03:30 +0000</pubDate>
		<dc:creator>hok</dc:creator>
		
		<category><![CDATA[Haskell]]></category>

		<category><![CDATA[Programming]]></category>

		<category><![CDATA[Technology]]></category>

		<category><![CDATA[binary]]></category>

		<category><![CDATA[development]]></category>

		<category><![CDATA[haskell]]></category>

		<category><![CDATA[LShift]]></category>

		<category><![CDATA[minecraft]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=636</guid>
		<description><![CDATA[After spending a bit over three months at LShift, I am proud to leave LShift's mark in the Minecraft Universe.

Frolicking over Minecraft's cubic pastures and passing by interesting arrangements of hovering dirt blocks suspended in mid-air is all in a Minecrafter's day's work. But if you ever see light-blue wool blocks hanging around in the air, you can be sure that someone's been . . . Shifty  . . .

The ones you see in the picture above, in fact, have been put into the Minecraft world by a tool I wrote in Haskell. In this multi-part series, I want to share with you how I did it.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.lshift.net/blog/wp-content/uploads/2012/01/lshift-blog-top.png"><img class="aligncenter size-full wp-image-645" title="lshift-blog-top" src="http://www.lshift.net/blog/wp-content/uploads/2012/01/lshift-blog-top.png" alt="LShift logo floating in mid-air" width="500" height="298" /></a></p>

<p>After spending a bit over three months at LShift, I am proud to leave LShift&#8217;s mark in the Minecraft Universe.</p>

<p>Frolicking over Minecraft&#8217;s cubic pastures and passing by interesting arrangements of hovering dirt blocks suspended in mid-air is all in a Minecrafter&#8217;s day&#8217;s work. But if you ever see light-blue wool blocks hanging around in the air, you can be sure that someone&#8217;s been . . . <span style="color: #00aacc;">Shifty </span><img class="alignnone size-full wp-image-648" title="lshift_logo_5x5" src="http://www.lshift.net/blog/wp-content/uploads/2012/01/lshift_logo_5x5.png" alt="" width="10" height="10" /> . . .</p>

<p>The ones you see in the picture above, in fact, have been put into the Minecraft world by a tool I wrote in Haskell. In this multi-part series, I want to share with you how I did it.</p>

<p><span id="more-636"></span></p>

<h3>An Executive Summary</h3>

<p>The theme:</p>

<blockquote>What does it take to carry an idea from conception to realisation in Haskell?</blockquote>

<p>And here are the requirements I worked with. The task is to write a Haskell program that takes</p>

<ul>
    <li>a path to a PNG image</li>
    <li>a path to a Minecraft saved world</li>
</ul>

<p>and then it</p>

<ul>
    <li>quantises the pixel data from the PNG image into Minecraft coloured wool blocks (to a palette of a dazzling 16 colours!!)</li>
    <li>inserts these coloured wool blocks into the world at a height 5 blocks above the player&#8217;s head</li>
</ul>
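
<p>As a rough illustration of the quantisation step (in Python rather than the Haskell used in this series), nearest-colour matching against a fixed palette might look like the sketch below. The three-entry <code>PALETTE</code>, its RGB values and the function names are made up for the example and are not taken from the tool or from Minecraft; the real palette has 16 wool colours.</p>

<pre><code># Hypothetical palette of (name, (r, g, b)) pairs. The RGB values are
# illustrative only; the real wool palette has 16 entries.
PALETTE = [
    ("white", (255, 255, 255)),
    ("light_blue", (100, 150, 220)),
    ("black", (0, 0, 0)),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(pixel, palette=PALETTE):
    """Return the name of the palette colour closest to the given pixel."""
    name, _ = min(palette, key=lambda entry: squared_distance(pixel, entry[1]))
    return name
</code></pre>

<p>With this palette, <code>quantise((90, 140, 230))</code> picks <code>"light_blue"</code>. Squared Euclidean distance in RGB is simply the easiest measure to write down, not necessarily the one that looks best.</p>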

<p>Simple? Not quite. As with any software project <span>—</span> even one as small as this <span>— </span>the devil is in the details. The Minecraft saved game format turns out to have an interesting structure involving offsets, some compression and even some odd (<em>even some odd?</em> :/)  &#8217;nybble endianness&#8217; in places.</p>
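
<p>To give a flavour of the nybble business: some per-block values are only four bits wide, so two of them share a byte, and the question is which half comes first. The Python sketch below assumes the even-indexed value sits in the low four bits; that ordering, and the function names, are assumptions made for the example, so check the format documentation before relying on them.</p>

<pre><code>def unpack_nybbles(data):
    """Split each byte into two 4-bit values, low nybble first (assumed order)."""
    values = []
    for byte in data:
        values.append(byte % 16)   # low four bits
        values.append(byte // 16)  # high four bits
    return values


def pack_nybbles(values):
    """Inverse of unpack_nybbles; expects an even number of values in 0..15."""
    pairs = zip(values[0::2], values[1::2])
    return bytes(low + 16 * high for low, high in pairs)
</code></pre>

<p>A round trip such as <code>pack_nybbles(unpack_nybbles(data)) == data</code> is exactly the kind of property worth testing for a binary format.</p>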

<h3>My Motivation</h3>

<p>In case you are wondering, there are already many really good libraries for editing and manipulating Minecraft saved worlds (and no, I&#8217;m not talking about the diamond axe), so the concept of my tool <a href="http://www.minecraftforum.net/topic/26098-pymclevel-minecraft-levels-for-python/">is</a> <a href="http://code.google.com/p/substrate-minecraft/">nothing</a> <a href="https://github.com/danielribeiro/RubyCraft">new</a>.</p>

<p>Using those tools directly for my aims would present little challenge. As a bit of a Haskell enthusiast, I saw this project as a fun way to try Haskell out for size on problems resembling those you see in the real world, such as reading and manipulating binary data.</p>

<h3>The Road Ahead</h3>

<p>Throughout the series, I aim to uncover details as we require them, just as I did while I explored the problem space. At a glance:</p>

<ul>
    <li><em>Haskell stubbing for fun and profit</em>: We (pretend to!) practise test-driven development by making the success of reading and writing Minecraft saved world files (called <em>Region</em> files) a <em>test property</em>.</li>
    <li><em>Dealing with compression using Functors and Phantom Types:</em> We design the main data types for maximum programmer comfort, paving the road for operations on arrays of block data.</li>
    <li><em>Serious Binary Serialisation</em>: We implement code to read and write Region files using the Get and Put monads from Data.Binary, and ensure a roundtrip succeeds</li>
    <li><em>Update your Chunks everywhere using SYB and SYZ</em>: Time to change the world! We write code to perform somewhat troublesome chunk updates. We also extract the player&#8217;s current coordinates in the world from Level data</li>
    <li><em>Fun turning Pixel Data into Wool using Codec.DevIL</em>: We read in image data using Codec.DevIL, write a simple quantisation function, and build the final executable</li>
</ul>

<p>I must admit I didn&#8217;t actually practise TDD, but the way I went about it was more in line with the strong-typing mantra:</p>

<blockquote>
<div>If it compiles, your code&#8217;s probably doing something useful.</div></blockquote>

<p>Though whenever I found that my code didn&#8217;t work immediately after making it compile, the fall-back was simply to write some tests to help debug the problem.</p>

<p>And that, dear reader, is the principal matter of the next post. I hope you&#8217;ll enjoy it!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2012/02/01/being-shifty-with-minecraft-%e2%80%94-blue-sky-thinking/feed</wfw:commentRss>
		</item>
		<item>
		<title>&#8220;Duck-finding&#8221; for testing your Theories</title>
		<link>http://www.lshift.net/blog/2012/01/31/duck-finding-for-testing-your-theories</link>
		<comments>http://www.lshift.net/blog/2012/01/31/duck-finding-for-testing-your-theories#comments</comments>
		<pubDate>Tue, 31 Jan 2012 12:44:53 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
		
		<category><![CDATA[Smalltalk]]></category>

		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=649</guid>
		<description><![CDATA[A while ago I wrote a semi-port of Haskell&#8217;s QuickCheck. Easy enough - a property is like a test method but with arity 1, into which you inject data - potential counterexamples to your theory. In Haskell, the type system can, through unification, figure out the type of the generator required for that property. What [...]]]></description>
			<content:encoded><![CDATA[<p>A while ago I wrote a semi-port of Haskell&#8217;s QuickCheck. Easy enough - a property is like a test method but with arity 1, into which you inject data - potential counterexamples to your theory. In Haskell, the type system can, through unification, figure out the type of the generator required for that property. What to do in a dynamic language?</p>

<p><span id="more-649"></span></p>

<p>There are a number of type inference techniques for dynamic languages - <a href="http://matt.might.net/articles/implementation-of-kcfa-and-0cfa/">k-CFA</a>, <a href="http://lexspoon.org/chuck/spoon-ecoop04.pdf">demand-driven type inferencing with subgoal pruning</a>, <a href="http://decomp.ulb.ac.be/roelwuyts/smalltalk/roeltyper/">RoelTyper</a>. I&#8217;m going to use a very simple technique.</p>

<p>First, some terminology. In Smalltalk, &#8220;protocol&#8221; usually means one of two things: either &#8220;what messages does this object understand?&#8221; or &#8220;does this object understand the Foo protocol?&#8221;, where Foo might be &#8220;Stream&#8221;, or &#8220;Collection&#8221;. We&#8217;re going to use the latter meaning. In particular, given a JUnit-like theory, we want an answer to the question &#8220;What objects - what instances of what classes - satisfy the protocol sent to the datum injected into this theory?&#8221;</p>

<p>With a decompiler to hand, it&#8217;s easy enough to generate an AST of the theory over which we can walk: walk the <code>MessageNode</code>s and look for things with a receiver <code>'t1'</code> which will be the name given to the first temporary variable, i.e., the argument of the unary method. This won&#8217;t work for anything hidden through a <code>#perform:</code> - think of this as the <code>eval</code> of Smalltalk code.</p>

<pre><code>    ParseNodeVisitor subclass: #SenderToArgCollector
    instanceVariableNames: 'selectors classSelectors'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SqueakCheck-SUnit'

    visitMethodNode: aMethodNode
        classSelectors := Set new.
        selectors := Set new.
        ^ super visitMethodNode: aMethodNode.

    visitMessageNode: aMessageNode
        (aMessageNode receiver name = 't1')
            ifTrue: [selectors add: aMessageNode selector key].
        (aMessageNode receiver isMessageNode
            and: [aMessageNode receiver selector key = #class]
            and: [aMessageNode receiver receiver name = 't1'])
            ifTrue: [classSelectors add: aMessageNode selector key].
        ^ super visitMessageNode: aMessageNode.

    selectors
        ^ selectors.

    classSelectors
        ^ classSelectors.
</code></pre>

<p>The collector is invoked by our <code>TheoryTyper</code> (which could be called a <code>DuckFinder</code>&#8230;):</p>

<pre><code>    messagesSentToDatum: aUnaryCompiledMethod
    "Answer a pair of Sets of all the message selectors sent by this method to its argument.
    The first Set contains messages sent to the argument, and the second contains messages
    sent to the argument's class."
    | collector |
    collector := SenderToArgCollector new
        visitMethodNode: (Decompiler new
            decompile: aUnaryCompiledMethod selector
            in: aUnaryCompiledMethod methodClass).

    ^ {collector selectors. collector classSelectors}.
</code></pre>

<p>Let&#8217;s try out our new toy. To recap, we wish to write a theory and have the system automatically find the right types of things to test the theory. So let&#8217;s try the &#8220;monadic laws&#8221;:</p>

<pre><code>    Object subclass: #MonadTheories
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'SqueakCheck-SUnit'

    monadsObeyLeftIdentity: m

    self
        assert: (m class return: m value) &gt;&gt;= [:t | m class return: t]
        equals: ([:t | m class return: t] value: m value)

    monadsObeyRightIdentity: m

        self assert: m equals: (m &gt;&gt;= [:a | m class return: a])
</code></pre>

<p>And lo! our duck-finder says, with the <code>Maybe</code> and <code>Either</code> monads loaded:</p>

<pre><code>    TheoryTyper new typeOfDatum: (MonadTheories &gt;&gt; #monadsObeyLeftIdentity:)
    "=&gt; a Set(Maybe Either)"
</code></pre>

<p>The keen-eyed will notice a law missing from the above - the associativity law for monads. Given monadic blocks <code>f</code> and <code>g</code> - that is, unary blocks that take some value and return a value wrapped up in whatever monad you&#8217;re using - we can express this law as</p>

<pre><code>    monadsObeyAssociativity: m
    | f g |

    self
        assert: ((m &gt;&gt;= f) &gt;&gt;= g)
        equals: (m &gt;&gt;= [:x | (f value: x) &gt;&gt;= g])
</code></pre>

<p>but from where do we get <code>f</code> and <code>g</code>? Our naive duck-finding fails: we would need to extend our &#8220;type inference&#8221;. One possible (and fairly ugly) solution is to make <code>m</code>&#8217;s class responsible through helpers: add <code>sampleBlockF</code> and <code>sampleBlockG</code> messages to the protocol and we could write:</p>

<pre><code>    monadsObeyAssociativity: m
    | f g |

    f := m class sampleBlockF.
    g := m class sampleBlockG.
    self
        assert: ((m &gt;&gt;= f) &gt;&gt;= g)
        equals: (m &gt;&gt;= [:x | (f value: x) &gt;&gt;= g])
</code></pre>

<p>Also, typing <code>m</code> is a bit harder than in the previous examples:</p>

<pre><code>    TheoryTyper new typeOfDatum:
        (MonadTheories &gt;&gt; #monadsObeyAssociativity:)
    "=&gt; a Set(Either Nothing Maybe Left Right Just)"
</code></pre>

<p>because all we have to work with is the send of <code>#&gt;&gt;=</code>. Arguably it would be clearer to return <code>a Set(Either Maybe)</code> because the other classes are subclasses of these two.</p>

<p>However, despite the limitations of this technique, one can express theories in a nicely modular way. The monadic laws will simply run against <em>any</em> monadic classes (that is, any classes that understand <code>#&gt;&gt;=</code> on the instance side and <code>return:</code> on the class side) present in the image.</p>
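
<p>For readers without a Smalltalk image to hand, the same idea can be sketched in Python using the standard <code>ast</code> module. Everything here (the function names, the <code>Maybe</code> and <code>Stream</code> stand-ins, the <code>bind</code> method) is hypothetical and only mirrors the approach above: walk the syntax tree of a one-argument theory, collect what is sent to the argument, and keep the candidate classes that understand all of it.</p>

<pre><code>import ast


def attributes_used_on_first_arg(source):
    """Attribute names accessed directly on the first argument of the
    single function defined in the given source text."""
    func = ast.parse(source).body[0]
    arg_name = func.args.args[0].arg
    return {
        node.attr
        for node in ast.walk(func)
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == arg_name
    }


def find_ducks(source, candidates):
    """Candidate classes that define every attribute the theory uses."""
    needed = attributes_used_on_first_arg(source)
    return {cls for cls in candidates if all(hasattr(cls, n) for n in needed)}


class Maybe:
    def __init__(self, value):
        self.value = value

    def bind(self, f):
        return f(self.value)


class Stream:
    def next(self):
        return None


# A theory given as source text, so the sketch does not depend on a decompiler.
THEORY = """
def right_identity(m):
    assert m.bind(lambda a: type(m)(a)) == m
"""
</code></pre>

<p>Here <code>find_ducks(THEORY, [Maybe, Stream])</code> answers a set containing only <code>Maybe</code>. Just as with <code>#perform:</code>, anything reached through <code>getattr</code> is invisible to this walk.</p>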
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2012/01/31/duck-finding-for-testing-your-theories/feed</wfw:commentRss>
		</item>
		<item>
		<title>The unreasoned Javan</title>
		<link>http://www.lshift.net/blog/2012/01/17/the-unreasoned-javan</link>
		<comments>http://www.lshift.net/blog/2012/01/17/the-unreasoned-javan#comments</comments>
		<pubDate>Tue, 17 Jan 2012 20:32:26 +0000</pubDate>
		<dc:creator>tim</dc:creator>
		
		<category><![CDATA[Java]]></category>

		<category><![CDATA[Rant]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=643</guid>
		<description><![CDATA[I really hate null!

Reflect on that statement. Apparently Tim has a strong dislike for a concept found in lots of programming
languages (even brainiac languages like Haskell) and successfully used in millions of programs. He must be
crazy; I wouldn&#8217;t like to have a discussion with him about something contentious like tabs versus spaces.



I understand that the [...]]]></description>
			<content:encoded><![CDATA[<p>I really hate <strong>null</strong>!</p>

<p>Reflect on that statement. Apparently Tim has a strong dislike for a concept found in lots of programming
languages (even brainiac languages like Haskell) and successfully used in millions of programs. He must be
crazy; I wouldn&#8217;t like to have a discussion with him about something contentious like tabs versus spaces.</p>

<span id="more-643"></span>

<p>I understand that the world is an uncertain place and that programs need to represent uncertainty but I
don&#8217;t think <strong>null</strong> can be the correct way.</p>

<p>For example, consider this question: how many pets do you own? Is null a suitable answer?</p>

<p>Taking Java as the target for this polemic, which of <strong>Integer</strong> or <strong>int</strong>
is the best way to represent the number of pets you have?</p>

<p>Hint: one of those types can&#8217;t be null. Bonus question: what happens if you unbox an <strong>Integer</strong>
that holds a null value? Research topic: why does Java need boxed numeric types?</p>

<p>Some people have told me that they would use a type that can be null because they don&#8217;t know how many pets
I own. Sorry, but I don&#8217;t think that is a good answer!</p>

<p>How would your program be different if you let zero be the default value for a quantity instead of null?</p>

<p>If there is uncertainty in a property would it be polite to let me know?</p>

<p>In order to guarantee that your program works correctly do you need to check that every
value is non-null before proceeding?</p>

<p>Is your code very hard to read and comprehend because of the huge number of null checks?</p>

<p>Or do you ignore that because all of your unit tests pass?</p>

<p>You do pass nulls into all your methods via your unit tests don&#8217;t you?</p>

<p>Moving on!</p>

<p>Now if we establish that Tim has 23 pets, how do we associate them with Tim?</p>

<p>I&#8217;d probably use <strong>List&lt;Pet&gt;</strong>.</p>

<p>Can a <strong>Collection</strong> contain nulls?</p>

<p>Can a collection be null?</p>

<p>Should you use null to represent an empty collection?</p>

<p>How would your program be different if you used <strong>Collections.emptyList()</strong> or
similar instead of null?</p>

<p>Should we stop programming and hope that JSR-305 makes it into the next version of the JDK?</p>

<p>Is there a refactoring in your IDE that will make Tim and his nulls go away?</p>

<p>Is it easier to press &#8220;generate getters and setters&#8221; in your IDE because no one you know would
pass in a null anyway, at least not whilst you are still responsible for the project?</p>

<p>Do you think Tim is a wee bit upset about nulls in Java code?</p>

<p>Should he go and code in a more modern language?</p>

<p>Not necessarily! <a href="http://code.google.com/p/guava-libraries/">Guava</a> can be quite helpful.</p>

<p>You don&#8217;t need to use Scala to get an Option type: Guava has <a href="http://docs.guava-libraries.googlecode.com/git-history/v11.0.1/javadoc/com/google/common/base/Optional.html">Optional</a>.</p>

<p>Guava has immutable <a href="http://docs.guava-libraries.googlecode.com/git-history/v11.0.1/javadoc/com/google/common/collect/package-summary.html">collections</a>
that reject nulls.</p>

<p>Guava has handy <a href="http://code.google.com/p/guava-libraries/wiki/PreconditionsExplained">pre-conditions</a>
that you can use to prevent nulls entering at your constructors and setters.</p>

<p>Have I ranted for long enough about null?</p>

<p>Yes.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2012/01/17/the-unreasoned-javan/feed</wfw:commentRss>
		</item>
		<item>
		<title>(Re-)adding a tray icon to Rhythmbox</title>
		<link>http://www.lshift.net/blog/2012/01/16/re-adding-a-tray-icon-to-rhythmbox</link>
		<comments>http://www.lshift.net/blog/2012/01/16/re-adding-a-tray-icon-to-rhythmbox#comments</comments>
		<pubDate>Mon, 16 Jan 2012 23:34:12 +0000</pubDate>
		<dc:creator>Tom Parker</dc:creator>
		
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=642</guid>
		<description><![CDATA[One of the features I used to particularly like about Rhythmbox was its ability to minimise to tray. This meant with a simple click of an icon I could briefly bring it up on my current workspace to play/pause and then hide it again. However, in the new 2.90.x releases, the upstream has decided to [...]]]></description>
			<content:encoded><![CDATA[<p>One of the features I used to particularly like about Rhythmbox was its ability to minimise to tray. This meant with a simple click of an icon I could briefly bring it up on my current workspace to play/pause and then hide it again. However, in the new 2.90.x releases, the upstream <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=654435">has decided to remove this</a>, which has been annoying me over the last few days.</p>

<p>Luckily, Rhythmbox has a nice plugin system, so I <a href="https://github.com/palfrey/rhythmbox-tray-icon">can fix that</a>. Having done this sort of thing a little bit before, but not for a while, I started at the <a href="https://live.gnome.org/RhythmboxPlugins/WritingGuide">plugin writing guide</a>. As it turns out, in the 2 weeks between starting writing this and doing this post, they&#8217;ve updated that guide a lot, but at the time 
it was very out of date. My major guide was in fact the <a href="https://github.com/owais/remember-the-rhythm/blob/master/src/remember-the-rhythm.py">&#8216;Remember the Rhythm&#8217;</a> plugin off of the <a href="https://live.gnome.org/RhythmboxPlugins/ThirdParty">3rd party plugins list</a>.
<span id="more-642"></span>
The top of <a href="https://github.com/palfrey/rhythmbox-tray-icon/blob/master/tray_icon.py">the plugin</a> may look a little unfamiliar to anyone who&#8217;s done any Python Gtk programming before, and may have many of you wondering what this &#8220;gi.repository&#8221; thing is. &#8220;gi&#8221; in this case stands for <a href="https://live.gnome.org/GObjectIntrospection">GObject Introspection</a>, a new and shiny way to interface between object-orientated C written with GObject (i.e. a lot of things in the Gnome project) and your favourite scripting language. More specifically, instead of writing language-specific bindings (although there can be language-specific extensions), a library writer can now just write for the GObject Introspection system and all of the fans of language Foo can just write bindings for the introspection stuff, and everyone gets to do less work. This isn&#8217;t quite introspection in the same way as most people know it, as the tool starts from scraping C header files (plus some custom overrides), but it&#8217;s still a nifty little item.</p>

<p>(Side note: GObject Introspection&#8217;s main claim to fame is the <a href="https://live.gnome.org/GnomeShell">Gnome Shell</a>, the major feature of Gnome 3, which uses GObject Introspection with the Javascript bindings heavily to do a lot of its work)</p>

<p>The other new item there would be the Peas system, which is a <a href="https://live.gnome.org/Libpeas">general-purpose plugin system for GObject-based systems</a>, leveraging the GObject Introspection stuff to let you write multi-language plugins (although Rhythmbox always had Python support).</p>

<p>Other useful items along the way were the <a href="http://python-gtk-3-tutorial.readthedocs.org/en/latest/index.html">Python Gtk+ 3 tutorial</a> for the basics, and the main <a href="http://developer.gnome.org/gtk3/stable/">Gtk+ manual</a> (written for the C interfaces, but you can kinda figure things out). You&#8217;re also going to need the <a href="http://developer.gnome.org/rhythmbox/unstable/index.html">Rhythmbox developer manual</a> and for when you want to do funky things like a <a href="https://github.com/palfrey/rhythmbox-tray-icon/issues/2">play/pause icon</a>, you&#8217;ll need the <a href="http://cairographics.org/documentation/pycairo/2/index.html">Python Cairo bindings</a> (or the relevant <a href="http://faq.pygtk.org/index.py?req=show&amp;file=faq08.018.htp">PyGtk FAQ entry</a> when you get stuck).</p>

<p>As always, there&#8217;s Debian packaging, and I&#8217;ve even added a <a href="https://github.com/downloads/palfrey/rhythmbox-tray-icon/rhythmbox-tray-icon_0.2-1_all.deb">downloadable prebuilt .deb</a> from the <a href="https://github.com/palfrey/rhythmbox-tray-icon">Github repo</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2012/01/16/re-adding-a-tray-icon-to-rhythmbox/feed</wfw:commentRss>
		</item>
		<item>
		<title>Unifying parts of structures</title>
		<link>http://www.lshift.net/blog/2012/01/16/unifying-parts-of-structures</link>
		<comments>http://www.lshift.net/blog/2012/01/16/unifying-parts-of-structures#comments</comments>
		<pubDate>Mon, 16 Jan 2012 22:05:59 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
		
		<category><![CDATA[Smalltalk]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=644</guid>
		<description><![CDATA[Those with even a passing familiarity with Prolog should recognise statements like [H&#124;T] = [1,2,3]. In particular, = here is not &#8220;is equal to&#8221; but rather &#8220;unifies with&#8221;. So that statement causes the variable H to unify with 1, and T with the rest of the list, [2, 3].

Clojure&#8217;s abstract bindings provide much the same [...]]]></description>
			<content:encoded><![CDATA[<p>Those with even a passing familiarity with <a href="http://en.wikipedia.org/wiki/Prolog">Prolog</a> should recognise statements like <code>[H|T] = [1,2,3]</code>. In particular, <code>=</code> here is not &#8220;is equal to&#8221; but rather &#8220;unifies with&#8221;. So that statement causes the variable <code>H</code> to unify with <code>1</code>, and <code>T</code> with the rest of the list, <code>[2, 3]</code>.</p>

<p>Clojure&#8217;s abstract bindings provide much the same capability - <code>(let [[h &#038; t] '(1 2 3)] &lt;do stuff&gt;)</code> - modulo the difference between pattern matching and unification, of course.</p>

<p>There&#8217;s a subtlety in something like <code>[H|T] = [1,2,3]</code>, at least if your lists aren&#8217;t built of nested cons cells. Consider Smalltalk arrays. Suppose we have some <code>ListUnifier</code> that will rip the head off a <code>SequenceableCollection</code> like <code>#(1 2 3)</code>. In other words, we&#8217;d like the tail to unify with <code>#(2 3)</code>. But that&#8217;s not a node in the original structure - it&#8217;s an entirely artificial node we wish to construct from the original collection. Firstly, how can we unify with only <em>part</em> of a structure and, secondly, how can we determine a solution from that partition?</p>

<p><span id="more-644"></span></p>

<p>Let&#8217;s try to model the parts:</p>

<pre><code>    DestructuringUnifier subclass: #ListUnifier
        instanceVariableNames: 'head tail'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Unification-Destructuring'.

    ListUnifier >> head: anObject tail: anotherObject
        head := anObject.
        tail := anotherObject.

    ListUnifier class >> headNamed: headSymbol tailNamed: tailSymbol
        ^ self new head: headSymbol asVariable tail: tailSymbol asVariable.

    "Various helper constructors like #head:tailNamed:, #head:tail:, etc.
     elided for brevity."
</code></pre>

<p>Then we can write the original Prolog statement as</p>

<pre><code>    (ListUnifier headNamed: #x tailNamed: #y) =? #(1 2 3)
</code></pre>

<p>As mentioned above, first we want to be able to construct an equivalence relation on the above (or, expressed differently, partition the set of nodes in the structure together with the artificial nodes we create) such that <code>#x asVariable</code> and <code>1</code> are in the same class, and ditto for <code>#y asVariable</code> and <code>#(2 3)</code>.</p>

<pre><code>    unificationClosureWith: anObject in: termRelation
        | h t partition |
        anObject isMetaVariable
            ifTrue: [^ termRelation union: self with: anObject].
        anObject isCollection
            ifFalse: [^ self failToUnifyWith: anObject].
        anObject isEmpty
            ifTrue: [^ self failToUnifyWith: anObject].

        h := head isCollection
            ifTrue: [anObject first: head size]
            ifFalse: [anObject first].
        t := head isCollection
            ifTrue: [anObject allButFirst: head size]
            ifFalse: [anObject allButFirst].
        partition := head unificationClosureWith: h in: termRelation.
        ^ tail unificationClosureWith: t in: partition.
</code></pre>

<p>The mild complication around <code>head isCollection</code> lets us support a head that is itself a collection. So let&#8217;s check that we can construct a partition using parts of things:</p>

<pre><code>    | left right partition |
    left := (ListUnifier headNamed: #x tailNamed: #y).
    right := #(1 2 3).
    partition := VariableTrackingUnionFind
        usingArrayType: PersistentCollection
        partitioning: Dictionary new.
    partition := (partition find: left)
        unificationClosureWith: (partition find: right) in: partition.
    partition elementsOfClass: #x asVariable. "=> {1 . (#Variable #x)}"
    partition elementsOfClass: #y asVariable. "=> {#(2 3) . (#Variable #y)}"
</code></pre>

<p>We see that the partition that originally would only hold nodes in the structures may now hold <em>parts</em> of the original structure.</p>

<p>The original algorithm for determining the most general unifier from some partition as described in <a href="http://www.cs.bu.edu/~snyder/publications/UnifChapter.pdf">Baader &#038; Snyder (pp. 461-462)</a> runs the solution finder starting from the left operand in the unification. Consider the partition we have above. What elements are in the equivalence class of <code>ListUnifier</code>? Well, just the <code>ListUnifier</code> itself! Clearly we need to adjust the solution finder a bit. The obvious approach would be to start the solution-finding from an element in each class, and merge the partial solutions:</p>

<pre><code>    findSolutionFor: aVariableAvoidingUnionFind
    ^ aVariableAvoidingUnionFind
        inject: MostGeneralUnifier new
        into: [:mgu :node |
            mgu addAll: (self new
                findSolutionFor: aVariableAvoidingUnionFind
                starting: node)]
</code></pre>

<p>where <code>#addAll:</code> merges the various <code>MostGeneralUnifier</code>s generated and <code>#inject:into:</code> folds over the representative node in each equivalence class. (Remember, a union-find always has a representative for each equivalence class, namely, <code>myPartition find: someObject</code>.) And it works, at the cost of turning a linear algorithm into a (worst case) quadratic one:</p>

<pre><code>    | left right |
    left := (ListUnifier headNamed: #x tailNamed: #y).
    right := #(1 2 3).

    left =? right "=> MostGeneralUnifier((#Variable #x)->1 (#Variable #y)->#(2 3) )"
</code></pre>

<p>But &#8220;finding a solution&#8221; really means &#8220;to what must we assign <em>each variable</em>?&#8221;. So we can at least speed things up by only solution-finding in those classes in which variables occur:</p>

<pre><code>    findSolutionFor2: aVariableAvoidingUnionFind
    ^ aVariableAvoidingUnionFind variableContainingClasses
        inject: MostGeneralUnifier new
        into: [:mgu :node |
            mgu addAll: (self new
                findSolutionFor: aVariableAvoidingUnionFind
                starting: node)]
</code></pre>

<p>This makes finding a solution O(NM), where N is the number of nodes in the structure, M the number of classes containing variables.</p>
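
<p>The overall shape can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Smalltalk library: <code>Var</code>, <code>UnionFind</code>, <code>unify_head_tail</code> and <code>solution</code> are hypothetical names, the sequence is assumed non-empty, and the union-find is the naive imperative kind. It shows the two points made above: the tail is an artificial node built from the original collection, and the solution is read off only from the classes that contain variables.</p>

<pre><code>class Var:
    """A unification variable, identified by name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Var(%r)" % self.name


class UnionFind:
    """A tiny union-find keyed by object identity (no path compression)."""

    def __init__(self):
        self.parent = {}
        self.nodes = {}  # keeps every node alive and lets us list classes

    def find(self, node):
        key = id(node)
        self.nodes.setdefault(key, node)
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            key = self.parent[key]
        return key

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def class_of(self, node):
        root = self.find(node)
        return [n for n in list(self.nodes.values()) if self.find(n) == root]


def unify_head_tail(head, tail, sequence):
    """Put head with the first element and tail with the rest of sequence."""
    partition = UnionFind()
    partition.union(head, sequence[0])
    partition.union(tail, sequence[1:])  # artificial node: a freshly built list
    return partition


def solution(partition, variables):
    """Visit only the classes containing variables and read off a binding."""
    return {
        v.name: next(n for n in partition.class_of(v) if not isinstance(n, Var))
        for v in variables
    }
</code></pre>

<p>Unifying <code>Var("x")</code> and <code>Var("y")</code> against <code>[1, 2, 3]</code> this way binds <code>x</code> to <code>1</code> and <code>y</code> to <code>[2, 3]</code>.</p>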
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2012/01/16/unifying-parts-of-structures/feed</wfw:commentRss>
		</item>
		<item>
		<title>Translating a persistent union-find from ML to Smalltalk</title>
		<link>http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk</link>
		<comments>http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk#comments</comments>
		<pubDate>Sat, 31 Dec 2011 23:24:35 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
		
		<category><![CDATA[Smalltalk]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=641</guid>
		<description><![CDATA[When I wrote my unification library a while back, I tried to add an &#8220;or matcher&#8221;. That is, something that would allow 

    &#124; matcher mgu &#124;
    matcher := OrUnifier
        left: (TreeNode left: #x asVariable)
        [...]]]></description>
			<content:encoded><![CDATA[<p>When I wrote my <a href="http://www.squeaksource.com/Nutcracker/">unification library</a> a while back, I tried to add an &#8220;or matcher&#8221;. That is, something that would allow:</p>

<pre><code>    | matcher mgu |
    matcher := OrUnifier
        left: (TreeNode left: #x asVariable)
        right: (TreeNode right: #x asVariable).

    mgu := matcher =? (TreeNode left: (Leaf value: 1)).
    mgu at: (#x asVariable) "=> (Leaf value: 1)".

    mgu := matcher =? (TreeNode right: (Leaf value: 1)).
    mgu at: (#x asVariable) "=> (Leaf value: 1)".
</code></pre>

<p>Easy enough&#8230; until one tries to use an OrUnifier as an operand on the right hand side. See, as the unification progresses, if the first option fails, you&#8217;d like to backtrack part of the equivalence relation calculation, and with the imperative union-find in Nutcracker that&#8217;s not possible. What to do, what to do?</p>

<p><span id="more-641"></span></p>

<p>The standard solution is to reach into one&#8217;s toolbox of functional data structures. Sadly, no one knows (as far as I can see, at least) how to implement a functional union-find. At least, not an efficient one. However, <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.8494">Conchon and Filliâtre</a> tell us how to implement a persistent union-find.</p>

<p>The implementation - originally in ML - uses a persistent array to get its rollbackability. Further, it uses &#8220;rerooting&#8221;, a trick Henry Baker <a href="http://www.pipeline.com/~hbaker1/ShallowBinding.html">wrote up</a>, to improve efficiency. I couldn&#8217;t improve on the pictures Conchon and Filliâtre use to illustrate rerooting, so I won&#8217;t try, and just <a href="http://research.microsoft.com/~crusso/ml2007/slides/puf-wml07-slides.pdf">point you to their artwork</a>, pp. 18-27. The structure makes massive use of side effects, but presents an apparently purely functional API. First, the basic data structure:</p>

<pre><code>    type 'a t = 'a data ref
    and 'a data =
        | Arr of 'a array
        | Diff of int * 'a * 'a t
</code></pre>

<p>Note the <code>ref</code> there - it&#8217;s an updatable reference to something. Since everything&#8217;s mutable by default in Smalltalk, I tried ignoring the <code>ref</code> and just translating things. However, I quickly ran into difficulties. The &#8220;massive use of (hidden) side effects&#8221; quickly bit me (see section 4 of the paper), as I attempted to translate the following:</p>

<pre><code>    let set t i v = match !t with
        | Arr a as n ->
            let old = a.(i) in
            a.(i) <- v;
            let res = ref n in
            t := Diff (i, old, res);
            res
        | Diff _ ->
            ref (Diff (i, v, t))
</code></pre>

<p>It all looks quite simple. But look at <code>t := Diff (i, old, res)</code>. Ignoring it at first, the obvious translation would be (ignoring noise like class declarations):</p>

<pre><code>    Diff >> set: index to: anObject
        ^ Diff index: index value: anObject in: self

    Arr >> set: index to: anObject
        | old res |
        old := a at: index.
        a at: index put: anObject.
        res := self.
        self become: (Diff index: index value: old in: self).
        ^ self.
</code></pre>

<p>Did you feel a shiver there? <code>#become:</code> is deep Smalltalk magic. Its operation is simple enough: it swaps two object pointers. Here, we will <em>change self to a new object</em>. Oh, did I say &#8220;swap two object pointers&#8221;? I meant to say &#8220;swap two object pointers throughout the entire image&#8221;. Deep, dangerous magic indeed. And I thought, as I tried to figure out what was going on, that there had to be an easier way. What if I modelled the <code>ref</code> itself?</p>

<pre><code>    Object subclass: #Ref
        instanceVariableNames: 'value'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PersistentUnionFind'.

    Ref class >> wrapping: anObject
        ^ self new wrapping: anObject.

    Ref >> value
        ^ value.

    Ref >> value: anObject
        value := anObject.

    Ref >> wrapping: anObject
        value := anObject.
</code></pre>

<p>Now you can hold on to a reference that stays put while the thing it points to changes. (Yes, that sounds a lot like &#8220;well done, you&#8217;ve invented a pointer!&#8221;) With that in hand, let&#8217;s hide the gory bits - <code>Arr</code> and <code>Diff</code> - behind a nice clean <code>PersistentCollection</code> interface:</p>

<pre><code>    Ref subclass: #PersistentCollection
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'PersistentUnionFind'.

    PersistentCollection >> at: index put: anObject
        | t |
        t := value. "The equivalent of !t"
        ^ t isDiff
            ifTrue: [self class wrapping: (Diff index: index value: anObject in: self)]
            ifFalse: [ | old res |
                old := t array at: index.
                t array at: index put: anObject.
                res := self class wrapping: t. "res = ref n"
                self value: (Diff index: index value: old in: res). "t := Diff (i, old, res)"
                res]
</code></pre>

<p>which doesn&#8217;t look too bad, in comparison to the original!</p>
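<p>If you don&#8217;t have a Squeak image to hand, the same shape can be sketched in a few lines of Ruby. This is only an illustration under my own assumptions, not the code in the PersistentUnionFind package: the names <code>PersistentArray</code>, <code>at</code> and <code>put</code> are made up for the example, and lookups simply walk the chain of <code>Diff</code>s rather than rerooting.</p>

<pre><code>    # Hypothetical Ruby rendering of the ML types; all names are illustrative.
    Arr  = Struct.new(:array)
    Diff = Struct.new(:index, :value, :rest)

    # The updatable reference: the thing the ML code calls a ref.
    class Ref
      attr_accessor :value

      def initialize(value)
        @value = value
      end
    end

    class PersistentArray < Ref
      def self.from(array)
        new(Arr.new(array.dup))
      end

      # Read a slot, following Diffs until we reach the real array.
      def at(index)
        node = value
        case node
        when Arr  then node.array[index]
        when Diff then node.index == index ? node.value : node.rest.at(index)
        end
      end

      # Return a new version; every older version keeps answering as before.
      def put(index, object)
        node = value
        case node
        when Diff
          PersistentArray.new(Diff.new(index, object, self))
        when Arr
          old = node.array[index]
          node.array[index] = object
          result = PersistentArray.new(node)          # res = ref n
          self.value = Diff.new(index, old, result)   # t := Diff (i, old, res)
          result
        end
      end
    end

    v0 = PersistentArray.from([1, 2, 3])
    v1 = v0.put(0, 10)
    v2 = v1.put(1, 20)
    # v0 still reads as 1, 2, 3; v1 as 10, 2, 3; v2 as 10, 20, 3.
</code></pre>

<p>Every version stays readable after later updates, which is the kind of rollback an <code>OrUnifier</code> wants when its first option fails.</p>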

<p>The code&#8217;s published at <a href="http://www.squeaksource.com/Nutcracker/">SqueakSource</a>, in the PersistentUnionFind package:</p>

<pre><code>    Installer ss
      project: 'Nutcracker';
      install: 'PersistentUnionFind'.
</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2011/12/31/translating-a-persistent-union-find-from-ml-to-smalltalk/feed</wfw:commentRss>
		</item>
		<item>
		<title>Finding new albums by old bands</title>
		<link>http://www.lshift.net/blog/2011/12/29/finding-new-albums-by-old-bands</link>
		<comments>http://www.lshift.net/blog/2011/12/29/finding-new-albums-by-old-bands#comments</comments>
		<pubDate>Thu, 29 Dec 2011 12:13:47 +0000</pubDate>
		<dc:creator>Tom Parker</dc:creator>
		
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=638</guid>
		<description><![CDATA[I&#8217;m once again visiting music-related problems, with a look at a different aspect of the &#8220;discovering new music&#8221; problem. Now, the LShift jukebox is very good at introducing me to new and weird artists, and it also occasionally tells me about other work by artists I&#8217;m already aware of, but this is all rather haphazard.

The [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m <a href="http://www.lshift.net/blog/2011/02/28/data-visualisation-how-weird-is-our-jukebox">once again</a> visiting music-related problems, with a look at a different aspect of the &#8220;discovering new music&#8221; problem. Now, the LShift jukebox is very good at introducing me to new and weird artists, and it also occasionally tells me about other work by artists I&#8217;m already aware of, but this is all rather haphazard.</p>

<p>The core problem I&#8217;m trying to address is as follows: you find a new artist, really like them, and buy all their CDs. The trouble is that nothing notifies you when they put out a new release, and since new releases are relatively rare events, you forget to check&#8230; essentially I want <a href="http://www.songkick.com/">Songkick</a>, but for CDs (or MP3 albums, but the principle still applies).<span id="more-638"></span></p>

<p>Enter <a href="http://github.com/palfrey/missing-albums">Missing Albums</a>. It works as follows:  </p>

<ul>
    <li>Trawls through your existing music collection, finds all the bands you&#8217;ve got at least 4 tracks of (which tends to eliminate most of the uninteresting bands), and gets an album list for each based on the tags of those files (the presence of any track from an album is assumed to imply knowledge of that album).
</li>
    <li>Grabs the <a href="http://musicbrainz.org/">Musicbrainz</a> data for those bands, and establishes a list of albums for each.
</li>
    <li>Finds the dates for each existing album for a band, determines the newest one, and gets the list of all albums for that band after that date.
</li>
    <li>Grabs Amazon price data and album cover for each newer album and spits out a list of albums in reverse chronological order. <a href="http://genshi.edgewall.org/">Genshi</a> is used to give this nice formatting.
</li>
</ul>

<p>Results look like this:<br />
<a href='http://www.lshift.net/blog/wp-content/uploads/2011/12/missing-albums.png'><img src="http://www.lshift.net/blog/wp-content/uploads/2011/12/missing-albums-300x270.png" alt="" title="missing-albums" width="300" height="270" class="aligncenter size-medium wp-image-640" /></a><br /></p>

<p>The tool manages to solve my core problem i.e. finding out about new albums, but it&#8217;s still got a number of flaws:</p>

<ul>
    <li>Musicbrainz is quite slow, and so this is a batch processed command-line app for the moment.
</li>
    <li>The only identifier we&#8217;ve got for a band is its name, and that&#8217;s non-unique in a large enough set of cases. Brian Whitman&#8217;s <a href="http://notes.variogr.am/post/10733372290/music-resolving-facebook">excellent article</a> on the issues with Facebook&#8217;s music ID system illustrates this well, and I&#8217;ve got some pretty nasty cases just in my collection (e.g. I have tracks from both the Belgian rock band and the trance artist called &#8220;Deus&#8221;, and there are another 2 entries for it <a href="http://musicbrainz.org/search?query=deus&#038;type=artist">in Musicbrainz</a>; there are <a href="http://musicbrainz.org/search?query=muse&#038;type=artist">5 entries all called &#8220;Muse&#8221;</a> and <a href="http://musicbrainz.org/search?query=james&#038;type=artist">7 called James</a>). In this tool, I&#8217;ve hacked around the issue by ignoring anyone with no albums listed on Amazon, which tends to thin things down to just the big commercial bands.
</li>
    <li>My artist tagging scheme mostly lists bands starting with the word &#8220;The&#8221; as &#8220;Foo, The&#8221; rather than &#8220;The Foo&#8221;, but telling that both are the same is again a bit of a problem due to the same disambiguation problem.</li>
<li>Sometimes a band isn&#8217;t present at all in Musicbrainz, or there&#8217;s no Amazon data for an album. I&#8217;ve solved this by either a) adding the data into Musicbrainz, or b) adding artists to an &#8220;ignore&#8221; list. The missing-albums.py script takes as one of its arguments an &#8220;overrides&#8221; file, which is an ini-style file containing potentially two sections: the first is &#8220;artist&#8221;, which rewrites any artist names I&#8217;ve drastically misspelt, and the second is &#8220;ignore&#8221;, which is a list of artists to skip (e.g. my meta-artist name of &#8220;Theme&#8221;, with which I&#8217;ve marked most of the TV/film tracks I&#8217;ve got).</li>
</ul>

<p>On the other hand, none of these issues is too awful for a command-line tool. If I were trying to make a slick web interface for this, they&#8217;d be major problems, but I&#8217;m willing to put up with the odd dodgy result, as the output is already very usable and has led to some purchasing.</p>

<p><b>TODO</b></p>

<ul>
<li><a href="https://github.com/palfrey/missing-albums/issues/1">Work with last.fm data instead of trawling local music files</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2011/12/29/finding-new-albums-by-old-bands/feed</wfw:commentRss>
		</item>
		<item>
		<title>Benchmarking simple JSON generation in Java</title>
		<link>http://www.lshift.net/blog/2011/12/28/benchmarking-simple-json-generation-in-java</link>
		<comments>http://www.lshift.net/blog/2011/12/28/benchmarking-simple-json-generation-in-java#comments</comments>
		<pubDate>Wed, 28 Dec 2011 22:25:27 +0000</pubDate>
		<dc:creator>tim</dc:creator>
		
		<category><![CDATA[Java]]></category>

		<category><![CDATA[Programming]]></category>

		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=639</guid>
		<description><![CDATA[What is the fastest way to produce JSON output in Java? Well if you have a complicated object
tree to turn into JSON I would guess it is probably Jackson.
However, not all JSON output is complicated so maybe we can find quicker and simpler alternatives.

My test class is simple, I call him Thing, he has two [...]]]></description>
			<content:encoded><![CDATA[<p>What is the fastest way to produce JSON output in Java? Well if you have a complicated object
tree to turn into JSON I would guess it is probably <a href="http://jackson.codehaus.org/">Jackson</a>.
However, not all JSON output is complicated so maybe we can find quicker and simpler alternatives.</p>

<p>My test class is simple, I call him Thing, he has two fields name and content, he isn&#8217;t a Java bean
because he doesn&#8217;t need to be (maybe your classes don&#8217;t need to be Java beans either!), here he is:</p>

<pre><code>public class Thing {
    public Thing(String name, String content)   {
        this.name = name;
        this.content = content;
    }

    final public String name, content;
}
</code></pre>

<p>We will use <a href="http://labs.carrotsearch.com/junit-benchmarks.html">JUnitBenchmarks</a> to test
my theory that you can be simpler and faster than Jackson. JUnitBenchmarks allows unit tests to be run
multiple times and measurements taken, it also allows the code to be warmed up so any JIT compilation
should have been carried out before measurements are taken. I&#8217;ve set my tests to run a warmup period of
50 iterations followed by 1000 measurement iterations. The Jackson code being tested looks like this:</p>

<pre><code>public class JacksonStreamingSerialiser implements Serialiser {

    public String toJson(Thing thing) {
        StringWriter out = new StringWriter();
        try {
            JsonGenerator generator = jsonFactory.createJsonGenerator(out);
            generator.writeStartObject();
            generator.writeStringField("name", thing.name);
            generator.writeStringField("content", thing.content);
            generator.writeEndObject();
            generator.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }

        return out.toString();
    }

    private final JsonFactory jsonFactory = new JsonFactory();
}
</code></pre>

<p>When tested we get a mean measurement for writing 250 objects using Jackson of 0.95 seconds.</p>

<p><a href="http://json-lib.sourceforge.net/">json-lib</a> is usually a worse performer than Jackson.
Here is the equivalent code using json-lib:
</p>

<pre><code>public class JsonLibSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        JSONObject object = new JSONObject();
        object.put("name", thing.name);
        object.put("content", thing.content);

        return object.toString();
    }
}
</code></pre>

<p>When tested we get a mean measurement for writing 250 objects using json-lib of 10.74 seconds. Not so good!</p>

<p>Who needs library code? Maybe that newfangled String.format will be quicker and simpler. Here is the code:</p>

<pre><code>public class StringFormatSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        return String.format("{\"name\":\"%s\",\"content\":\"%s\"}", thing.name, thing.content);
    }
}
</code></pre>

<p>When tested we get a mean measurement for writing 250 objects using String.format of 5.26 seconds. Better than
json-lib but Jackson isn&#8217;t looking worried!</p>

<p>I guess that format string must be expensive to parse so let&#8217;s try a StringBuilder. Here is the code:</p>

<pre><code>public class StringBuilderSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        StringBuilder builder = new StringBuilder();
        builder.append("{\"name\":\"").append(thing.name).append("\",\"content\":\"").append(thing.content).append("\"}");

        return builder.toString();
    }
}
</code></pre>

<p>When tested we get a mean measurement for writing 250 objects using StringBuilder of 0.91 seconds. Finally a
winner, faster than Jackson, but you had better escape those strings properly!</p>

<p>People have always told me that StringBuilder is unsynchronised so should be faster than an old fashioned
StringBuffer so let&#8217;s check. Here is the code:</p>

<pre><code>public class StringBufferSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        StringBuffer buffer = new StringBuffer();
        buffer.append("{\"name\":\"").append(thing.name).append("\",\"content\":\"").append(thing.content).append("\"}");

        return buffer.toString();
    }
}
</code></pre>

<p>When tested we get a mean measurement for writing 250 objects using StringBuffer of 0.60 seconds. Hang on!
That is the fastest yet! What is going on?</p>

<p>What is going on is that I have been playing fast and loose with statistics by only presenting the mean
times in seconds for each benchmark. By extracting the raw data and applying the power of statistics (well
I looked at the distributions and standard deviations) it turns out you cannot tell the difference between
the StringBuilder and the StringBuffer, so all is well, and String[Builder|Buffer] are both winners! Jackson
is also a winner since for more complex object trees it will allow you to write more maintainable code than
using a StringBuilder combined with loops and conditional logic and is almost as fast as a StringBuilder
(or StringBuffer).</p>

<p>So what have we learnt? Firstly, use Jackson if you have to serialise your objects into JSON. Secondly,
JUnitBenchmarks is a very handy library. Thirdly, if you don&#8217;t present a standard deviation with your benchmark
results then your results may not mean what you think they mean.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2011/12/28/benchmarking-simple-json-generation-in-java/feed</wfw:commentRss>
		</item>
		<item>
		<title>Implementing maps and folds using Zippers</title>
		<link>http://www.lshift.net/blog/2011/12/23/implementing-maps-and-folds-using-zippers</link>
		<comments>http://www.lshift.net/blog/2011/12/23/implementing-maps-and-folds-using-zippers#comments</comments>
		<pubDate>Fri, 23 Dec 2011 14:41:19 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=637</guid>
		<description><![CDATA[Zippers continue to fascinate me. Let&#8217;s recap a bit: a zipper is a data structure representing the navigation and mutation of an otherwise immutable structure. We&#8217;ve looked at several different implementations here on the LShift blog, in several different languages.

Today, we&#8217;re going to see what else we can do with zippers.

When we implement a Huet-style [...]]]></description>
			<content:encoded><![CDATA[<p>Zippers continue to fascinate me. Let&#8217;s recap a bit: a zipper is a data structure representing the navigation and mutation of an otherwise immutable structure. We&#8217;ve looked at several different implementations here on the LShift blog, in several different languages.</p>

<p>Today, we&#8217;re going to see what else we can do with zippers.</p>
<span id="more-637"></span>
<p>When we implement a Huet-style zipper (in other words, we use a recursive structure based on one-hole contexts instead of directly using partial continuations), we implement a bunch of primitive behaviours - left, right, replace, insert child, and so on. Let&#8217;s assume we <a href="https://github.com/frankshearar/zipr">have all that</a>.</p>

<p>The first obvious thing to do is to define a traversal. We need two essential elements for a traversal - a means of knowing when we&#8217;re done, and a means of moving to the next element. Now there&#8217;s a teeny hitch. Given a zipper over some hierarchical structure, it seems most natural to describe our traversal in terms of recursion - after all, we&#8217;re working with a recursive structure. Ruby doesn&#8217;t require proper tail call elimination (even though, apparently, some implementations support it). We don&#8217;t want stack overflows from too-deep recursion, so we&#8217;ll reach into our toolbox and haul out a <em>trampoline</em>:</p>

<pre><code>
    def trampoline(initial_value, &amp;unary_block)
      result = unary_block.call(initial_value)
      while result.kind_of?(Proc) do
        result = unary_block.call(result.call)
      end
      result
    end
</code></pre>

<p>A trampoline will take some block, and invoke it with the given initial value. If the block yields a <code>Proc</code> (what the Lispers would call a <em>thunk</em> - a parameterless <code>Proc</code> that delays the evaluation of some chunk of code), it will call that <code>Proc</code>, invoke the block again with the result, and repeat that process until the block yields something that&#8217;s not a <code>Proc</code>. Finally, it will return that non-<code>Proc</code> value. With tool in hand, we can now write a recursive pre-order traversal, confident that we won&#8217;t blow our stack even on large structures. Note that this particular zipper uses &#8220;safe&#8221; navigation, wrapping up results in the <a href="https://github.com/frankshearar/zipr/blob/master/lib/zipr/either.rb">Either monad</a>.</p>

<pre><code>
    # Perform a pre-order traversal, that is "this node, then a pre-order
    # traversal of my children, left to right".
    def next
      if not has_next? then
        return @zipper
      end

      if @zipper.branch?(@zipper.value) then
        return @zipper.down
      end

      right_sibling = @zipper.safe_right
      if right_sibling.right? then
        return right_sibling.value
      end

      # Return a Zipper if there's a next, with a distinguishable context
      # for the last element of the traversal.
      # This algorithm returns a thunk when it wishes to recurse.
      # The trampoline converts this CPS-like algorithm into one
      # that runs in constant space.
      trampoline(@zipper) { |z|
        parent = z.safe_up
        parent.either(->parent_z{
                        uncle = parent_z.safe_right
                        uncle.either(->z{ next z},
                                     ->unused_error{ next ->{parent_z}}) # Recur
                      },
                      ->unused_error{
                        # We've popped up the structure all the way to the root node.
                        z.new_zipper(z.value, EndOfTraversalContext.new(z.context))
                      })
      }
    end
</code></pre>
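
<p>To see the trampoline on its own, away from the zipper, here&#8217;s a small sketch. It restates <code>trampoline</code> using <code>yield</code> so that the snippet stands alone, and sums the integers from 1 to 100,000, which is deep enough that a plainly recursive version would risk a stack overflow. The <code>[n, acc]</code> state pair is just a convention chosen for this example, not something the zipper code uses.</p>

<pre><code>    def trampoline(initial_value)
      result = yield(initial_value)
      # Keep bouncing: call each thunk and feed what it returns back to the block.
      result = yield(result.call) while result.kind_of?(Proc)
      result
    end

    total = trampoline([100_000, 0]) { |(n, acc)|
      if n.zero?
        acc                        # a non-Proc value ends the loop
      else
        -> { [n - 1, acc + n] }    # a thunk means "call me, then go round again"
      end
    }
    # total == 5_000_050_000
</code></pre>

<p>Each &#8220;recursive call&#8221; is just a returned thunk, so the stack never grows beyond a single block invocation.</p>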

<p>It&#8217;s easy to wrap up <code>next</code> inside some object, and simply invoke the entire traversal with <code>each</code>:</p>

<pre><code>
    it "should process all nodes in a pre-order fashion" do
      tree = Tree.new(1, [Tree.new(2, [Tree.new(3, [])]),
                          Tree.new(4, [Tree.new(5, []),
                                       Tree.new(6, [])])])
      t = PreOrderTraversal.new(tree.zipper)
      answers = []
      t.each { |value|
        answers << value.value
      }
      answers.should == [1, 2, 3, 4, 5, 6]
    end
</code></pre>

<p>where <code>each</code> is defined thusly:</p>

<pre><code>
    def each(&#038;block)
      map { |node|
        block.call(node)
        node
      }
    end

    # Return a same-shaped structure with the relevant mapping performed on
    # each node.
    def map(&#038;unary_block)
      # It's ridiculous to store the previous zipper to avoid a one-past-the-end
      # error. It works, and it's simple, but it's _ugly_.
      prev = @zipper
      while has_next? do
        @zipper = @zipper.replace(unary_block.call(@zipper.value))
        prev = @zipper
        @zipper = self.next
      end
      prev
    end
</code></pre>

<p>We could just mix in <code>Enumerable</code> and get <code>map</code> and a whole lot more For Free&#8230; but <code>Enumerable#map</code> returns an <code>Array</code>, and I want <code>map</code> to return something of whatever type it was fed.</p>

<p>Of course there&#8217;s nothing stopping us from implementing other traversals - post-order, in-order (for binary trees), or even breadth-first. What&#8217;s nice is that none of these traversal strategies has any knowledge of the structure over which it moves! They just know about the zipper&#8230; and the <em>zipper</em> knows nothing about the structure, except the answers to three questions:</p>
  <ul>
    <li>Can this node have children?</li>
    <li>What are this node&#8217;s children?</li>
    <li>Given a parent node and children nodes, how do I make a new node?</li>
  </ul>
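
<p>Those three questions are small enough to write down as three lambdas. The sketch below isn&#8217;t a zipper and isn&#8217;t zipr&#8217;s API: the names <code>Structure</code> and <code>map_leaves</code> are invented for the example, and it uses plain recursion for brevity. It only shows that a same-shaped map needs nothing more than those three answers.</p>

<pre><code>    # Hypothetical holder for the three answers; not part of zipr.
    Structure = Struct.new(:branch, :children, :make_node)

    # Rebuild a same-shaped structure, applying transform to every leaf.
    def map_leaves(structure, node, transform)
      return transform.call(node) unless structure.branch.call(node)

      kids = structure.children.call(node).map { |child|
        map_leaves(structure, child, transform)
      }
      structure.make_node.call(node, kids)
    end

    # Nested arrays: an Array is a branch, anything else is a leaf.
    nested_arrays = Structure.new(
      ->(node) { node.is_a?(Array) },   # can this node have children?
      ->(node) { node },                # what are this node's children?
      ->(_parent, kids) { kids }        # given a parent and children, make a new node
    )

    doubled = map_leaves(nested_arrays, [1, [2, 3], [[4]]], ->(leaf) { leaf * 2 })
    # doubled == [2, [4, 6], [[8]]]
</code></pre>

<p>Swap in three different lambdas and the same <code>map_leaves</code> would work over some other kind of tree without modification.</p>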

<p>With a traversal in hand, we&#8217;re free to jump into the land of higher order functions - map, fold, and the like. Now of course, as we just saw, these traversals know nothing about the underlying structure. That means that while we can implement a fold -</p>

<pre><code>
    # Collapse some structure into some kind of value using an initial value,
    # and a binary block taking the thus-far-computed value (accumulator) and
    # the current node.
    def fold(initial_value, &amp;binary_block)
      accumulator = initial_value
      each { |node|
        accumulator = binary_block.call(accumulator, node)
      }
      accumulator
    end
</code></pre>

<p>- we will obviously have to tell the fold what to do. Actually, this is no different from your usual fold in Common Lisp or Smalltalk or Haskell. So given some kind of tagged tree with <code>Node</code> and <code>Leaf</code> elements, we could have</p>

<pre><code>
    it "should permit the folding of a structure according to a given block" do
      t = Node.new(:root, [Node.new(:left_subchild, [Leaf.new(1)]), Leaf.new(2)])
      PreOrderTraversal.new(t.zipper).fold(0) { |sum, node|
        sum + case node
                when Node then 0
                when Leaf then node.value
              end
      }.should == 3
    end
</code></pre>

<p>So here we see how to apply all manner of transformations - mapping a tree to another tree of the same shape, folding a tree, insert our fondest desire - with the core mechanisms remaining firmly decoupled from our types.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2011/12/23/implementing-maps-and-folds-using-zippers/feed</wfw:commentRss>
		</item>
		<item>
		<title>Randomly testing Ruby</title>
		<link>http://www.lshift.net/blog/2011/11/26/randomly-testing-ruby</link>
		<comments>http://www.lshift.net/blog/2011/11/26/randomly-testing-ruby#comments</comments>
		<pubDate>Sat, 26 Nov 2011 20:57:28 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
		
		<category><![CDATA[Ruby]]></category>

		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=635</guid>
		<description><![CDATA[I recently ran into the need for testing the behaviour of a parser of a modelling language. The parser processes a number of model descriptions in gem files, as well as local definitions. Until recently, the parser would process the gems in an arbitrary order. However the language while ostensibly declarative, isn&#8217;t, because of a [...]]]></description>
			<content:encoded><![CDATA[<p>I recently ran into the need for testing the behaviour of a parser of a modelling language. The parser processes a number of model descriptions in gem files, as well as local definitions. Until recently, the parser would process the gems in an arbitrary order. However, the language, while ostensibly declarative, isn&#8217;t, because of a huge restructuring of the parser. As a result, some bugs lurk in the changed code. These bugs, because of the arbitrary processing order of the model gems, manifest on some machines and not others. Forcing an ordering on the gem processing masks the underlying issues, even while letting the users of the parser get on with their lives.</p>

<p>What to do? Random testing to the rescue! I cast around for Ruby ports of QuickCheck, and found two: <a href="https://github.com/IKEGAMIDaisuke/rushcheck">rushcheck</a> and <a href="https://github.com/hayeah/rantly">rantly</a>. Rushcheck hasn&#8217;t been wrapped up in a gem, so I decided to take rantly for a spin.</p>

<p><span id="more-635"></span></p>

<p>The basic structure of a rantly test is:</p>

<pre><code>    property_of {
      my_special_generator
    }.check { |generator, values|
      my_special_property(generator, values)
    }
</code></pre>

<p>In other words, <code>property_of</code> generates some data, and the results are fed into <code>check</code>, where you describe the property you&#8217;re checking.</p>
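
<p>Stripped of the library, the idea fits in a handful of lines. The helper below is hypothetical and is not how rantly is implemented: <code>first_counterexample</code> and <code>small_ints</code> are names made up for this sketch, and the fixed seed is only there to make runs repeatable.</p>

<pre><code>    # Hypothetical helper, not part of rantly: run a generator many times and
    # return the first generated value for which the property block is false.
    def first_counterexample(generator, runs: 100, random: Random.new(42))
      runs.times do
        value = generator.call(random)
        return value unless yield(value)
      end
      nil   # the property held for every generated value
    end

    small_ints = ->(random) { random.rand(0..100) }

    # A true property: no counterexample turns up.
    always_true = first_counterexample(small_ints) { |n| n + 1 > n }

    # A false property: "every integer is even" fails on the first odd number.
    odd = first_counterexample(small_ints) { |n| n.even? }
</code></pre>

<p>rantly wraps the same loop up with a library of generators and with reporting of the failing value.</p>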

<p>So, for example, a full spec might be</p>

<pre><code>    class Rantly
      def peano(limit = nil)
        limit = 0..Peano::MAX_INT if limit.nil?
        Peano.from_i(integer(limit))
      end
    end

    module Peano
      describe "Peano" do
        it "should 0 < n" do
          property_of {
            peano(1..100)
          }.check { |n|
            Peano.zero.should < n
          }
        end
      end
    end
</code></pre>

<p>This example comes from a <a href="https://github.com/frankshearar/peano">basic Peano number library</a> I wrote for the purposes of playing with rantly. (I&#8217;m really not kidding about &#8220;basic&#8221;: I&#8217;ve intentionally limited the Peano numbers to the range [0, 1000] because I&#8217;m lazy.)</p>

<p>Rantly supports Test::Unit, so you can always just subclass <code>Test::Unit::TestCase</code> and write your test as per normal with <code>assert_equal</code> and friends. Since I like RSpec I had to add a little helper:</p>

<pre><code>    class RSpec::Core::ExampleGroup
      def property_of(&#038;block)
        Rantly::Property.new(block)
      end
    end
</code></pre>

<p>rantly supplies a number of basic generators - integers, ranged integers, strings, booleans, chars, etc. - as well as various combinators, and scoped settings. Need an array of between 3 and 5 Peano numbers? No problem:</p>

<pre><code>    # When you want to share the size between multiple generators ...
    sized(integer(3..5)) {
      array { peano }
    }

    # ... or when you don't need to.
    array(integer(3..5)) { peano }
</code></pre>

<p>rantly doesn&#8217;t use a polymorphic method for data generation, unlike QuickCheck&#8217;s use of typeclasses. It&#8217;s hardly difficult, of course, to roll your own such thing:</p>

<pre><code>    class PNumber
      def self.generator(range = 0..Peano::MAX_INT)
        Peano.from_i(Rantly.integer(range))
      end
    end
</code></pre>

<p>Most importantly though, a test framework&#8217;s only as good as its output. If your assertions don&#8217;t result in decent error messages, you might as well not bother. So let&#8217;s say we have defined <code>:==</code> and <code>:&lt;</code> for <code>PNumber</code>s. We would obviously also like <code>:&gt;</code>. We write up a property (first!):</p>

<pre><code>    it "should succ(n) > n" do
      property_of {
        PNumber.generator(0..3)
      }.check {|n|
        n.succ.should > n
      }
    end
</code></pre>

<p>When we run <code>rake test</code> we see:</p>
<pre>    .
    failure: 0 tests, on:
    #<Zero>
    F

    Failures:

      1) Peano should succ(n) > n
         Failure/Error: n.succ.should > n
         NoMethodError:
           undefined method `>' for #<Succ #<Zero>>
         # ./test/peano_test.rb:96:in `block (3 levels) in <module:Peano>'
         # ./test/peano_test.rb:93:in `block (2 levels) in <module:Peano>'
</pre>

<p>We see a decent error message in the final output thanks to RSpec. Just as important - given that this is a test framework using <em>random</em> data - we see a counterexample. Subsequent runs, in this case, would give us different counterexamples. (&#8220;In this case&#8221; because, since we haven&#8217;t defined <code>:&gt;</code> yet, every example is a counterexample!)</p>

<p>Let&#8217;s play around a bit, and half-implement <code>:&gt;</code>:</p>

<pre><code>    def > (peano)
      if peano.to_i.even? then
        not (self < peano) and not (self == peano)
      else
        false
      end
    end
</code></pre>

<p>Our output then looks like this:</p>

<pre>    .
    failure: 0 tests, on:
    #<Succ #<Succ #<Succ #<Zero>>>>
    F

    Failures:

      1) Peano should succ(n) > n
         Failure/Error: n.succ.should > n
           expected: > #<Succ #<Succ #<Succ #<Zero>>>>
                got:   #<Succ #<Succ #<Succ #<Succ #<Zero>>>>>
           Diff:
           @@ -1,2 +1,2 @@
           -#<Succ #<Succ #<Succ #<Zero>>>>
           +#<Succ #<Succ #<Succ #<Succ #<Zero>>>>>
         # ./test/peano_test.rb:96:in `block (3 levels) in <module:Peano>'
         # ./test/peano_test.rb:93:in `block (2 levels) in <module:Peano>'
</pre>

<p>Or, &#8220;gosh darn, <code>:&gt;</code> is broken for odd numbers!&#8221;</p>

<p>A final note: rantly has some dependencies, but fortunately not that many: rake (naturally), technicalpickles-jeweler, yaml.</p>

<p>In summary, rantly is a simple, easy to use random data generator that works nicely with Test::Unit and is easily extended to use RSpec. It comes with basic generators, and it&#8217;s easy to extend the generator support to arbitrary structures.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2011/11/26/randomly-testing-ruby/feed</wfw:commentRss>
		</item>
	</channel>
</rss>

