technology from back to front

Clojure to Smalltalk translation notes

Clojure has an interesting implementation of a Huet-style zipper. I started translating it to Smalltalk, and in the process discovered a number of things not really related to zippers. Given that the end result ends up looking very similar to something we’ve already seen , let’s talk more about the translation process itself.

1. Abstract binding

Destructuring is a way of dividing up some kind of structure. In Prolog one might say [H|T] = [1,2,3] to split up a list into a head H (unifying with 1) and a tail T (unifying with [2,3]). Destructuring’s at the core of Scala’s pattern matching (through the use of the Extractor pattern and some compiler magic). Clojure has a number of very handy destructuring tools. Why do we care about handy destructuring? Consider the following two snippets of code:

    (let [[c & cnext :as cs] (children loc)] ...)

Which in Smalltalk becomes the more verbose, and less readable:

    | c cnext cs newPath |
    cs := self children: node.
    c := cs first.
    cnext := cs allButFirst.
    ...

When one takes into account the fact that Clojure functions quite happily consume nil, and how empty lists behave, a precise duplication of the let becomes something as… inelegant as:

    | c cnext cs newPath |
    cs := self children: node.
    cs notNil and: [cs notEmpty]
        ifTrue: [
            c := cs first.
            cnext := cs allButFirst]
        ifFalse: [
            c := nil.
            cnext := nil]
    ...

But perhaps there’s a way to leverage unification to bring the same kind of brevity to Smalltalk…

2. Pluggable behaviours to traverse arbitrary hierarchical structures

zip.clj attaches the zipper data – how one got to a node, what’s changed during the traversal, examining the target data structure – to the nodes of that structure with Clojure’s metadata. We pass in details specific to the structure as lambdas. We only need three things from the structure:

  1. Does this node have children? (or: is this node a branching point?)
  2. What are this node’s children?
  3. How can we construct a new node?

If we revisit the TreeZipper, we see the same three questions answered in a different way:

  1. This node has children if focus children notEmpty is true.
  2. The node’s children are returned by focus children.
  3. We construct a new node by invoking TreeZipper >> #newFocusOn:children:.

To put it another way: we accomplish the same thing – abstracting the details of a structure – by passing in lambdas in Clojure, and by subclassing and overriding in Smalltalk.

3. Dictionary: world’s most useful data structure?

A colleague of mine reckons that a dictionary’s the world’s most useful structure. After looking at zip.clj I think I see why: zip.clj’s dictionaries map names to values in the same manner an object does… except you don’t need to define a class before you start playing around, and it’s trivial to add new fields. (There’s a trade-off here: using a dictionary means never having to add yet another property to your data object – rapid prototyping – while using a data object means never having to trawl source code wondering what properties you’re using – self-documenting source.)

One of the really neat features of a Clojure dictionary is the ease with which one accesses values. Want the right context of your zipper’s current node? All you need is (:r (loc 1)). If there’s no such key, you get nil.

It’s possible, with a caveat, to duplicate a Clojure-like dictionary in Smalltalk. In particular, we want something such that

    CljDictTest >> testArbitraryNamesReturnNil
        self assert: nil equals: CljDict new foo.

    CljDictTest >> testArbitraryNamesTurnIntoMaps
        | dict |
        dict := CljDict new.
        dict foo: 1.
        self assert: 1 equals: dict foo.

And this is all you need:

    Object subclass: #CljDict
        instanceVariableNames: 'dict'

    CljDict >> initialize
        dict := Dictionary new.

    CljDict >> doesNotUnderstand: aMessage
        ^ aMessage numArgs
            caseOf: {
                [0] -> [self get: aMessage selector].
                [1] -> [self set: aMessage selector to: aMessage arguments first].}
            otherwise: [super doesNotUnderstand: aMessage].

    CljDict >> get: aSymbol
        ^ dict at: aSymbol ifAbsent: [nil].

    CljDict >> set: aSymbol to: anObject
        "aSymbol is the name of a setter method."
        ^ dict at: aSymbol allButLast put: anObject.

As you can see, the caveat is this: we subclass Object, and Object has a fairly extensive vocabulary. Attempting to use a key that matches a selector in Object‘s vocabulary will cause surprising behaviour, in that you will see the result of Object doing something, not simply the access of a value (through #doesNotUnderstand:). We could subclass ProtoObject, but usually we only do that when we absolutely have to.

As it turns out, the dictionary in zip.clj maps isomorphically to a TreeZipperNode:

Datum zip.clj dictionary TreeZipperNode
Left context :l #leftNodes
Right context :r #rightNodes
History of visited nodes :pnodes #value [1]
Previous context :ppath #path

[1] Really, you need to collect the nodes yourself by recursing over the chained TreeZipperNodes.

Did I say isomorphically? zip.clj does use another key, :changed?. As it turns out, this reveals a bug in TreeZipper! The assertion zipper root == zipper down up root will fail, because TreeZipper will blindly create a new node, even for a node that hasn’t changed! And it’s by storing the previously visited nodes in :pnodes that zip.clj can traverse a structure and re-root, returning the identical structure.

by
Frank Shearar
on
18/10/11
 
 


five + 1 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us