technology from back to front

Translating a persistent union-find from ML to Smalltalk

When I wrote my unification library a while back, I tried to add an “or matcher”. That is, something that would allow

| matcher mgu |
matcher := OrUnifier
    left: (TreeNode left: #x asVariable)
    right: (TreeNode right: #x asVariable).

mgu := matcher =? (TreeNode left: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".

mgu := matcher =? (TreeNode right: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".

Easy enough… until one tries to use an OrUnifier as an operand on the right hand side. See, as the unification progresses, if the first option fails, you’d like to backtrack part of the equivalence relation calculation, and with the imperative union-find in Nutcracker that’s not possible. What to do, what to do?

The standard solution is to reach into one’s toolbox of functional data structures. Sadly, noone knows (as far as I can see, at least) how to implement a functional union-find. At least, not an efficient one. However, Conchon and Filliâtre tell us how to implement a persistent union-find.

The implementation – originally in ML – uses a persistent array to get its rollbackability. Further, it uses “rerooting”, a trick Henry Baker wrote up, to improve efficiency. I couldn’t improve on the pictures Conchon and Filliâtre use to illustrate rerooting, so I won’t try, and just point you to their artwork, pp. 18-27. The structure makes massive use of side effects, but presents an apparently purely function API. First, the basic data structure:

type 'a t = 'a data ref
and 'a data =
    | Arr of 'a array
    | Diff of int * 'a * 'a t

Note the ref there – it’s an updatable reference to something. Since everything’s mutable by default in Smaltalk, I tried ignoring the ref and just translate things. However, I quickly ran into difficulties. The “massive use of (hidden) side effects” quickly bit me (see section 4 of the paper), as I attempted to translate the following:

let set t i v = match !t with
    | Arr a as n ->
        let old = a.(i) in
        a.(i) <- v
        let res = ref n in
        t := Diff (i, old, res);
    | Diff _ ->
        ref (Diff (i, v, t))

It all looks quite simple. But look at t := Diff (i, old, res). Ignoring it at first, the obvious translation would be (ignoring noise like class declarations):

Diff >> set: index to: anObject
    ^ Diff index: index value: v in: self

Arr >> set: index to: anObject
    | old res ref |
    old := a at: i.
    a at: i put: v.
    res := self.
    self become: (Diff index: i value: v in: self)
    ^ self.

Did you feel a shiver there? #become: is deep Smalltalk magic. Its operation is simple enough: it swaps two object pointers. Here, we will change self to a new object. Oh, did I say “swap two object pointers”? I meant to say “swap two object pointers throughout the entire image”. Deep, dangerous magic indeed. And I thought, as I tried to figure out what was going on, that there had to be an easier way. What if I modelled the ref itself?

Object subclass: #Ref
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PersistentUnionFind'.

Ref class >> wrapping: anObject
    ^ self new wrapping: anObject.

Ref >> value
    ^ value.

Ref >> value: anObject
    value := anObject.

Ref >> wrapping: anObject
    value := anObject.

Now you can have an immutable reference to something that, itself, may change to what it points. (Yes, that sounds a lot like “well done, you’ve invented a pointer!”) With that in hand, let’s hide the gory bits – Arr and Diff – behind a nice clean PersistentCollection interface:

Ref subclass: #PersistentCollection
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PersistentUnionFind'.

PersistentCollection >> at: index put: anObject
    t := value. "The equivalent of !t"
    ^ t isDiff
        ifTrue: [Diff index: i value: anObject in: self]
        ifFalse: [ | old |
            old := t array at: index.
            t array at: index put: anObject.
            self value: (Diff index: i value: old in: self)]

which doesn’t look too bad, in comparison to the original!

The code’s published at SqueakSource, in the PersistentUnionFind package:

Installer ss
    project: 'Nutcracker';
    install: 'PersistentUnionFind'.
Frank Shearar

2 + four =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us