Translating a persistent union-find from ML to Smalltalk
When I wrote my unification library a while back, I tried to add an “or matcher”. That is, something that would allow
| matcher mgu |
matcher := OrUnifier
left: (TreeNode left: #x asVariable)
right: (TreeNode right: #x asVariable).
mgu := matcher =? (TreeNode left: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".
mgu := matcher =? (TreeNode right: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".
Easy enough… until one tries to use an OrUnifier as an operand on the right hand side. See, as the unification progresses, if the first option fails, you’d like to backtrack part of the equivalence relation calculation, and with the imperative union-find in Nutcracker that’s not possible. What to do, what to do?
The standard solution is to reach into one’s toolbox of functional data structures. Sadly, noone knows (as far as I can see, at least) how to implement a functional union-find. At least, not an efficient one. However, Conchon and FilliĆ¢tre tell us how to implement a persistent union-find.
The implementation – originally in ML – uses a persistent array to get its rollbackability. Further, it uses “rerooting”, a trick Henry Baker wrote up, to improve efficiency. I couldn’t improve on the pictures Conchon and FilliĆ¢tre use to illustrate rerooting, so I won’t try, and just point you to their artwork, pp. 18-27. The structure makes massive use of side effects, but presents an apparently purely function API. First, the basic data structure:
Note the
ref
there – it’s an updatable reference to something. Since everything’s mutable by default in Smaltalk, I tried ignoring the
ref
and just translate things. However, I quickly ran into difficulties. The “massive use of (hidden) side effects” quickly bit me (see section 4 of the paper), as I attempted to translate the following:
let set t i v = match !t with
| Arr a as n ->
let old = a.(i) in
a.(i) <- v
let res = ref n in
t := Diff (i, old, res);
res
| Diff _ ->
ref (Diff (i, v, t))
It all looks quite simple. But look at
t := Diff (i, old, res)
. Ignoring it at first, the obvious translation would be (ignoring noise like class declarations):
Diff >> set: index to: anObject
^ Diff index: index value: v in: self
Arr >> set: index to: anObject
| old res ref |
old := a at: i.
a at: i put: v.
res := self.
self become: (Diff index: i value: v in: self)
^ self.
Did you feel a shiver there?
#become:
is deep Smalltalk magic. Its operation is simple enough: it swaps two object pointers. Here, we will change self to a new object. Oh, did I say “swap two object pointers”? I meant to say “swap two object pointers throughout the entire image”. Deep, dangerous magic indeed. And I thought, as I tried to figure out what was going on, that there had to be an easier way. What if I modelled the
ref
itself?
Object subclass: #Ref
instanceVariableNames: 'value'
classVariableNames: ''
poolDictionaries: ''
category: 'PersistentUnionFind'.
Ref class >> wrapping: anObject
^ self new wrapping: anObject.
Ref >> value
^ value.
Ref >> value: anObject
value := anObject.
Ref >> wrapping: anObject
value := anObject.
Now you can have an immutable reference to something that, itself, may change to what it points. (Yes, that sounds a lot like “well done, you’ve invented a pointer!”) With that in hand, let’s hide the gory bits -
Arr
and
Diff
- behind a nice clean
PersistentCollection
interface:
Ref subclass: #PersistentCollection
instanceVariableNames: ''
classVariableNames: ''
poolDictionaries: ''
category: 'PersistentUnionFind'.
PersistentCollection >> at: index put: anObject
t := value. "The equivalent of !t"
^ t isDiff
ifTrue: [Diff index: i value: anObject in: self]
ifFalse: [ | old |
old := t array at: index.
t array at: index put: anObject.
self value: (Diff index: i value: old in: self)]
which doesn’t look too bad, in comparison to the original!
The code’s published at SqueakSource, in the PersistentUnionFind package:
Installer ss
project: 'Nutcracker';
install: 'PersistentUnionFind'.
