technology from back to front

Unifying parts of structures

Those with even a passing familiarity with Prolog should recognise statements like [H|T] = [1,2,3]. In particular, = here is not “is equal to” but rather “unifies with”. So that statement causes the variable H to unify with 1, and T with the rest of the list, [2, 3].

Clojure’s abstract bindings provide much the same capability – (let [[h & t] '(1 2 3)] <do stuff>) – modulo the difference between pattern matching and unification, of course.

There’s a subtlety in something like [H|T] = [1,2,3], at least, if your lists aren’t built of nested cons cells. Consider Smalltalk arrays. Suppose we have some ListUnifier that will rip a SequenceableCollection‘s head off, like #(1 2 3). We’d like the tail to unify with #(2 3) in other words. But that’s not a node in the original structure – it’s an entirely artificial node we wish to construct from the original collection. Firstly, how can we unify with only part of a structure and, secondly, how can we determine a solution from that partition?

Let’s try model the parts:

    DestructuringUnifier subclass: #ListUnifier
        instanceVariableNames: 'head tail'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'Unification-Destructuring'.

    ListUnifier >> head: anObject tail: anotherObject
        head := anObject.
        tail := anotherObject.

    ListUnifier class >> headNamed: headSymbol tailNamed: tailSymbol
        ^ self new head: headSymbol asVariable tail: tailSymbol asVariable.

    "Various helper constructors like #head:tailNamed:, #head:tail:, etc.
     elided for brevity."

Then we can write the original Prolog statement as

    (ListUnifier headNamed: #x tailNamed: #y) =? #(1 2 3)

As mentioned above, first we want to be able to construct an equivalence relation on the above (or, expressed differently, partition the set of nodes in the structure together with the artificial nodes we create) such that #x asVariable and 1 are in the same class, and ditto for #y asVariable and #(2 3).

    unificationClosureWith: anObject in: termRelation
        | h t partition |
        anObject isMetaVariable
            ifTrue: [^ termRelation union: self with: anObject].
        anObject isCollection
            ifFalse: [^ self failToUnifyWith: anObject].
        anObject isEmpty
            ifTrue: [^ self failToUnifyWith: anObject].

        h := head isCollection
            ifTrue: [anObject first: head size]
            ifFalse: [1].
        t := head isCollection
            ifTrue: [anObject allButFirst: head size]
            ifFalse: [anObject allButFirst].
        partition := head unificationClosureWith: h in: termRelation.
    ^ tail unificationClosureWith: t in: partition.

The mild complication around head isCollection lets us support a head that is itself a collection. So let’s check that we can construct a partition using parts of things:

    | left right partition |
    left := (ListUnifier headNamed: #x tailNamed: #y).
    right := #(1 2 3).
    partition := VariableTrackingUnionFind
        usingArrayType: PersistentCollection
        partitioning: Dictionary new.
    partition := (partition find: left)
        unificationClosureWith: (partition find: right) in: partition.
    partition elementsOfClass: #x asVariable. "=> {1 . (#Variable #x)}"
    partition elementsOfClass: #y asVariable. "=> {#(2 3) . (#Variable #y)}"

We see that the partition that originally would only hold nodes in the structures may now hold parts of the original structure.

The original algorithm for determining the most general unifier from some partition as described in Baader & Snyder (pp. 461-462) runs the solution finder starting from the left operand in the unification. Consider the partition we have above. What elements are in the equivalence class of ListUnifier? Well, just the ListUnifier itself! Clearly we need to adjust the solution finder a bit. The obvious approach would be to start the solution-finding from an element in each class, and merge the partial solutions:

    findSolutionFor: aVariableAvoidingUnionFind
    ^ aVariableAvoidingUnionFind
        inject: MostGeneralUnifier new
        into: [:mgu :node |
            mgu addAll: (self new
                findSolutionFor: aVariableAvoidingUnionFind
                starting: node)]

where #addAll: merges the various MostGeneralUnifiers generated and #inject:into: folds over the representative node in each equivalence class. (Remember, a union-find always has a representative for each equivalence class, namely, myPartition find: someObject.) And it works, at the cost of turning a linear algorithm into a (worst case) quadratic one:

    | left right |
    left := (ListUnifier headNamed: #x tailNamed: #y).
    right := #(1 2 3).

    left =? right "=> MostGeneralUnifier((#Variable #x)->1 (#Variable #y)->#(2 3) )"

But “finding a solution” really means “to what must we assign each variable?”. So we can at least speed things up by only solution-finding in those classes in which variables occur:

    findSolutionFor2: aVariableAvoidingUnionFind
    ^ aVariableAvoidingUnionFind variableContainingClasses
        inject: MostGeneralUnifier new
        into: [:mgu :node |
            mgu addAll: (self new
                findSolutionFor: aVariableAvoidingUnionFind
                starting: node)]

This makes finding a solution O(NM), where N is the number of nodes in the structure, M the number of classes containing variables.

Frank Shearar

+ 5 = fourteen

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us