technology from back to front

Rolling your own control structures with lambdas

Squeak Smalltalk ships with an, ahem, mildly controversial feature: a case statement. Case statements usually evoke “but that’s not OO!” from people, usually with good reason: a complicated case statement only gets less understandable as it evolves, while a State implementation’s complexity remains more or less constant. (You can concentrate on only the bit you care about; a nightmare case statement requires you to know about a great deal more code.)

And yet. And yet, I managed to extend Andreas Raab’s DNSClient library[1] to parse SRV and NAPTR records by adding a single statement each, thanks to a case statement.

So with my hat firmly in the pragmatist corner (as opposed to the ideologically pure corner, for a change), let’s use our new unification library, and build a pattern matching (really, a unifying) case statement.

We use a case statement like this:

    DnsRecord>>readDataFrom: aStream
        "Reads and decodes the response specific data"

        | n |
        n := aStream nextNumber: 2.
        ^queryType caseOf: {
                [2]  -> [self readNameFrom: aStream]. "NS"
                [5]  -> [self readNameFrom: aStream]. "CNAME"
                [12] -> [self readNameFrom: aStream]. "PTR"
        } otherwise:[aStream next: n].

which is to say, #caseOf:otherwise: takes an array of associations of thunks (parameterless anonymous functions/closures). It evaluates the key thunk and compares it to self (queryType in this case); if they’re equal (= returns true) then the value thunk’s evaluated and returned. In the event of no matches, we evaluate the second parameter – also a thunk – and return its value.

Technically you don’t need to use thunks as keys. We evaluate a thunk by sending it the #value message, and since Object has a #value implementation (returning self), we could just use integer or string literals (or anything else that responds to #value). So why do we use thunks? Because we’re lazy. To be precise, by using thunks we avoid having to evaluate thunks unnecessarily.

An initial implementation of our new control structure is almost embarassingly simple, and is a near cut-and-paste of the existing #caseOf:otherwise:

    Object>>matchOneOf: someAssociations otherwise: aBlock
        someAssociations associationsDo:
            [:assoc | (assoc key value =? self) ifTrue: [^assoc value value]].
        ^ aBlock value

where =? is the unification operator (a short-hand form of #unify:). Of course this doesn’t work, because =? is not a Boolean operator: unification will result in either a most general unifier (which may be empty!) or fail. So we write instead:

    Object>>matchOneOf: someAssociations otherwise: aBlock
        someAssociations associationsDo:
            [:assoc | | mgu |
                [mgu := assoc key value =? self.
                ^ assoc value cull: mgu] on: UnificationFailure do: ["nothing"]].
        ^ aBlock value

More useful: now you can say

    (Node left: (Leaf value: #y asVariable)) matchOneOf:
        {[Leaf value: #x asVariable] -> ['a leaf'].
        [Node left: #x asVariable] -> ['a node'].}
        otherwise: ['foo']

and get the result 'a node'. Of course, you often want to know what that mgu was…

    Object>>matchOneOf: someAssociations otherwise: aBlock
        someAssociations associationsDo:
            [:assoc | | mgu |
                [mgu := assoc key value =? self.
                 ^ assoc value cull: mgu] on: UnificationFailure do: ["nothing"]].
        ^ aBlock value

So: for each key thunk, try unify the structure returned by the thunk with self. If we find no match, we return aBlock value. If we do, #cull: either evaluates the assoc value nullary block, or passes the mgu into the unary block:

    (Node left: (Leaf value: #y asVariable)) matchOneOf:
        {[(Leaf value: #x asVariable)] -> ['a leaf'].
        [Node left: #x asVariable] ->
            [:mgu | mgu printString].
        }

    "Evaluates to: <b>'a Dictionary((#Variable #x)->(#Leaf (#Variable #y)) )'</b>"

Note how the above demonstrates unification rather than pattern matching!

This isn’t a terribly efficient construct: worst-case it’s going to be quadratic (N possible thunk evaluations, with a (slightly super-) linear algorithm on each evaluated thunk). As it stands, the most interesting thing about the construct is that it shows how a lightweight closure syntax allows for interesting possibilities.

Future work might be to turn the construct into a proper object. That object could memoise the thunks, which reduces the time cost to a linear cost. Fancier still, and in the interests of safety, one might implement a compile-time transformation, letting the Compiler inspect the thunks’ ASTs to disallow certain behaviours, or augment the behaviours. Why, that sounds a lot like a macro. It’s not without precedent: Squeak ships with compile-time transformations of #future and #future: message sends. That’s work for another day though!

[1]If you’d like to see the code in question, you can install DnsClient thusly:

    (Installer ss project: 'ar')
        install: 'DnsClient-Core';  "all the code"
        install: 'DnsClient-Tests'; "the tests"
        install: 'DnsClient-Hacks'. "use DnsClient when present"
by
Frank Shearar
on
31/07/11
  1. I’ve never heard even hard-core OO people say that case statements per se are bad practice. The truly questionable practice is a case statement that is dispatching on the class of an object, rather than using a generic operation (“message”, whatever you call it) on the object. The latter is extensible (as long as you document the operation’s semantics properly).

    More generally, it raises a red flag any time that code asks “what is the type of this object”, as in the “instanceOf” operator in Java, since it suggests that you might also be missing a place in which extensibility would be better.

    None of this constitutes sacrosanct rules. The important thing is that the developer should understand these issues and take them into account.

  2. I’ve seen heated discussions over how case statements “aren’t OO”, involving people who should know better. I won’t link to it, in the interests of protecting the guily :).

    A case statement dispatching on type is the usual cause of such arguments, which conflates two issues: (a) dispatching on class means you need a virtual method/generic operation, and (b) a glorified if-else chain.

    The case statement in the first example above is one of the “nice” case statements – it’s not switching on class, but is simply a more concise if-else chain.

    It’s a short step from “case statements dispatching on types are bad” to “case statements are bad”, in a classic example of short-circuit thinking, unfortunately.

 
 


− three = 2

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us