technology from back to front

Least fixed points, and the grouping of behaviours

Sometimes one has a set of interrelated (monotonic) recursive equations one needs to solve. An naïve implementation will recurse infinitely. Handily there’s a solution: the least fixed point. I had need of one the other day, implementing parsing with derivatives. So why bother with least fixed points? Let’s find out…

When one parses with derivatives (we’ll eventually get around to actually parsing with derivatives in a later post, don’t worry), one often has to calculate whether or not a parser is the null language. This function – δ – returns the null language if its input contains the null string, and the empty set otherwise. Matt Might’s implementation uses Racket’s language features to great effect, giving the lucid definition below:

(define/fix (is-null? l)
  #:bottom #t
  (match l
    [(∅)            #f]
    [(ε _)          #t]    
    [(token _)      #f]
    [(∪ l1 l2)      (and (is-null? l1)  (is-null? l2))]
    [(∘ l1 l2)      (and (is-null? l1)  (is-null? l2))]
    [(* l1)         (or (is-null? l1) (is-empty? l1))]
    [(→  l1 _)      (is-null? l1)]))

This means:

  1. The empty parser (∅) does not accept the empty string.
  2. The empty string parser (ε) accepts the empty string.
  3. Any token parser does not accept the empty string.
  4. A union parser – this-parser-or-that-one – is nullable if both of its subparsers do.
  5. A concatenation parser – this-parser-then-that-one – accepts the empty string if both its subparsers do.
  6. The Kleene star parser accepts the empty string if its subparser does, or its subparser is empty.
  7. Any reduction parser accepts the empty string only if the parser it reduces does.

Note the recursive nature of this function. If we have a self-referring parser (like in a left-recursive rule in a grammar), we could not use a simple recursive calculation, because to find whether that parser was nullable, we’d need to know if it was nullable! Least fixed points solve this problem, allowing us to calculate nullability regardless of self-references.

Remember that the one over-arching principle in Smalltalk is that computation proceeds by objects sending each other messages. Objects define for themselves what any particular message means – how it will respond to a message. That means that, generally, we don’t write methods that internally pattern match (on classes) and dispatch. The Smalltalky way of writing nullable? is to defer implementation to the various objects. Let’s define our entry point to this function:

DerivingParser >> isEmptyString
    | emptyString |
    emptyString := [:x | x isEmptyString: emptyString] lfpWithBottom: true.
    ^ emptyString value: self

#lfpWithBottom

instantiates a LeastFixedPoint that starts with the bottom value true and repeatedly sends #isEmptyString: to subparsers until it, well, finds the least fixed point. Various parsers may now define their own meanings for #isEmptyString::

Empty >> isEmptyString: anLFP
    ^ false.

EmptyString >> isEmptyString: anLFP
    ^ true.

Token >> isEmptyString: anLFP
    ^ false.

Union >> isEmptyString: anLFP
    ^ (anLFP value: self this) and: [anLFP value: self that].

Cat >> isEmptyString: anLFP
    ^ (anLFP value: self left) and: [anLFP value: self right].

Rep >> isEmptyString: anLFP
    ^ (anLFP value: self lang) or: [self lang isEmptySet].

Red >> isEmptyString: anLFP
    ^ anLFP value: self lang.

Each #isEmptyString: implementation moves the computation one step along, and the LeastFixedPoint does the hard work of protecting against cyclical structures, memoising, and the like.

Something that leaped out at me during this translation work was that my tooling is inadequate. In the Racket implementation you can see in just a few lines of code the entirety of the algorithm. In contrast, in a normal Smalltalk image I have to bounce between multiple “pinhole” views of the code: a Browser shows one method at a time. In Ruby I’d be scrolling up and down buffers, so it’s not just Smalltalk lacking in the tooling here, even though the pain is slightly different. A default Squeak has half of what I’d like – I can hit the implementors-of button and see all the parts of the method… but one at a time. I’d really like to see a browser that displayed multiple methods in one window. Old-timers might point out the Whisker Browser. Yes, something like that, only I’d like the tooling to automatically display the methods. Something like “display, in one blob of text, the source for all implementors of #isEmptyString: that subclass DerivingParser”. That display surface can then just have the selector at its top, and elide the repetition we see in the chunk of code above.

In contrast, if one follows the oh-no-don’t-switch-on-class approach, we can have with something like this, the combinator that returns the null parser if a parser’s language contains the null string, and the empty set otherwise:

isNullable
    | nullable |
    nullable := [:x |
        x class caseOf:
            {[Empty] -> [false].
            [EmptyString] -> [true].
            [Token] -> [false].
            [Rep] -> [nullable value: x lang].
            [Nullable] -> [nullable value: x lang].
            [Union] -> [(nullable value: x this) or: [nullable value: x that]].
            [Cat] -> [(nullable value: x left) and: [nullable value: x right]].
            [Red] -> [nullable value: x lang]}] lfpWithBottom: false.
    ^ nullable value: self.

I’d like my tools to display with Racket’s brevity, but still keep the association of the behaviour with the object! Or, maybe I’m just an AND kind of guy: I’d like to view my code both by grouping all the bits of some function together, and I’d also like to view the same code by grouping all the bits of things that can act on one type.

by
Frank Shearar
on
27/07/12
 
 


five × 8 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us