technology from back to front

Algebraic Data Types and OMeta2

There have been a recent rash of grammar libraries written for Smalltalk. We have at least three Parsing Expression Grammar (PEG) libraries: OMeta2/Squeak, Xtreams and PetitParser. Today we’re going to look at OMeta2.

OMeta2′s quite interesting in that your rules are methods (written in OMeta2, of course) on a class. You write OMeta2 rules, in other words, through the same process and with the same tools as you would Smalltalk methods. You can do this because OMeta2Base>>#compilerClass changes the compiler the system uses for that class’ methods. That means you can type in something like

    a =
       a:as $a -> [as, $a] | $a -> [$a asString]

which accepts any non-empty string of “a” characters, or a* (returning the parsed collection of $a values as a String), and when you hit “accept”, the OMeta2Compiler compiles the rules, rather than the usual Compiler.

Did you notice something in that grammar, by the way? “PEG parsers can’t support left recursion.” Well, yes, they can (see section 3 of Expressing with Programming Languages. Some PEG parsers can denormalise directly recursive rules like the above – and can thus handle them without blowing a stack – but then they still can’t support indirectly left recursive rules. OMeta’s tweaks do, however:

    ab =
        $a ba:xs -> [$a asString, xs] | $a -> [$a asString]

    ba =
        $b ab:xs -> [$b asString, xs] | $b -> [$b asString]

accepts the language (ab)* – any number of “ab” pairs (and at least one).

Of course, there’s a price to pay: you lose guaranteed linear parse times. However, this isn’t that big a problem: in practice OMeta’s authors/users have found that almost all the time you get near-linear performance (or improved parse times – left recursion uses constant stack space, while right recursion uses linear stack space), except in the case of certain kinds of pathological grammars – you can construct a grammar that left-recurses in such a way that OMeta can’t reuse the memoised production. Practically speaking, that doesn’t really happen.

Since OMeta compiles to its host language, the rest of the image sees our rule as normal Smalltalk. In particular, our rule for a* decompiles to Smalltalk like this:

        | as |
        ^ self ometaOr: {[true
                ifTrue: [as := self apply: #a.
                    self apply: #exactly withArgs: {$a}.
                    as , $a]]. [true
                ifTrue: [self apply: #exactly withArgs: {$a}.
                    $a asString]]}

That true ifTrue: [...] looks pretty strange. It accomplishes the same thing as PROGN in Common Lisp: it’s an easy way to turn a list of statements into an expression. The alternative – [...] value – is less efficient, in that it creates an extra context. true ifTrue: [...] gets compiled into just a pair of jumps.

And you can see that [$a asString] compiles as plain Smalltalk. These kinds of expressions are semantic actions, arbitrary chunks of host language. One of the complaints I hear about OMeta is precisely that it embeds these semantics in the parser, making it difficult to reuse the grammar. An easy way to generalise the semantic actions in an OMeta2 grammar – to have multiple ways of processing the grammar – is to use a pluggable object. For instance,

    ab =
        $a:a ba:xs -> [visitor append: a to: xs] | $a:a -> [visitor leaf: a]

    ba =
        $b:b ab:xs -> [visitor append: b toL xs] | $b:b -> [visitor leaf: b]

visit is an instance variable of the grammar object, and you simply fill that value with something suitable.

Because in a sense OMeta’s compiled to Smalltalk, you can apply all the normal tools to OMeta code – senders-of, implementors-of, and so on. If you mis-spell a selector, it’ll prompt you for a correction, and so on. (This is a bit rough at the moment – you’ll get walkbacks rather than the expected correction.)

OMeta2 has another trick up its sleeve. It doesn’t only consume characters off a stream: it can parse streams of arbitrary objects. So let’s try out parsing these objects. We’ll define a class AlgebraicDataType with a few methods that let it act like a sequenceable collection:

  • #unapply will tear the structure apart into a sequenceable collection. (“Unapply” to hint at Scala’s nomenclature.)
  • #readStream will delegate to self unapply.
  • #isCollection, #isSequenceable will return true, telling other things we act like a sequenceable collection.

Then we define a Tree as being something that’s either a Node with two subtrees called left and right, or a Leaf with a value, or Empty.

    Object subclass: #AlgebraicDataType
        instanceVariableNames: ''.

    AlgebraicDataType subclass: #Tree
        instanceVariableNames: ''.

    Tree subclass: #Node
        instanceVariableNames: 'leftTree rightTree'.

    Tree subclass: #Leaf
        instanceVariableNames: 'value'.

    Tree subclass: #Empty
        instanceVariableNames: ''.

        ^ {#Node. leftTree unapply. rightTree unapply.}

        ^{#Leaf. value.}

        ^ #(#Empty)

With that in place, we can write, in OMeta, pattern-matching functions for our tree:

    depth =
        {#Empty} -> [0]
        | {#Leaf anything} -> [1]
        | {#Node depth:l depth:r} -> [(l max: r) + 1]

    sum =
        {#Empty} -> [0]
        | {#Leaf anything:v} -> [v]
        | {#Node sum:l sum:r} -> [l + r]

Since we made Trees look like (sequenceable) collections, OMeta can stream over their contents.

So far we haven’t bought much: the above pattern matching is tedious and bulky to use: TreeWalker match: myTree with: #depth is not the height of elegance!

We can hide some of the ugliness:

        ^ TreeWalker match: self with: #depth

        ^ TreeWalker match: self with: #sum

Better, one can remove the duplication of declaring a rule and forwarding a rule:

    Tree>>doesNotUnderstand: aMessage
        ^ (TreeWalker canUnderstand: aMessage selector)
            ifTrue: [TreeWalker match: self with: aMessage selector]
            ifFalse: [super doesNotUnderstand: aMessage]

Now we can call OMeta2 rules on our Tree objects directly – Tree empty sum, for instance. (When you send a message that Tree doesn’t understand, it checks whether TreeWalker can understand the message. If so, it forwards the message to the TreeWalker; if not, we follow the usual rules.)

We could improve things further by integrating OMeta more tightly into the language. Being able to embed OMeta in a new kind of literal, for instance, or being able to say that only certain methods should be compiled by OMeta, would make our lives easier. Both of these require fairly invasive changes though.

Frank Shearar

five + 7 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us