technology from back to front

Going m(on)ad with parser combinators

It’s about time someone started talking about Go again around here, so I picked up the old editor, and (painlessly!) installed Go. Maybe 5 minutes later I had the world’s faster compiler, a test framework, a coverage analyzer and a bunch of stuff besides available on my machine. But what to do? Hello World is so done, so I thought I’d grab my copy of Hutton & Meijer and implement a basic parser combinator.

First off, Go has no generic types, and apparently little incentive to do so. Which makes things slightly more difficult than I’d have liked. After all, Hutton & Meijer’s parser type has the form type Parser a = String -> [(a, String)]. No matter; we’ll just fake it ’til we make it.

type ParseTree struct{
    Value interface{}
    Remaining string
}

type ParseTrees []ParseTree

A parser may return 0, 1 or many results. No results means a failure to parse anything. A single result represents an unambiguous parse, and multiple results represent multiple interpretations of the same input: an ambiguous grammar. The simplest parser is the “zero” parser: it consumes no input, and returns no results:

type Zero struct {}

Now before we can continue, Go has an interesting take on object-oriented code. We declare an interface, and any type whose method set includes all the methods in that interface are “of that type”. It provides Go with a distinctly dynamic flavour (think Smalltalk, Ruby) while remaining statically typed. Alas, Go doesn’t infer types, but we can’t have everything.

type Parser interface {
    // Parse consumes some input and returns ParseTrees. If the
    // receiver failed to parse anything, Parse returns an empty
    // slice. A slice containing a single element represents an
    // unambiguous parse. Multiple results mean multiple parse
    // trees for the given consumed input.
    Parse(s string) ParseTrees
}

func (p Zero) Parse(s string) (ParseTrees) {
    return ParseTrees{}
}

So now we have a zero parser. We also need a parser that can actually consume input. Hutton & Meijer call it “item”:

type Item struct {}

func (p Item) Parse(s string) (results ParseTrees) {
    if len(s) == 0 {
        results = ParseTrees{}
    } else {
        reader := strings.NewReader(s)
        // ReadRune replaces an invalid rune with U+FFFD!
        // Err is _never_ set!
        first, size, _ := reader.ReadRune()
        if first == 'uFFFD' {
            results = ParseTrees{}
        } else {
            rest := s[size:]
            results = ParseTrees{ParseTree{first, rest}}
        }
    }
    return
}

It’s a pity about the weirdness in the middle there: if ReadRune isn’t supposed to return an error, why’s it in the signature?! Anyway, it could be worse.

Hutton & Meijer go into some detail why they prefer bind over seq in section 2.3 of their paper, which we won’t go into here. Obviously bind refers to the monadic operation, but you did see the “Monadic” in “Monadic Parser Combinators”, right? However, I’m not really interested in the monadic part of things in this post. Instead I’d like to dig a bit into how it feels to write Go this way. First we need to adjust the definition of a Parser:

// Parser is a monadic "function" that turns a prefix of a string
// into a parse tree of some kind.
type Parser interface {
    // Bind stitches together two parsers: it returns a new
    // parser p whose parse trees are those of mapping of
    // the receiver's parse trees through the given function f.
    Bind(f func(v interface{}) (Parser)) Parser

    // Parse consumes some input and returns ParseTrees. If the
    // receiver failed to parse anything, Parse returns an empty
    // slice. A slice containing a single element represents an
    // unambiguous parse. Multiple results mean multiple parse
    // trees for the given consumed input.
    Parse(s string) ParseTrees
}

Again, the lack of generic types bites us, with Go’s version of C’s void *. (I exaggerate; it’s not as bad as void *. I just miss generics.) So what’s it look like to write a Bind-using construct?

func TestFuncBindAnythingRunsFunctionFirst(t *testing.T) {
    i := Item{}
    // The first Func turns the rune into a string. The second Func doubles the string.
    bound := i.Bind(func (v interface{}) Parser { return Result{string(v.(rune))}})
    bound = bound.Bind(func (v interface{}) Parser { return Result{v.(string) + v.(string)}})
    expected := ParseTrees{ParseTree{"aa", ""}}
    actual := bound.Parse("a")
    checkResult(t, "Item `bind` Func `bind` Func", "a", expected, actual)
}

A touch of type inference would help that! OK, but how’s Bind even implemented? In Go, (a) you add methods to types, not interfaces, and (b) types automatically satisfy interfaces by the methods they understand. What that means is that this kind of thing won’t work:

func (p Parser) Bind(f func(v interface{}) (Parser)) (Parser) {
    // Store a function that
    // * gets the inner parser's parse trees
    // * applies f to the parse trees
    // * returns a Result containing those new parse trees.
}

Now remember that bind doesn’t care about which parser you have. From Hutton & Meijer we have

bind :: Parser a -> (a -> Parser b) -> Parser b
p `bind` f = inp -> concat [f v inp' | (v,inp') <- p inp]

Instead we have to write something like this:

// Func is a parser that applies some function to the
// results of a parse.
type Func struct {
    Parser Parser
    Apply func (parseTree interface{}) (Parser)
}

func (p Func) Parse(s string) (results ParseTrees) {
    inner_results := p.Parser.Parse(s)
    results = ParseTrees{}
    for _, result := range inner_results {
        rs := p.Apply(result.Value).Parse(result.Remaining)
        for _, r := range rs {
            results = append(results, r)
        }
    }
    return
}

func (p Item) Bind(f func(v interface{}) (Parser)) (Parser) {
    return Func{p, f}
}

func (p Zero) Bind(f func(v interface{}) (Parser)) (Parser) {
    return Func{p, f}
}

func (p Func) Bind(f func(v interface{}) (Parser)) (Parser) {
    return Func{p, f}
}

func (p Result) Bind(f func(v interface{}) (Parser)) Parser {
    return Func{p, f}
}

For each kind of parser, we have to write a separate Bind implementation. Which shouldn’t be terribly surprising, given (a) the OO nature of Go and (b) the fact that there’s no inheritance. In this particular case we’re able to pull out the common implementation in the form of a Func parser, minimising the duplication.

by
Frank Shearar
on
25/10/13
 
 


six − 2 =

2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us