technology from back to front

December, 2011: Translating a persistent union-find from ML to Smalltalk

When I wrote my unification library a while back, I tried to add an “or matcher”. That is, something that would allow


| matcher mgu |
matcher := OrUnifier
    left: (TreeNode left: #x asVariable)
    right: (TreeNode right: #x asVariable).

mgu := matcher =? (TreeNode left: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".

mgu := matcher =? (TreeNode right: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".

Easy enough… until one tries to use an OrUnifier as an operand on the right hand side. See, as the unification progresses, if the first option fails, you’d like to backtrack part of the equivalence relation calculation, and with the imperative union-find in Nutcracker that’s not possible. What to do, what to do?

(more…)

by
Frank Shearar
on
31/12/11

December, 2011: Finding new albums by old bands

I’m once again visting music-related problems, with a look at a different aspect of the “discovering new music” problem. Now, the LShift jukebox is very good at introducing me to new and weird artists, and it also occasionally tells me about other work by artists I’m already aware of, but this is all rather haphazard.

The core problem I’m trying to address is as follows: you find a new artist, and you really like them and buy all their CDs. Problem is, is that there’s nothing notifying me when they do a new release, and given new releases are a relatively rare event, I forget to check… essentially I want Songkick, but for CDs (or MP3 albums, but the principle still applies). (more…)

by
Tom Parker
on
29/12/11

December, 2011: Benchmarking simple JSON generation in Java

What is the fastest way to produce JSON output in Java? Well if you have a complicated object tree to turn into JSON I would guess it is probaby Jackson. However, not all JSON output is complicated so maybe we can find quicker and simpler alternatives.

My test class is simple, I call him Thing, he has two fields name and content, he isn’t a Java bean because he doesn’t need to be (maybe your classes don’t need to be Java beans either!), here he is:

<pre>
public class Thing {
    public Thing(String name, String content)   {
        this.name = name;
        this.content = content;
    }

    final public String name, content;
}
</pre>

We will use JUnitBenchmarks to test my theory that you can be simpler and faster than Jackson. JUnitBenchmarks allows unit tests to be run multiple times and measurements taken, it also allows the code to be warmed up so any JIT compilation should have been carried out before measurements are taken. I’ve set my tests to run a warmup period of 50 iterations followed by 1000 measurement iterations. The Jackson code being tested looks like this:

<pre>
public class JacksonStreamingSerialiser implements Serialiser {

    public String toJson(Thing thing) {
        StringWriter out = new StringWriter();
        try {
            JsonGenerator generator = jsonFactory.createJsonGenerator(out);
            generator.writeStartObject();
            generator.writeStringField("name", thing.name);
            generator.writeStringField("content", thing.content);
            generator.writeEndObject();
            generator.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }

        return out.toString();
    }

    private final JsonFactory jsonFactory = new JsonFactory();
}
</pre>

When tested we get a mean measurement for writing 250 objects using Jackson of 0.95 seconds

json-lib is usually a worse performer than Jackson. Here is the equivalent code using json-lib:

<pre>
public class JsonLibSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        JSONObject object = new JSONObject();
        object.put("name", thing.name);
        object.put("content", thing.content);

        return object.toString();
    }
}
</pre>

When tested we get a mean measurement for writing 250 objects using json-lib of 10.74 seconds. Not so good!

Who needs library code? Maybe that new fangled String.format will be quicker and simpler. Here is the code:

<pre>
public class StringFormatSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        return String.format("{\"name\":\"%s\",\"content\":\"%s\"}", thing.name, thing.content);
    }
}
</pre>

When tested we get a mean measurement for writing 250 objects using String.format of 5.26 seconds. Better than json-lib but Jackson isn’t looking worried!

I guess that format string must be expensive to parse so lets try a StringBuilder. Here is the code:

<pre>
public class StringBuilderSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        StringBuilder builder = new StringBuilder();
        builder.append("{\"name\":\"").append(thing.name).append("\",\"content\":\"").append(thing.content).append("\"}");

        return builder.toString();
    }
}
</pre>

When tested we get a mean measurement for writing 250 objects using StringBuilder of 0.91 seconds. Finally a winner, faster than Jackson but you had better quote those strings properly!

People have always told me that StringBuilder is unsyncronised so should be faster than an old fashioned StringBuffer so lets check. Here is the code:

<pre>
public class StringBufferSerialiser implements Serialiser {
    public String toJson(Thing thing) {
        StringBuffer buffer = new StringBuffer();
        buffer.append("{\"name\":\"").append(thing.name).append("\",\"content\":\"").append(thing.content).append("\"}");

        return buffer.toString();
    }
}
</pre>

When tested we get a mean measurement for writing 250 objects using StringBuffer of 0.60 seconds. Hang on! That is the fastest yet! What is going on?

What is going on is that I have been playing fast and loose with statistics by only presenting the mean times in seconds for each benchmark. By extracting the raw data and applying the power of statistics (well I looked at the distributions and standard deviations) it turns out you cannot tell the difference between the StringBuilder and the StringBuffer, so all is well, and String[Builder|Buffer] are both winners! Jackson is also a winner since for more complex object trees it will allow you to write more maintainable code than using a StringBuilder combined with loops and conditional logic and is almost as fast as a StringBuilder (or StringBuffer).

So what have we learnt? Firstly, use Jackson if you have to serialise your objects into JSON. Secondly, JUnitBenchmarks is a very handy library. Thirdly, if you don’t present a standard deviation with your benchmark results then your results may not mean what you think they mean.

by
tim
on
28/12/11

December, 2011: Implementing maps and folds using Zippers

Zippers continue to fascinate me. Let’s recap a bit: a zipper is a data structure representing the navigation and mutation of an otherwise immutable structure. We’ve looked at several different implementations here on the LShift blog, in several different languages.

Today, we’re going to see what else we can do with zippers.

(more…)

by
Frank Shearar
on
23/12/11
2000-13 LShift Ltd, 1st Floor Office, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK +44 (0)20 7729 7060