When I wrote my unification library a while back, I tried to add an “or matcher”. That is, something that would allow
| matcher mgu |
matcher := OrUnifier
left: (TreeNode left: #x asVariable)
right: (TreeNode right: #x asVariable).
mgu := matcher =? (TreeNode left: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".
mgu := matcher =? (TreeNode right: (Leaf value: 1)).
mgu at: (#x asVariable) "=> (Leaf value: 1)".
Easy enough… until one tries to use an OrUnifier as an operand on the right hand side. See, as the unification progresses, if the first option fails, you’d like to backtrack part of the equivalence relation calculation, and with the imperative union-find in Nutcracker that’s not possible. What to do, what to do?
I’m once again visting music-related problems, with a look at a different aspect of the “discovering new music” problem. Now, the LShift jukebox is very good at introducing me to new and weird artists, and it also occasionally tells me about other work by artists I’m already aware of, but this is all rather haphazard.
The core problem I’m trying to address is as follows: you find a new artist, and you really like them and buy all their CDs. Problem is, is that there’s nothing notifying me when they do a new release, and given new releases are a relatively rare event, I forget to check… essentially I want Songkick, but for CDs (or MP3 albums, but the principle still applies). Read more…
What is the fastest way to produce JSON output in Java? Well if you have a complicated object
tree to turn into JSON I would guess it is probaby Jackson.
However, not all JSON output is complicated so maybe we can find quicker and simpler alternatives.
My test class is simple, I call him Thing, he has two fields name and content, he isn’t a Java bean
because he doesn’t need to be (maybe your classes don’t need to be Java beans either!), here he is:
We will use JUnitBenchmarks to test
my theory that you can be simpler and faster than Jackson. JUnitBenchmarks allows unit tests to be run
multiple times and measurements taken, it also allows the code to be warmed up so any JIT compilation
should have been carried out before measurements are taken. I’ve set my tests to run a warmup period of
50 iterations followed by 1000 measurement iterations. The Jackson code being tested looks like this:
When tested we get a mean measurement for writing 250 objects using Jackson of 0.95 seconds
json-lib is usually a worse performer than Jackson.
Here is the equivalent code using json-lib:
When tested we get a mean measurement for writing 250 objects using json-lib of 10.74 seconds. Not so good!
Who needs library code? Maybe that new fangled String.format will be quicker and simpler. Here is the code:
When tested we get a mean measurement for writing 250 objects using String.format of 5.26 seconds. Better than
json-lib but Jackson isn’t looking worried!
I guess that format string must be expensive to parse so lets try a StringBuilder. Here is the code:
When tested we get a mean measurement for writing 250 objects using StringBuilder of 0.91 seconds. Finally a
winner, faster than Jackson but you had better quote those strings properly!
People have always told me that StringBuilder is unsyncronised so should be faster than an old fashioned
StringBuffer so lets check. Here is the code:
When tested we get a mean measurement for writing 250 objects using StringBuffer of 0.60 seconds. Hang on!
That is the fastest yet! What is going on?
What is going on is that I have been playing fast and loose with statistics by only presenting the mean
times in seconds for each benchmark. By extracting the raw data and applying the power of statistics (well
I looked at the distributions and standard deviations) it turns out you cannot tell the difference between
the StringBuilder and the StringBuffer, so all is well, and String[Builder|Buffer] are both winners! Jackson
is also a winner since for more complex object trees it will allow you to write more maintainable code than
using a StringBuilder combined with loops and conditional logic and is almost as fast as a StringBuilder
So what have we learnt? Firstly, use Jackson if you have to serialise your objects into JSON. Secondly,
JUnitBenchmarks is a very handy library. Thirdly, if you don’t present a standard deviation with your benchmark
results then your results may not mean what you think they mean.
Zippers continue to fascinate me. Let’s recap a bit: a zipper is a data structure representing the navigation and mutation of an otherwise immutable structure. We’ve looked at several different implementations here on the LShift blog, in several different languages.
Today, we’re going to see what else we can do with zippers.
LShift is proud to announce that a significant grant from the
Technology Strategy Board has
been awarded for the design and development of the
Progressive Publishing System (PPS).
PPS will allow authors, designers, and publishers to automatically
convert existing electronic documents into standard e-publishing
formats. Existing conversion processes rely on a high degree of manual
intervention, require a mixture of costly, and often proprietary,
software components while delivering unreliable results.
LShift, along with its project partners Mute
and Minibar Ventures,
applied for the funding at the beginning of 2011. We had already
developed a series of in-house technologies to accelerate and automate
the creation of our own documents and so joining forces with Mute (a
leading arts and technology publisher) and Minibar Ventures (a highly
regarded technology start-up advisor) meant that we would be guided in
applying our own efforts to other publishing scenarios and have access
to existing market intelligence.
With the vast range of electronic document formats, unique production
workflows and various methods for handling a document’s “metadata”
(information describing each document) LShift saw the development of PPS
as a complex problem well-suited to its expertise. PPS will allow
developers to extend the range of supported conversion formats as they
change and emerge over time. Not only will complex document structures
be supported (such as tables of contents, indexes, footnotes, endnotes
and image data) but so will any descriptive data associated with the
document. PPS will also support and preserve the data required by direct
distribution and sales platforms (Amazon, Neilsen etc.) for content
creators and publishers.
Along with its extensive metadata tools, PPS will allow users to preview
converted content and ‘teach’ the system custom conversion rules as
idiosyncracies in authorship or materials are discovered over time. The
platform will ultimately allow collaboration and document management
features to facilitate contributions from individuals and groups into
any point of the writing, design and editorial cycle of a document.
The system will go into private beta in February 2012 with several
publishers who are active in diverse content creation and distribution
channels. They include traditional book and magazine publishers through to
organisations with a high-profile online presence but who wish to
automate the conversion of legacy materials.
You are currently browsing the LShift Ltd. blog archives for December, 2011.