technology from back to front

Archive for June, 2008

That S-shape

In my last post about this I observed an S-shape in the results of the polling data, and speculated that it might show psychological bias on the part of the Intraders. I’m not so sure now. This graph shows all polls in the last 30 days; recent polls are dark colours and older ones lighter, and the S-shape is much less visible. So it may simply be an artifact of the way we aggregate polling data to generate a single figure.

Paul Crowley

Polling vs Intrade

I’ve been concentrating on using polls to predict the outcome of the Presidential election here, but another alternative is to let someone else do it for you – or lots of people, who are prepared to put their money where their mouth is. This is the appeal of prediction markets like Intrade: participants (who I call “Intraders”) effectively bet on the outcome, and the bet is backed not by the company but by other Intraders. Their collective opinion on the likelihood of the different outcomes sets the market price, and if you think they’ve got it wrong you can put money on it.

Prediction markets for the 2008 election were recently discussed in a blog post (referencing an earlier discussion); the author’s conclusion was that Intraders just follow the polls, and so you might as well just look at the polls directly. Is this the right conclusion?

I wasn’t entirely happy with the way the curve shown in the graph was chosen – I wanted a more direct way to show the relationship between polling and prices. So I’ve translated the market prices into a measure more amenable to calculation, which I call the PPF. First, I translate the prices into a Democratic victory probability by dividing the price for the Democratic market in each state by the sum of the prices for the Democratic and Republican markets in that state; this works around the fact that, for various reasons, these prices don’t quite add up to 100. Second, I feed this into Φ⁻¹, the “percent point function” of the normal distribution – so I’m assuming that Intraders are making a guess at the probability distribution of the eventual margin of victory, and that it’s normally distributed.
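As a concrete sketch of the two steps (with made-up contract prices, not real Intrade quotes):

```python
from statistics import NormalDist

def ppf_from_prices(dem_price, rep_price):
    """Turn a pair of market prices into the PPF measure described above."""
    # Step 1: normalise the two prices into a Democratic victory
    # probability, working around the fact that they don't sum to 100.
    p_dem = dem_price / (dem_price + rep_price)
    # Step 2: apply the inverse normal CDF (the "percent point function"),
    # on the assumption that the eventual margin is normally distributed.
    return NormalDist().inv_cdf(p_dem)

# Hypothetical state where the Democratic contract trades at 62
# and the Republican contract at 40:
print(ppf_from_prices(62.0, 40.0))
```

A PPF of zero corresponds to a 50/50 state; positive values mean the market favours the Democrats.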

The advantage of manipulating the figures in this way is that we can then just fit a straight line to the numbers to see what that implies about what Intraders believe about the market.


I think the resulting graph shows that Intraders are strongly influenced by the polls, but that polls are by no means the only influence on how they bet.

First, look at how far the points stray from the line. New Hampshire (NH) and California (CA) look about the same as far as the polls are concerned, but Intraders are much more confident of a Democratic victory in CA than they are in NH. This scatter is representative of all the non-polling data that the Intraders are bringing to bear in making their estimates.

Second, we can learn something from the line we’ve fitted. From where the line crosses the x-axis, we can conclude that Intraders think that Obama is going to lose a percentage point on average, nationally, compared to today’s polls. That’s not enough to lose the election, but it’s a significant shift; if it reflects a pro-Republican bias on the part of Intraders then there’s money to be made betting on Democrats there. And the slope of the line means they think the standard deviation of the difference between the polls and the final results will be around 15%.
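The arithmetic behind those two readings can be sketched with invented (poll margin, PPF) points – these are not the actual per-state figures:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Invented data standing in for the per-state (poll margin, PPF) points:
polls = [-10.0, -4.0, 0.0, 5.0, 12.0]
ppfs = [-0.60, -0.20, 0.07, 0.40, 0.87]

intercept, slope = fit_line(polls, ppfs)

# Where the fitted line crosses the x-axis: the poll margin at which
# Intraders consider a state a toss-up (a positive value means they
# expect the Democrats to underperform today's polls by that much).
toss_up_margin = -intercept / slope

# Under the normal-margin assumption the slope is 1/sd, so the implied
# standard deviation of (final result minus today's polls) is:
implied_sd = 1.0 / slope
```

The key identity is that if the final margin is N(poll + shift, sd), then PPF = poll/sd + shift/sd, so the slope recovers 1/sd and the x-intercept recovers −shift.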

Third, there’s an interesting S-shape visible in the graph. Our conversion from probabilities to the PPF should have eliminated that, and left us with something closer to a straight line. I think this reflects psychological errors on the part of Intraders – they are happy to use guesswork when the polls are even, and to follow the polls when they show wider margins, but when the polls show very wide margins they can’t quite buy it, and offer prices that would be more appropriate for a tighter race. I strongly suspect that this means one could make some money betting exactly according to the line fitted on this graph – i.e. betting on the Republicans for the points above the line, and on the Democrats for the points below it – and if I had money to spare I’d try it instead of writing about it here.

Paul Crowley

Mercurial merge technique

We’re now using Mercurial here at LShift for much of our development work, and we’re finding it a great tool. We make heavy use of branches (“branch per bug”) on many projects, and this is also a pretty smooth experience. One issue that has come up is policy regarding merging the trunk (“default”) into any long-lived feature/bug branches: should you do it, or should you not?

My vote is that you should merge default into long-lived branches fairly regularly; otherwise, you have a big-bang, all-at-once nightmare of a merge looming ahead of you. If you do merge frequently, though, there’s one subtlety to be aware of: hg diff is not history aware, so in order to get an accurate, focussed picture of all the changes that have been made on your long-lived branch, you need to do one of two things:

* either, merge default into your long-lived branch right before you merge the long-lived branch back into default, and run hg diff after that’s complete; or
* (recommended) do a throw-away test-merge of the long-lived branch into default directly.

Imagine a history like this:

 (2) (3)
  |   |
   \ /
   (1)

… where (1) is an ancestral revision, (2) is the default branch, and (3) is the long-lived branch – let’s call it “foo”.

Given this history, running hg update -C default (to make the working copy be the default branch, i.e. revision (2)) followed by hg diff -r default -r foo will give you a misleading diff – one that undoes the changes from (1) to (2) before applying the changes from (1) to (3). This is almost certainly not what you want!

Instead, run a test merge, by hg update -C default followed by hg merge foo and then plain old hg diff. Note that this modifies your working copy! You will need to revert (by hg update -C default) if you decide the merge isn’t ready to be committed.
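Putting the recommended recipe together (assuming the long-lived branch is named “foo”, as above):

```shell
hg update -C default    # put the working copy on the default branch
hg merge foo            # throw-away three-way test-merge of foo
hg diff                 # history-aware view of what the merge would change
hg update -C default    # discard the test-merge if it isn't ready to commit
```

If the diff looks right, you can commit the merge instead of discarding it.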

The output of hg diff after the hg merge shows a history-aware summary of the changes that the merge would introduce to your checked-out branch. It’s this history-awareness (“three-way merge”) that makes it so much superior to the history-unaware simple diff (“two-way merge”).


TiddlyWiki, Quining, and Smalltalk

Yesterday I visited Osmosoft, the developers of TiddlyWiki, to chat about getting some DVCS-like functionality into TiddlyWiki. Jeremy mentioned in passing that TiddlyWiki is, if you squint, a slightly cheating kind of a Quine. It struck me this morning that TiddlyWiki has strong similarities to another famous almost-Quine, the Smalltalk system.

TiddlyWiki is a composition of

* an HTML page, with embedded Javascript, acting as a container for the pieces of content, called tiddlers, in the wiki;
* some tiddlers, specially marked as plugins, containing source code permitting extension of the skeleton system; and
* other tiddlers containing wiki markup text.

The HTML/Javascript container of the tiddlers (along with the web browser it runs on!) is like the Smalltalk virtual-machine. The tiddlers themselves are the live objects in the system. The process of running the embedded plugin scripts is the same as running Smalltalk’s post-image-load startup hooks. The plugins-in-tiddlers is the same as the source-in-the-image. The process of saving the TiddlyWiki instance to disk is the same as a Smalltalk instance extracting its live object graph and serializing it to disk.

The main difference I can see is in how much of the VM each system carries around with its image: like Smalltalk, TiddlyWiki relies on an external VM being present – the browser supplies a big chunk of it – but TiddlyWiki also has to carry its bootstrapping container code bundled with the stored tiddlers/live-objects. Also, TiddlyWiki instances are (these days!) constructed by a special assembly process from small, separate text files checked into Subversion, whereas Smalltalk images were not historically constructed from scratch very often at all. Finally, TiddlyWiki’s boot process is heavier than Smalltalk’s, because the browser forces it to recompile all the sources in the system, where Smalltalk gets away with keeping bytecode in the image alongside the sources.


TravelDK Wins ‘Best Use of Technology’ Award

The Dorling Kindersley TravelDK site won the ‘Best Use of Technology’ award at the 2008 Travolution awards.

Designed and built by LShift, the site won the Best Use of Technology award in the Travel Information category at this year’s Travolution Awards. The awards recognise the best operators, agents, portals, digital marketers and suppliers in the travel industry. Travolution received a record number of entries, which were scrutinised by a panel of judges from across the travel industry.

Judges said “DK has utilised cutting edge e-publishing techniques to produce a flexible and simple service for visitors to its website. It takes the in-resort guidebook to the next level.”

Launched last year, the site was awarded a 2007 TravelMole Web award for Best Website in the Holiday/Travel Extras category. The DK website attracts a growing number of unique visitors and continues to develop with more destinations and added features.

With Google maps, and thousands of images, reviews and ratings, the site enables travellers to add their own travel highlights and read those of others alongside recommendations from the DK Top 10 travel guide series. Uniquely also, DK can create customized guides for marketing partners.

“I’m delighted with this award,” says Georgina Atwell, DK Online Director. “We knew the site had to offer travellers something genuinely different, and giving the community the power to create their own PDF or print-on-demand travel guide has proved an enormous success.”


diff3, merging, and distributed version control

Yesterday I presented my work on Javascript diff, diff3, merging and version control at the Osmosoft Open Source Show ‘n Tell. (Previous posts about this stuff: here and here.)
The slides for the talk are here. They’re a work-in-progress – as I think of things, I’ll continue to update them.

To summarise: I’ve used the diff3 I built in May to make a simple Javascript distributed version-control system that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the revision import/export feature and XMLHttpRequest, and the storage format and repository representation are brutally naive (and because it doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory).

You can try out a few browser-based demos of the features of the diff and DVCS libraries:

* a demo of diff, comm, and patch functionality.
* a demo of three-way merge and conflict-handling functionality.
* a demo of a Javascript DVCS, a bit like Mercurial, that manages a collection of JSON objects (presenting them in a file-like way, for the purposes of the demo).

The code is available using Mercurial by hg clone (or by simply browsing to that URL and exploring from there). It’s quite small and (I hope) easily understood – at the time of writing,

* the diff/diff3 code and support utilities are ~310 lines; and
* the DVCS code is ~370 lines.

The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
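The post’s Javascript LCA code isn’t reproduced here, but a naive version of the computation might look like this sketch (not the repository’s actual code): collect all common ancestors of the two revisions, then keep only those that aren’t themselves ancestors of another common ancestor. Criss-cross merge histories can yield more than one candidate, which is one reason the naive approach eventually needs revisiting.

```python
def naive_lcas(parents, a, b):
    """All lowest common ancestors of revisions a and b in a history DAG.

    `parents` maps each revision id to a list of parent revision ids.
    """
    def ancestors(rev):
        # A revision counts as its own ancestor here, for simplicity.
        seen, stack = set(), [rev]
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(parents.get(r, []))
        return seen

    common = ancestors(a) & ancestors(b)
    # Keep only the "lowest" common ancestors: those not strictly
    # below another common ancestor in the history graph.
    return {c for c in common
            if not any(c in ancestors(d) for d in common - {c})}
```

On a criss-cross history (two heads each merging the same pair of branches) this returns two candidate ancestors, and a real merge has to pick one.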

The code is split into a few different files:

* The sources for the diff and diff3 demos and the DVCS demo. In the latter, check out the definition of presets.preset1 for an example of how to use the DVCS, and presets.ambiguousLCA for an example of the repository format and the use of the revision import feature.
* The diff and diff3 code itself.
* Graph utilities (for computing the LCA, etc.)
* The DVCS and pseudo-file-system code.
* The repository history-graph-drawing code and a Python script for drawing the little tile images used in rendering a repository history graph.


Last word on Clinton v Obama: I think it’s illusory

Clinton will probably drop out of the race in the next few days, so let’s give the diagram showing both of them one last airing. This looks at a month’s worth of polling data to give a picture of how their relative chance of victory has changed over time – it’s an animated GIF, so you’ll need to have GIF animation enabled in your browser.

She’s moved from being a percentage point below Obama to two percentage points ahead of him. What changed so much over the course of May? My guess is simply that people who aren’t natural Democratic voters are more likely to feel warmth towards Clinton the further the nomination slips from her grasp, and we’d be seeing the exact opposite picture if it were Clinton who was expecting the concession call any day now.

Update: more commentary on this curious shift that seems to make a similar point.

Paul Crowley





