In my last post about this I observed an S-shape in the results of the polling data, and speculated that it might show psychological bias on the part of the Intraders. I’m not so sure now. This graph shows all polls in the last 30 days; recent polls are shown in darker colours and older ones in lighter ones, and the S-shape is much less visible. So it may simply be an artifact of the way we aggregate polling data to generate a single figure.
I’ve been concentrating on using polls to predict the outcome of the Presidential election here, but another alternative is to let someone else do it for you – or lots of people, who are prepared to put their money where their mouth is. This is the appeal of prediction markets like Intrade.com: participants (who I call “Intraders”) effectively bet on the outcome, and the bet is backed not by the company, but by other Intraders. Their collective opinion on the likelihood of the different outcomes sets the market price, and if you think they’ve got it wrong you can put money on it.
Prediction markets for the 2008 election were recently discussed on electoralvote.com (referencing this discussion on electoralmap.net); the conclusion there was that Intraders just follow the polls, and so you might as well just look at the polls directly. Is that the right conclusion?
I wasn’t entirely happy with the way the curve shown in the graph was chosen – I wanted a more direct way to show the relationship between polling and Intrade.com prices. So I’ve translated the market prices into a measure more amenable to calculation, which I call the Intrade.com PPF. First, I translate the prices into a Democratic victory probability by dividing the price for the Democratic market in each state by the sum of the prices for the Democratic and Republican markets in that state; this works around the fact that, for various reasons, these prices don’t quite add up to 100. Second, I feed this into Φ⁻¹, the “percent point function” (inverse CDF) of the normal distribution – so I’m assuming that Intraders are making a guess at the probability distribution of the eventual margin of victory, and that it’s normally distributed.
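As a minimal sketch of that conversion (the prices here are invented, not real Intrade quotes, and the function name is mine), using the standard library’s normal distribution:

```python
from statistics import NormalDist

def intrade_ppf(dem_price, rep_price):
    # Normalise the two market prices into a Democratic victory
    # probability; the raw prices don't quite add up to 100.
    p_dem = dem_price / (dem_price + rep_price)
    # Invert the standard normal CDF (the "percent point function"),
    # giving the implied margin in standard-deviation units.
    return NormalDist().inv_cdf(p_dem)

intrade_ppf(70.0, 32.0)  # positive: the market leans Democratic
```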
The advantage of manipulating the figures in this way is that we can then just fit a straight line to the numbers to see what that implies about what Intraders believe about the market.
I think the resulting graph shows that Intraders are strongly influenced by the polls, but that the polls are by no means the only influence on how they bet.
First, look at how far the points stray from the line. New Hampshire (NH) and California (CA) look about the same as far as the polls are concerned, but Intraders are much more confident of a Democratic victory in CA than they are in NH. This scatter is representative of all the non-polling data that the Intraders are bringing to bear in making their estimates.
Second, we can learn something from the line we’ve fitted. From where the line crosses the x-axis, we can conclude that Intraders think that Obama is going to lose a percentage point on average, nationally, compared to today’s polls. That’s not enough to lose the election, but it’s a significant shift; if it reflects a pro-Republican bias on the part of Intraders then there’s money to be made betting on Democrats there. And the slope of the line means they think the standard deviation of the difference between the polls and the final results will be around 15%.
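To make that arithmetic concrete, here’s a small sketch with made-up data (not the real poll and Intrade numbers): fitting a line y = a + b·x to (poll margin, PPF) points, the implied standard deviation is 1/b and the x-intercept −a/b is the implied average swing:

```python
import random

random.seed(1)

# x: poll margin in percentage points (Dem minus Rep); y: Intrade PPF.
# Suppose Intraders expect a 1-point swing against the polls and a
# 15-point standard deviation, plus scatter from non-polling information.
xs = [random.uniform(-30, 30) for _ in range(50)]
ys = [(x - 1.0) / 15.0 + random.gauss(0, 0.1) for x in xs]

# Ordinary least-squares fit of y = a + b*x.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

sigma = 1.0 / b   # implied std. dev. of (final result - polls): about 15
shift = -a / b    # x-intercept: implied average swing, about 1 point
```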
Third, there’s an interesting S-shape visible in the graph. Our conversion from probabilities to PPF should have eliminated that, and left us with something closer to a straight line. I think this reflects psychological errors on the part of Intraders – they are happy to use guesswork when the polls are even, and follow the polls when they show wider margins, but when the polls show very wide margins they can’t quite believe them, and offer prices that would be more appropriate for a tighter race. I strongly suspect that this means one could make some money by betting exactly according to the fitted line on this graph – i.e. betting on the Republicans for the points above the line, and on the Democrats for the points below it – and if I had money to spare I’d try it instead of writing about it here.
We’re using Mercurial for much of our development work here at LShift now, and we’re finding it a great tool. We make heavy use of branches (“branch per bug”) for many projects, and this is also a pretty smooth experience. One issue that has come up is policy regarding merging the trunk (“default”) into any long-lived feature/bug branches: should you do it, or should you not?
My vote is that you should merge default into long-lived branches fairly regularly; otherwise, you have a big-bang, all-at-once nightmare of a merge looming ahead of you. If you do merge frequently, though, there’s one subtlety to be aware of:
hg diff is not history-aware, so in order to get an accurate, focussed picture of all the changes that have been made on your long-lived branch, you need to do one of two things:
* either merge default into your long-lived branch right before you merge the long-lived branch back into default, and run hg diff after that’s complete; or
* (recommended) do a throw-away test-merge of the long-lived branch into default directly.
Imagine a history like this:
(2) (3)
 |   |
  \ /
   V
   |
  (1)
… where (1) is an ancestral revision, (2) is the default branch, and (3) is the long-lived branch – let’s call it “foo”.
Given this history, running hg update -C default (to make the working copy be the default branch, i.e. revision (2)) followed by hg diff foo will give you a misleading diff – one that undoes the changes from (1) to (2) before applying the changes from (1) to (3). This is almost certainly not what you want!
Instead, run a test merge: hg update -C default, followed by hg merge foo, and then plain old hg diff. Note that this modifies your working copy! You will need to revert (by hg update -C default) if you decide the merge isn’t ready to be committed.
The output of hg diff after the hg merge shows a history-aware summary of the changes that the merge would introduce to your checked-out branch. It’s this history-awareness (“three-way merge”) that makes it so much superior to the history-unaware simple diff (“two-way merge”).
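The difference is easy to reproduce outside Mercurial. A toy Python illustration (the file contents are invented for the purpose): a two-way diff of default against foo reports the reversal of default’s own change, while diffing foo against the common ancestor shows only what foo actually did:

```python
import difflib

base    = ["a\n", "b\n", "c\n"]            # revision (1), the common ancestor
default = ["a\n", "B\n", "c\n"]            # revision (2): default changed b to B
foo     = ["a\n", "b\n", "c\n", "d\n"]     # revision (3): branch foo appended d

# Two-way diff, like a plain `hg diff foo` from default: it contains
# "-B"/"+b", undoing default's change, which foo never touched.
two_way = list(difflib.unified_diff(default, foo, "default", "foo"))

# Diffing against the ancestor (morally what merge-then-diff gives you)
# shows only foo's real change: "+d".
ancestor_diff = list(difflib.unified_diff(base, foo, "base", "foo"))
```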
Yesterday I visited Osmosoft, the developers of TiddlyWiki, to chat about getting some DVCS-like functionality into TiddlyWiki. Jeremy mentioned in passing that TiddlyWiki is, if you squint, a slightly cheating kind of a Quine. It struck me this morning that TiddlyWiki has strong similarities to another famous almost-Quine, the Smalltalk system.
TiddlyWiki is a composition of
* some tiddlers, specially marked as plugins, containing source code permitting extension of the skeleton system; and
* other tiddlers containing wiki markup text.
The main difference I can see is that Smalltalk doesn’t carry so much of its VM around with its images: just as Smalltalk relies on an external VM being present, TiddlyWiki can rely on the browser for a big chunk of its VM, but it has to carry the bootstrapping container code bundled with the stored tiddlers/live-objects. Also, TiddlyWiki instances are (these days!) constructed by a special assembly process from small, separate text files checked into Subversion, whereas Smalltalk images were not historically constructed from scratch very often at all. Finally, TiddlyWiki’s boot process is heavier than Smalltalk’s, because the browser forces it to recompile all the sources in the system, whereas Smalltalk gets away with keeping bytecode in the image alongside the sources.
The Dorling Kindersley TravelDK site won the ‘Best Use of Technology’ award at the 2008 Travolution awards.
Designed and built by LShift, the site won the Best Use of Technology award in the Travel Information category at this year’s Travolution Awards. The awards recognise the best operators, agents, portals, digital marketers and suppliers in the travel industry. Travolution received a record number of entries, which were scrutinised by a panel of judges from across the travel industry.
Judges said “DK has utilised cutting edge e-publishing techniques to produce a flexible and simple service for visitors to its website. It takes the in-resort guidebook to the next level.”
Launched last year, traveldk.com was awarded a 2007 TravelMole Web award for Best Website in the Holiday/Travel Extras category. The DK website attracts a growing number of unique visitors and continues to develop with more destinations and added features.
With Google Maps and thousands of images, reviews and ratings, the site enables travellers to add their own travel highlights and read those of others, alongside recommendations from the DK Top 10 travel guide series. Uniquely, DK can also create customised guides for marketing partners.
“I’m delighted with this award”, says Georgina Atwell, DK Online Director. “We knew traveldk.com had to offer travellers something genuinely different, and giving the power to the community to create their own pdf or print-on-demand travel guide has proved an enormous success.”
The slides for the talk are here. They’re a work-in-progress – as I think of things, I’ll continue to update them.
You can try out a few browser-based demos of the features of the diff and DVCS libraries:
* a demo of diff, comm, and patch functionality.
* a demo of three-way merge and conflict-handling functionality.
The code is available using Mercurial by hg clone http://hg.opensource.lshift.net/synchrotron/ (or by simply browsing to that URL and exploring from there). It’s quite small and (I hope) easily understood – at the time of writing,
* the diff/diff3 code and support utilities are ~310 lines; and
* the DVCS code is ~370 lines.
The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
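For the curious, the naive LCA selection amounts to something like the following sketch (the history encoding and function names here are my own invention for illustration; the real code lives in the graph utilities):

```python
def ancestors(history, rev):
    # All ancestors of rev, including rev itself (reflexive ancestry).
    # history maps each revision to the list of its parents.
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(history.get(r, ()))
    return seen

def merge_ancestor(history, a, b):
    # Intersect the two ancestor sets, then keep only the "lowest"
    # common ancestors: those not properly below another common one.
    common = ancestors(history, a) & ancestors(history, b)
    return {r for r in common
            if not any(r in ancestors(history, c) and r != c for c in common)}

history = {"r3": ["r1"], "r2": ["r1"], "r1": ["r0"], "r0": []}
merge_ancestor(history, "r2", "r3")  # {"r1"}
```

Note this recomputes ancestor sets repeatedly, which is exactly the kind of naivety that would need revisiting for an efficient DVCS.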
The code is split into a few different files:
* The sources for the diff and diff3 demos and the DVCS demo. In the latter, check out the definition of presets.preset1 for an example of how to use the DVCS, and presets.ambiguousLCA for an example of the repository format and the use of the revision import feature.
* The diff and diff3 code itself.
* Graph utilities (for computing LCA etc)
* The DVCS and pseudo-file-system code.
* The repository history-graph-drawing code and a python script for drawing the little tile images used in rendering a repository history graph.
Clinton will probably drop out of the race in the next few days, so let’s give the diagram showing both of them one last airing. This looks at a month’s worth of polling data to give a picture of how their relative chance of victory has changed over time – it’s an animated GIF, so you’ll need to have GIF animation enabled in your browser.
She’s moved from being a percentage point below Obama to two percentage points ahead of him. What changed so much over the course of May? My guess is simply that people who aren’t natural Democratic voters are more likely to feel warmth towards Clinton the further the nomination gets from her grasp, and we’d be seeing the exact opposite picture if it were Clinton who was expecting the concession call any day now.
Update: more commentary on this curious shift that seems to make a similar point.
You are currently browsing the LShift Ltd. blog archives for June, 2008.