technology from back to front

Monte Carlo model for Presidential elections

Given state-by-state polling for a Presidential contest (Obama v McCain, say), it’s easy to work out who the polls predict will win: give each state to whoever leads its poll, add up the electoral votes, and whoever gets the most is the winner.
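Here is a minimal sketch of that simple count. It assumes each state's poll has been reduced to a single lead in points. The state names, electoral-vote counts and leads are invented for illustration, and `simple_count` is a hypothetical helper rather than code from the original analysis.

```python
# Hypothetical poll data, invented for illustration (not real 2008 polls):
# state -> (electoral votes, polled lead of candidate A over candidate B, in points).
POLLS = {
    "S1": (20, 1.0),
    "S2": (15, 0.5),
    "S3": (10, 2.0),
    "S4": (25, -15.0),
    "S5": (12, -20.0),
}


def simple_count(polls):
    """Give each state to whoever leads its poll; return (EVs for A, EVs for B).

    A state whose poll is exactly tied is given to neither candidate.
    """
    ev_a = sum(ev for ev, lead in polls.values() if lead > 0)
    ev_b = sum(ev for ev, lead in polls.values() if lead < 0)
    return ev_a, ev_b


ev_a, ev_b = simple_count(POLLS)
print(f"A: {ev_a} EVs, B: {ev_b} EVs")  # A: 45 EVs, B: 37 EVs
```

On these numbers A wins 45 to 37, even though each of A's leads is two points or less while B leads its states by 15 and 20 points: exactly the kind of lopsided picture where a simple count can mislead.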

However, this can give quite a misleading impression. Polling numbers are not exact. Sources of error include statistical error in the polling itself and in the correction for sampling bias, unpredictable turnout, and plain changes in voter intent between the time of the poll and election day. If the apparent winner carries all of their states by tiny margins while the apparent loser carries all of theirs comfortably, a proper analysis should show the polls favouring the apparent loser more than a simple win/lose count suggests. How can we take that into account?
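A quick calculation shows why narrow leads matter. Anticipating the normal error model described next, assume (an assumption, since the text does not say exactly where the error enters) that the error applies to the polled lead itself. Let $d_s$ be the apparent winner's polled lead in state $s$ (negative where the apparent loser leads), with an error $\varepsilon_s$ of standard deviation $\sigma$ subtracted from it. Then:

```latex
\begin{aligned}
\text{true lead in } s &= d_s - \varepsilon_s, \qquad \varepsilon_s \sim \mathcal{N}(0, \sigma^2),\\
P(\text{apparent winner carries } s) &= P(\varepsilon_s < d_s) = \Phi\!\left(\frac{d_s}{\sigma}\right),\\
\sigma = 5:\quad d_s = 1 &\;\Rightarrow\; \Phi(0.2) \approx 0.58, \qquad d_s = -15 \;\Rightarrow\; \Phi(-3) \approx 0.0013.
\end{aligned}
```

A one-point lead is little better than a coin toss, while a 15-point deficit is almost never overturned. If the apparent winner needs three such one-point states and their errors are independent, the chance of holding all three is only about $0.58^3 \approx 0.19$.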

Simulated elections are one approach. To start with, we assume that every poll has an unknown error term, which is normally distributed with a mean of zero and a standard deviation of x%. Then we run thousands of simulated elections. In each one, we draw a value for each poll's error term at random from that distribution and subtract it from the poll numbers. Then, as before, we give each state to whoever now leads and add up the electoral votes (EVs) to see who wins. Over thousands of simulated elections, the fraction each candidate wins gives a reasonable estimate of the probability that they will win, given the poll data available to us.
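A minimal sketch of the simulation in Python follows. It rests on two assumptions the text leaves open: each state's poll is reduced to a single lead (A minus B, in points), and the error term is applied to that lead, independently in each state. The state data and the `simulate` helper are hypothetical.

```python
import random

# Hypothetical poll data, invented for illustration:
# state -> (electoral votes, polled lead of A over B, in points).
POLLS = {
    "S1": (20, 1.0),
    "S2": (15, 0.5),
    "S3": (10, 2.0),
    "S4": (25, -15.0),
    "S5": (12, -20.0),
}


def simulate(polls, sigma, trials=10_000, seed=None):
    """Estimate the probability that A wins a majority of electoral votes.

    In each simulated election, every state's lead has an independent error,
    drawn from a normal distribution with mean 0 and standard deviation sigma
    (in points), subtracted from it; the state then goes to whoever still leads.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in polls.values())
    wins = 0
    for _ in range(trials):
        ev_a = sum(ev for ev, lead in polls.values()
                   if lead - rng.gauss(0.0, sigma) > 0)
        if 2 * ev_a > total_ev:  # strict majority of all electoral votes
            wins += 1
    return wins / trials


print(simulate(POLLS, sigma=5.0, seed=1))
```

With `sigma=5.0` this should print a value near 0.2. A simple count of poll leaders gives A 45 of the 82 electoral votes, but A essentially has to hold all three narrow leads at once, which under this model happens only about one time in five.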

There’s an “x” in the above paragraph: how much error should we assume in the polls? The only way to get a proper answer would be to examine historical data; for the moment I’m punting on this and instead looking at how the answer changes across different estimates of the noise term.
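One way to look at those different answers is to repeat the simulation over a range of values of x, once per candidate, and collect the win probabilities as curves. This is a sketch with invented two-candidate data; `MATCHUPS`, `STATES`, `sweep` and all the numbers are hypothetical. As a design choice it reuses one set of standard normal draws for every x and both candidates (common random numbers), so the differences between points on the curves come from x and the polls rather than from fresh sampling noise.

```python
import random

# Hypothetical data, invented for illustration: two candidates, each polled
# against the same opponent in the same five states.
# state -> (electoral votes, polled lead of the candidate over the opponent, in points)
MATCHUPS = {
    "Candidate A": {"S1": (20, 1.0), "S2": (15, 0.5), "S3": (10, 2.0),
                    "S4": (25, -15.0), "S5": (12, -20.0)},
    "Candidate C": {"S1": (20, -3.0), "S2": (15, 4.0), "S3": (10, 6.0),
                    "S4": (25, 5.0), "S5": (12, -8.0)},
}
STATES = ["S1", "S2", "S3", "S4", "S5"]


def sweep(matchups, sigmas, trials=5_000, seed=0):
    """Return {candidate: [estimated win probability for each sigma]}."""
    rng = random.Random(seed)
    # One row of standard normal draws per simulated election, one per state.
    # The error for a given sigma is sigma * z, so the same draws serve every sigma.
    draws = [[rng.gauss(0.0, 1.0) for _ in STATES] for _ in range(trials)]
    curves = {}
    for name, polls in matchups.items():
        total_ev = sum(ev for ev, _ in polls.values())
        curve = []
        for sigma in sigmas:
            wins = 0
            for row in draws:
                ev = sum(polls[s][0] for s, z in zip(STATES, row)
                         if polls[s][1] - sigma * z > 0)
                if 2 * ev > total_ev:  # strict majority of all electoral votes
                    wins += 1
            curve.append(wins / trials)
        curves[name] = curve
    return curves


sigmas = [1, 2, 4, 6, 8, 10, 12, 15]
for name, curve in sweep(MATCHUPS, sigmas).items():
    print(name, [round(p, 2) for p in curve])
```

Each list could then be passed to pylab's `plot`, with `sigmas` on the x-axis, to draw one line per candidate.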

Plotting each candidate’s estimated chance of beating McCain against x gives two lines, one for Obama and one for Clinton. They cross around the 11% mark, so if you think the final vote is likely to differ from the polls by 11% or more, the model gives Obama a slight advantage, while for smaller values Clinton’s advantage is clear.
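The crossover can also be located numerically rather than read off the graph. This hypothetical `crossover` helper assumes both win-probability curves are sampled at the same increasing values of x and that each curve is roughly straight between neighbouring samples; it returns the interpolated value of x at which their order flips.

```python
def crossover(sigmas, p_first, p_second):
    """Estimate where two win-probability curves cross.

    sigmas must be increasing, and p_first / p_second are the two candidates'
    estimated win probabilities at those values. Returns the first sigma at
    which the curves are equal or change order, linearly interpolating between
    neighbouring grid points; returns None if they never cross.
    """
    gaps = [a - b for a, b in zip(p_first, p_second)]
    for i, gap in enumerate(gaps):
        if gap == 0:
            return sigmas[i]
        if i + 1 < len(gaps) and gap * gaps[i + 1] < 0:
            # Treat each curve as a straight line between sigmas[i] and sigmas[i + 1].
            t = gap / (gap - gaps[i + 1])
            return sigmas[i] + t * (sigmas[i + 1] - sigmas[i])
    return None


# Made-up curves: the first candidate starts ahead, the second overtakes.
print(crossover([0, 10], [0.75, 0.25], [0.25, 0.75]))  # 5.0
```

With Monte Carlo estimates the curves are themselves noisy, so a crossover found this way is only as precise as the grid of x values and the number of simulated elections allow.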

A few days ago, this graph showed a crossover from Clinton to Obama at around the 6% mark, meaning that if you thought the polls were likely to be out by 6% or more, you should vote for Obama. However, new polls in Michigan, Oregon, and Texas have changed the picture: Oregon and Texas now look better for both candidates, while Michigan has moved from a win for Obama and a loss for Clinton to the exact opposite.

The moral of this story is probably that it is far too early to guess who will win the Presidential election. But it’s fun to try.

Data:, May 12. Tip of the hat to pylab for by far the nicest way I’ve found so far to generate graphs for publication.

Paul Crowley


2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK +44 (0)20 7729 7060