I had hoped Nate Silver was going to announce explicitly that this was his final pre-election prediction, but with less than three and a half hours to go before the first polls close, I think there’s not much time for him to make another one. I’ve updated the battleground chart with his predictions, and I’ll update it as states are called until I fall asleep. Let me know if you find this useful – it’s certainly the only way I can tell what it means when they call a state!
I’ve fixed it to handle the fact that states are simply “called” on election night, with no estimate of the margin of victory given. Instead the original projection is shown. So if Ohio is the first state called, and it goes to Obama as Mr Silver currently predicts, the top of the map might look like this:
If however it goes to Romney, the bottom of the map will look like this:
I’ll try to keep the map up to date until I go to bed – hopefully the outcome of the election will be obvious by then!
Those guys at Arup are taking this seriously and unlike many others who postulate in this area they have the clout and the commercial imperative to influence decisions and make changes. Interesting times ahead…
I’d anticipated making this post within days of the election, but while the winner was known as soon as they called California, the result in Missouri has only been called in the last couple of days following a tight recount. In the end the state went to John McCain, a blow to the pride of the former “bellwether state” which has gone to the winner in every Presidential election in the last century except this one and 1956. So we are now ready to present the final chart for the 2008 US Presidential elections, including scattergrams that show how this year compares to 2004, and how the final results compare to the final projections from Nate Silver’s fivethirtyeight.com that we used through the night. Incidentally, Silver accurately predicted the winner in every state but one, Indiana, which went to Obama by less than a 1% margin.
This was useful on election night, but it was a lot less useful than I had hoped, because what I didn’t take into account is that states are “called” for one side or another long before any estimates of the final voting percentages are available. Next time around I shall re-design it to take that into account. For now, time to get busy on something to watch during the next UK general election!
I find the maps and charts that the TV networks provide nearly useless for understanding the state of play during an election night, so I’ve taken to designing my own diagrams. For tomorrow’s Presidential elections, I’ve turned the projections on fivethirtyeight.com into a graph which illustrates the likely outcome of the election and the paths to victory for the two candidates:
The x-axis represents the projected margin of victory – leftwards for Obama, rightwards for McCain. The y-axis represents electoral votes. The states are ordered by margin of victory.
From this graph you can immediately see that Obama is projected to take all the Kerry states by a margin of 6.9% or more, and the Bush states Iowa and New Mexico appear to be firmly in his pocket with projected margins of 11.0% and 8.6%. That puts him 5 EVs from the middle line – a draw – and thus 6 EVs from victory. So if Obama wins all of these states plus any other state with 6 EVs or more – or any two other states – he wins the election.
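The arithmetic behind the chart can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code behind the chart: the function names are hypothetical, the state data you pass in would come from the fivethirtyeight.com projections, and it assumes the chart's convention that a negative margin means Obama is projected ahead.

```python
# Illustrative sketch of the battleground chart's arithmetic (hypothetical names).
# Convention: margin < 0 means Obama is projected ahead, > 0 means McCain,
# matching the chart's "leftwards for Obama" x-axis.
TOTAL_EV = 538
DRAW = TOTAL_EV // 2      # 269: the middle line
VICTORY = DRAW + 1        # 270: an outright majority


def cumulative_ev(states):
    """Sort (name, ev, margin) tuples by margin and accumulate electoral votes
    from the Obama end, giving the (margin, running EV) staircase of the chart."""
    running = 0
    rows = []
    for name, ev, margin in sorted(states, key=lambda s: s[2]):
        running += ev
        rows.append((name, margin, running))
    return rows


def tipping_state(states):
    """Return the first state, counting from the Obama end, whose electoral
    votes carry the running total to a majority, or None if none does."""
    for name, margin, running in cumulative_ev(states):
        if running >= VICTORY:
            return name, margin
    return None


def ev_needed(secured_ev):
    """Electoral votes still needed for an outright majority."""
    return max(0, VICTORY - secured_ev)


# With the figures quoted above: 264 EVs secured is 5 short of the
# 269 draw line and 6 short of victory.
print(DRAW - 264, ev_needed(264))  # 5 6
```

Any single state worth 6 or more EVs, or any two states (the smallest carry 3 each), closes that gap.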
That’s useful for now, but what about during election night itself? You may see a chart that says something like Kerry is 10 EVs ahead of Bush, but that doesn’t help clarify which of them is really doing better – if they’ve called New York and a whole load of New England states while a lot of Southern states are still waiting to announce, then the Democratic lead might be no more than you would expect, or it might even be less.
Here’s a fantasy scenario of what I have lined up for the Presidential elections tomorrow [removed - the final result is now online]. When they call states, I’ll move them to the top or bottom of the graph as appropriate. The area in between the called states is the remaining battleground – and anyone who can win all the states up to and across the finish line can win the election.
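The election-night bookkeeping can be sketched like this. It is a hypothetical helper, not the code behind the chart; it assumes only the call ("D" or "R") is known for a called state, so called states are simply moved to one side, and the uncalled states are kept in projected-margin order (negative = Obama ahead) as the remaining battleground.

```python
# Illustrative sketch of the election-night view (hypothetical names).
# Margin convention: < 0 favours Obama, > 0 favours McCain.
VICTORY = 270


def battleground(states, called):
    """states: dict name -> (ev, projected_margin).
    called: dict name -> "D" or "R" for states already called.
    Only the call matters for called states; their margins are not used."""
    dem_ev = sum(states[s][0] for s, w in called.items() if w == "D")
    rep_ev = sum(states[s][0] for s, w in called.items() if w == "R")
    # Uncalled states, ordered by projected margin, form the battleground.
    remaining = sorted((s for s in states if s not in called),
                       key=lambda s: states[s][1])
    remaining_ev = sum(states[s][0] for s in remaining)
    return {
        "dem_ev": dem_ev,
        "rep_ev": rep_ev,
        "dem_needs": max(0, VICTORY - dem_ev),
        "rep_needs": max(0, VICTORY - rep_ev),
        "battleground": remaining,
        # Can either side still reach 270 by sweeping what is left?
        "dem_can_win": dem_ev + remaining_ev >= VICTORY,
        "rep_can_win": rep_ev + remaining_ev >= VICTORY,
    }


# Placeholder data for three states only, so neither side can reach 270 here.
states = {"NY": (31, -20.0), "OH": (20, -2.0), "IN": (11, 1.0)}
print(battleground(states, {"NY": "D"}))
```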
I will also be maintaining a scattergram showing how the projections have done against reality, and a cartogram illustrating the electoral college.
This is tricky information to represent in a single graph, so any ideas for improvements will be gratefully received – thanks!
In my last post about this I observed an S-shape in the results of the polling data, and speculated that it might show psychological bias on the part of the Intraders. I’m not so sure now. This graph shows all polls in the last 30 days; recent polls are dark colours and older ones lighter, and the S-shape is much less visible. So it may simply be an artifact of the way we aggregate polling data to generate a single figure.
I’ve been concentrating on using polls to predict the outcome of the Presidential election here, but another alternative is to let someone else do it for you – or lots of people, who are prepared to put their money where their mouth is. This is the appeal of prediction markets like Intrade.com: participants (whom I call “Intraders”) effectively bet on the outcome, and each bet is backed not by the company but by other Intraders. Their collective opinion on the likelihood of the different outcomes sets the market price, and if you think they’ve got it wrong you can put money on it.
Prediction markets for the 2008 election were recently discussed on electoralvote.com (referencing this discussion on electoralmap.net); the site’s author concluded that Intraders just follow the polls, so you might as well look at the polls directly. Is this the right conclusion?
I wasn’t entirely happy with the way the curve shown in the graph was chosen – I wanted a more direct way to show the relationship between polling and Intrade.com prices. So I’ve translated the market prices into a measure more amenable to calculation, which I call the Intrade.com PPF. First, I translate the prices into a Democratic victory probability by dividing the price for a Democratic market in each state by the sum of the prices for the Democratic and Republican markets in that state; this works around the fact that for various reasons these prices don’t quite add up to 100. Second, I feed this into Φ⁻¹, the “percentage point function” (the inverse cumulative distribution function) of the standard normal distribution – so I’m assuming that Intraders are making a guess at the probability distribution of the eventual margin of victory, and that it’s normally distributed.
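The two-step conversion is short enough to sketch directly. The helper name `intrade_ppf` is hypothetical; it uses the standard library's `statistics.NormalDist`, whose `inv_cdf` is Φ⁻¹ for the standard normal.

```python
from statistics import NormalDist


def intrade_ppf(dem_price, rep_price):
    """Convert a state's Democratic and Republican market prices into the
    Intrade.com PPF described above (hypothetical helper name)."""
    # Step 1: normalise, since the two prices need not add up to exactly 100.
    p_dem = dem_price / (dem_price + rep_price)
    # Step 2: inverse CDF ("percentage point function") of the standard normal.
    # NormalDist.inv_cdf raises StatisticsError if p_dem is exactly 0 or 1,
    # so a market trading at 100/0 has no finite PPF.
    return NormalDist().inv_cdf(p_dem)


print(round(intrade_ppf(50, 50), 3))  # 0.0
print(intrade_ppf(60, 42))            # prices summing to 102 are normalised first
```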
The advantage of manipulating the figures in this way is that we can then just fit a straight line to the numbers to see what that implies about what Intraders believe about the market.
I think the resulting graph shows that Intraders are strongly influenced by the polls, but that the polls are by no means the only influence on how they bet.
First, look at how far the points stray from the line. New Hampshire (NH) and California (CA) look about the same as far as the polls are concerned, but Intraders are much more confident of a Democratic victory in CA than they are in NH. This scatter is representative of all the non-polling data that the Intraders are bringing to bear in making their estimates.
Second, we can learn something from the line we’ve fitted. From where the line crosses the x-axis, we can conclude that Intraders think that Obama is going to lose a percentage point on average, nationally, compared to today’s polls. That’s not enough to lose the election, but it’s a significant shift; if it reflects a pro-Republican bias on the part of Intraders then there’s money to be made betting on Democrats there. And the slope of the line means they think the standard deviation of the difference between the polls and the final results will be around 15%.
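Why the line can be read this way: suppose Intraders believe the final Democratic margin is normally distributed around the poll lead m plus some shift δ, with standard deviation σ. Then the market's probability is Φ((m + δ)/σ), so Φ⁻¹ of it is m/σ + δ/σ. The slope is 1/σ and the line crosses the x-axis at m = −δ. A small sketch of that inversion (hypothetical function name, under exactly that normality assumption):

```python
def implied_beliefs(slope, intercept):
    """Read the fitted line ppf = slope * m + intercept back as beliefs about
    the final margin, assuming final margin ~ Normal(m + shift, sigma**2).
    Returns (shift, sigma) in percentage points."""
    sigma = 1.0 / slope                # a steeper line means a smaller expected error
    x_intercept = -intercept / slope   # poll lead at which the market says 50/50
    shift = -x_intercept               # negative shift: Democrat expected to lose ground
    return shift, sigma


# A line crossing the x-axis at +1 point with slope 1/15 matches the reading
# in the text: Obama losing about a point, with a standard deviation of about 15.
print(implied_beliefs(1 / 15, -1 / 15))
```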
Third, there’s an interesting S-shape visible in the graph. Our conversion from probabilities to PPF should have eliminated that and left us with something closer to a straight line. I think this reflects psychological errors on the part of Intraders: they are happy to use guesswork when the polls are even and to follow the polls when they show wider margins, but when the polls show very wide margins they can’t quite buy it, and offer prices that would be more appropriate for a tighter race. I strongly suspect that this means one could make some money betting exactly according to the line fitted on this graph – i.e. betting on the Republicans for the points above the line, and on the Democrats for the points below it – and if I had money to spare I’d try it instead of writing about it here.
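The betting rule described here can be written down as a sketch. It is an untested idea, not a strategy I have run: the function names are hypothetical, and it ignores fees, the bid/ask spread and the fact that the fitted line itself is only an estimate.

```python
from statistics import NormalDist


def line_probability(poll_margin, slope, intercept):
    """Democratic win probability implied by the fitted line for a given poll lead."""
    return NormalDist().cdf(slope * poll_margin + intercept)


def bet_side(poll_margin, ppf, slope, intercept):
    """Hypothetical rule from the text: bet Republican where the market's PPF
    sits above the fitted line (Democrat priced too high), Democrat where it
    sits below, and nothing when it is exactly on the line."""
    fitted = slope * poll_margin + intercept
    if ppf > fitted:
        return "R"
    if ppf < fitted:
        return "D"
    return None
```

Comparing `line_probability` with the market's normalised price shows how large the supposed mispricing is in each state.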
Clinton will probably drop out of the race in the next few days, so let’s give the diagram showing both of them one last airing. This looks at a month’s worth of polling data to give a picture of how their relative chance of victory has changed over time – it’s an animated GIF, so you’ll need to have GIF animation enabled in your browser.
She’s moved from being a percentage point below Obama to two percentage points ahead of him. What changed so much over the course of May? My guess is simply that people who aren’t natural Democratic voters are more likely to feel warmth towards Clinton the further the nomination gets from her grasp, and we’d be seeing the exact opposite picture if it were Clinton who was expecting the concession call any day now.
Update: more commentary on this curious shift that seems to make a similar point.
Here’s the graph of polling error against probability of victory we saw before, updated for the latest polling data:
So if you assume that the polls will be out by 5% or less, then Obama’s chances of victory against McCain are over 60%, but they dip under 60% when you assume a larger polling error.
However, that accounts only for random error in the polls; we assume that each state’s final tally will differ from today’s polls by a normally distributed value which is totally independent in each state. In reality, there is likely to be a systematic, country-wide change in his standing against McCain between now and voting.
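A minimal Python sketch of the distinction, assuming a uniform national shift (the `shift` parameter) added on top of independent, normally distributed per-state errors. The function name and any data you feed it are hypothetical; this is not the code behind these graphs.

```python
import random


def win_probability(states, sigma, shift=0.0, trials=10_000, seed=1):
    """Monte Carlo estimate of the Democrat's chance of an Electoral College win.

    states: list of (ev, dem_margin) pairs, margin in points (> 0 = Democrat ahead).
    sigma:  standard deviation of the independent per-state polling error.
    shift:  uniform national change added to every state's margin, i.e. the
            systematic component that independent errors cannot capture.
    A tie in electoral votes counts as not winning.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in states)
    wins = 0
    for _ in range(trials):
        # One fresh error draw per state per trial; the shift is shared by all.
        dem_ev = sum(ev for ev, margin in states
                     if margin + shift + rng.gauss(0.0, sigma) > 0)
        if 2 * dem_ev > total_ev:
            wins += 1
    return wins / trials


# Usage: bracket the current polls with national swings of -2% to +2%.
# for shift in (-2, -1, 0, 1, 2):
#     print(shift, win_probability(polls, sigma=5, shift=shift))
```

Using the same seed for every shift means each run sees the same random errors, so the curves differ only because of the shift.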
Suppose he gains one percentage point countrywide. We can simulate that by just adding 1% to his margin in each poll and re-running the test:
With 1% extra his chance of victory is now between 60% and 80%, depending on how big you think the random error is. Now let’s take off 1%:
and we see that if he loses 1% he has less than a 50% chance of winning the election. Finally let’s bracket that with estimates for +2% and -2%:
Here’s the same picture, going by Clinton’s poll results against McCain:
which shows that if we assume a 5% error in the polls (a rather low estimate) then Clinton can lose 2% of her popularity nationally and still be virtually guaranteed victory.
Here are the two superimposed:
We see that if we assume a 10% error in the polls, then Clinton can lose 2% of her national popularity and still be ahead of where Obama is now.
No-one is talking about Clinton’s apparent huge advantage over Obama in the Electoral College. Is there some systematic error in the polling? Are Rush Limbaugh fans claiming to be Clinton supporters, as per his encouragement, to throw the Democrats into confusion? Or should the few remaining uncommitted superdelegates be throwing their weight behind the candidate who is losing the nomination race, in a desperate bid to keep the one who would guarantee them victory in November?
Given state-by-state polling of a given contest for the Presidency – Obama v McCain, say – it’s easy to see who is predicted to win the contest according to the polls. Work out who wins each state according to the poll, add up the electoral votes, and whoever gets the most is the winner.
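That simple tally looks like this as a Python sketch (hypothetical function name; margins are the Democratic lead in points, and a dead-level poll leaves the state unallocated).

```python
def poll_winner(states):
    """states: dict name -> (ev, dem_margin), margin > 0 = Democrat ahead.
    Give each state to whoever leads its poll and total the electoral votes.
    States with a dead-level poll are left unallocated."""
    dem_ev = sum(ev for ev, m in states.values() if m > 0)
    rep_ev = sum(ev for ev, m in states.values() if m < 0)
    if dem_ev > rep_ev:
        winner = "D"
    elif rep_ev > dem_ev:
        winner = "R"
    else:
        winner = "tie"
    return dem_ev, rep_ev, winner


print(poll_winner({"A": (10, 3.0), "B": (7, -1.0), "C": (4, 0.0)}))  # (10, 7, 'D')
```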
However, this could be quite a misleading impression. Polling numbers are not 100% accurate. Among other factors, there is statistical error in the polling process itself, and in the process of handling sampling biases; there is the unpredictability of turnout; and there is the straightforward change in voter intent between polling time and voting time. If all of the apparent winner’s states are won by tiny margins, while all of the loser’s states are substantial wins, then a proper analysis should show that the polling favours the apparent loser more than a simple win/lose analysis would suggest. How can we take that into account?
Simulated elections are one approach. To start with, we assume that every poll has an unknown error term, which is normally distributed with a mean of zero and a standard deviation of x%. Then we run thousands of simulated elections. In each election, we draw a value for the noise term by generating an appropriate random variable, and subtract this from the poll numbers. Then as before we give each state to the apparent winner and add up who gets the most EVs. By running thousands of simulated elections, we can obtain a reasonable estimate of the probability that a given candidate will win given the poll data available to us.
There’s an “x” in the above paragraph; how much error should we assume in the polls? The only way to get a proper answer to this question would be to examine historical data; for the moment I’m punting on this and just looking at the different answers you can get from different estimates of the noise term.
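Here is a minimal Python sketch of the simulated-election procedure, assuming independent normal errors with the same standard deviation in every state; the function name is hypothetical and the polls you pass in are yours to supply. Sweeping `sigma` gives the "different answers from different estimates of the noise term".

```python
import random


def simulate_elections(polls, sigma, trials=10_000, seed=42):
    """Estimate each side's chance of an Electoral College win from state polls.

    polls: dict name -> (ev, dem_margin), margin in points (> 0 = Democrat ahead).
    sigma: assumed standard deviation ("x%") of each poll's error, independent per state.
    Returns (p_dem_win, p_rep_win, p_tie).
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in polls.values())
    dem_wins = rep_wins = ties = 0
    for _ in range(trials):
        dem_ev = 0
        for ev, margin in polls.values():
            noise = rng.gauss(0.0, sigma)  # guessed error for this state's poll
            # Subtract the noise, then give the state to the apparent winner
            # (an exactly zero margin goes to the Republican; with continuous
            # noise that essentially never happens).
            if margin - noise > 0:
                dem_ev += ev
        rep_ev = total_ev - dem_ev
        if dem_ev > rep_ev:
            dem_wins += 1
        elif rep_ev > dem_ev:
            rep_wins += 1
        else:
            ties += 1
    return dem_wins / trials, rep_wins / trials, ties / trials


# Usage: for sigma in (2, 5, 10, 15): print(sigma, simulate_elections(polls, sigma)[0])
```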
The two lines cross around the 11% mark, so if you think that the final voting is likely to vary by 11% or more then the model shows Obama having a slight advantage, while for numbers less than that Clinton’s advantage is clear.
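Locating a crossover like this from simulated results can be done by linear interpolation between the sampled error levels. A small sketch with a hypothetical function name; the curves in the test are made-up numbers, not the results graphed here.

```python
def crossover(sigmas, p_a, p_b):
    """Linearly interpolate the error level at which two win-probability
    curves (sampled at the same sigmas, in increasing order) first cross.
    Returns None if they never meet in the sampled range."""
    for i in range(len(sigmas) - 1):
        d0 = p_a[i] - p_b[i]
        d1 = p_a[i + 1] - p_b[i + 1]
        if d0 == 0:
            return sigmas[i]
        if d0 * d1 < 0:
            # Fraction of the interval at which the difference reaches zero.
            t = d0 / (d0 - d1)
            return sigmas[i] + t * (sigmas[i + 1] - sigmas[i])
    if sigmas and p_a[-1] == p_b[-1]:
        return sigmas[-1]
    return None
```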
A few days ago, this graph was showing a crossover from Clinton to Obama at around the 6% mark, meaning that if you thought that today’s polls were likely to be out by 6% or more, you should vote for Obama. However, new polls in Michigan, Oregon, and Texas have changed the picture: both Oregon and Texas are now better for both candidates, while Michigan has moved from a win for Obama and a loss for Clinton to the exact opposite.
The moral of this story is probably that it is far too early to guess who will win the Presidential election. But it’s fun to try.