Tuesday, October 28, 2008

A Simple Election Simulation

As of today, Real Clear Politics lists Pennsylvania as "Strong Obama" with a +10.8 spread (51.4 to 40.6). However, those two percentages only add up to 92%, leaving 8% independents. If all of the independents go for McCain, then the race is much closer, 51.4 to 48.6. Recently there was an article in Salon arguing that independents will, in fact, break heavily for McCain. The article cites poll numbers and voting outcomes from a few recent elections to argue that for African American candidates, "what you see is what you get." In other words, if Obama is polling at 51.4% in Pennsylvania, then on election day he'll get very close to 51.4% of the vote. Hence, Obama needs to stay above 50% in key state polls, regardless of McCain's numbers.

If we take this perspective, then the race is much closer than the spread numbers would seem to indicate.

I did a quick simulation with numbers from Real Clear Politics and Pollster.com. Here's what I did and how it turned out.

I simulated the national election 10,000 times. For each iteration, I went through the 50 states (plus the District of Columbia) one by one, and for each state, I had the computer flip a biased coin to see which candidate would win the state's electoral votes on November 4. To bias each state's coin, I used a Gaussian distribution with a mean equal to Obama's polling percentage and a standard deviation of 3 percentage points. So for example, in Pennsylvania Obama's chance of victory was governed by a Gaussian distribution of mean 0.514 and standard deviation 0.03. If the number drawn from this distribution was greater than 0.5, I put the state in Obama's column; otherwise, McCain's.
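The procedure described above can be sketched in a few lines of Python. This is a minimal sketch, not the code used for this post: the function name `simulate_elections`, the three-state `toy_states` map, and the rule that a strict majority of electoral votes is needed to win are all assumptions made for illustration.

```python
import random

def simulate_elections(states, n_trials=10_000, sigma=0.03, seed=None):
    """Estimate Obama's chance of winning the electoral college.

    states maps a state name to (electoral_votes, obama_poll_share),
    with the poll share written as a fraction such as 0.514.
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in states.values())
    needed = total_ev // 2 + 1  # assumption: a strict majority is required
    obama_wins = 0
    for _ in range(n_trials):
        obama_ev = 0
        for ev, poll in states.values():
            # Draw around the poll number; above 0.5 puts the state in Obama's column.
            if rng.gauss(poll, sigma) > 0.5:
                obama_ev += ev
        if obama_ev >= needed:
            obama_wins += 1
    return obama_wins / n_trials

# Hypothetical three-state map, for illustration only (not real polling data).
toy_states = {"A": (10, 0.514), "B": (7, 0.480), "C": (5, 0.530)}
print(simulate_elections(toy_states, seed=1))
```

To run it on the real map, `toy_states` would be replaced by all 50 states plus the District of Columbia, each with its electoral votes and Obama's poll number.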

The results were sobering. Obama wins the election in only 70% of my simulations.




***

Based on the simulations, state priorities for the two candidates emerge as follows.

Obama's Priorities. I identified these as all of the states that participate in more than 70% of Obama victories in the simulation and yet have poll numbers below 52% for Obama. New Mexico is not here because it only participates in 62% of Obama victories, Ohio is not here because it only participates in 59% of Obama victories, and Florida is not here because it only participates in 30% of Obama victories.

The first number is the percent of Obama victories in which the state goes for Obama. The second number is Obama's poll number for the state.

PA, 81.5, 51.7
IA, 80.3, 52.3
VA, 76.5, 51.7
WI, 76.5, 51.8
MN, 71.9, 51.3
NH, 71.2, 51.5
CO, 70.7, 51.3

These numbers may explain why Obama is circling back to Iowa this week.

McCain's Priorities. I identified these as all of the states which participate in more than 30% of McCain's victories in the simulation and yet have poll numbers below 52% for McCain (defined here as poll numbers above 48% for Obama). Iowa is not here because it only participates in 29% of McCain victories (just below the arbitrary threshold I set), and Georgia is not here because Obama is only polling at 44.7% in Georgia (and 100-44.7 = 55.3, a comfortable victory for McCain). Florida is not here because Obama is only polling at 47.7%, which puts McCain above 52% according to the "what you see is what you get" thesis.

The first number is the percent of McCain victories in which the state goes for McCain. The second number is McCain's poll number for the state (here defined as 100 minus Obama's poll number).

NC, 81.9, 51.2
OH, 76.8, 50.1
RI, 76.7, 51.8
NV, 67.3, 51.0
PA, 54.1, 48.3
NM, 45.9, 49.3
MN, 44.0, 48.7
CO, 42.6, 48.7
VA, 42.6, 48.3
WI, 38.2, 48.2
NH, 34.1, 48.5

The true "battlegrounds" - the states on both candidates' lists - are:

Pennsylvania
Virginia
Wisconsin
Minnesota
New Hampshire
Colorado
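The participation percentages and the overlap between the two lists can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code behind the numbers above: `participation`, `priority_states`, and the three-state `toy_states` map are hypothetical names and data. The thresholds (70% for Obama, 30% for McCain, poll numbers below 52%) are the ones stated in the text.

```python
import random

def participation(states, n_trials=10_000, sigma=0.03, seed=0):
    """For each state, the share of each candidate's wins in which he carried it.

    states maps a state name to (electoral_votes, obama_poll_share).
    Returns (obama_participation, mccain_participation) as dicts.
    """
    rng = random.Random(seed)
    needed = sum(ev for ev, _ in states.values()) // 2 + 1  # assumed majority rule
    obama_wins = mccain_wins = 0
    with_obama = dict.fromkeys(states, 0)
    with_mccain = dict.fromkeys(states, 0)
    for _ in range(n_trials):
        # States Obama carries in this simulated election.
        carried = {name for name, (ev, poll) in states.items()
                   if rng.gauss(poll, sigma) > 0.5}
        if sum(states[name][0] for name in carried) >= needed:
            obama_wins += 1
            for name in carried:
                with_obama[name] += 1
        else:
            mccain_wins += 1
            for name in states.keys() - carried:
                with_mccain[name] += 1
    obama_part = {s: c / obama_wins for s, c in with_obama.items()} if obama_wins else {}
    mccain_part = {s: c / mccain_wins for s, c in with_mccain.items()} if mccain_wins else {}
    return obama_part, mccain_part

def priority_states(part, poll, min_part, max_poll=0.52):
    """States above the participation threshold where the candidate polls below max_poll."""
    return {s for s, share in part.items() if share > min_part and poll[s] < max_poll}

# Hypothetical three-state map, for illustration only (not real polling data).
toy_states = {"A": (10, 0.514), "B": (7, 0.480), "C": (5, 0.530)}
obama_part, mccain_part = participation(toy_states)
obama_poll = {s: poll for s, (_, poll) in toy_states.items()}
mccain_poll = {s: 1 - poll for s, poll in obama_poll.items()}  # "what you see is what you get"
obama_list = priority_states(obama_part, obama_poll, min_part=0.70)
mccain_list = priority_states(mccain_part, mccain_poll, min_part=0.30)
print(sorted(obama_list & mccain_list))  # states on both lists
```

Taking the set intersection of the two priority lists is what yields the "battlegrounds" in this sketch.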

The output for each state is shown below.



***

SportsBook.com is giving odds of 4-1 for a bet on John McCain, i.e. a 20% chance of winning. Since my simulation gives John McCain a 30% chance of winning, I figure McCain is undervalued by the betting market. Maybe I should put ten bucks on the old gambler to win.
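The arithmetic behind that comparison can be written out as a short sketch. The function names are hypothetical, and the calculation ignores any bookmaker's margin built into the quoted odds, which is an assumption.

```python
def implied_probability(odds_against):
    """Win probability implied by odds of odds_against-to-1."""
    return 1 / (odds_against + 1)

def expected_profit(stake, odds_against, win_probability):
    """Average profit per bet if the true win probability is win_probability."""
    payout_if_win = stake * odds_against
    return win_probability * payout_if_win - (1 - win_probability) * stake

print(implied_probability(4))        # 0.2, the 20% quoted above
print(expected_profit(10, 4, 0.30))  # about +5 dollars on a ten-dollar bet
```

If the true chance were exactly the 20% implied by the odds, the expected profit would be zero. The positive figure comes entirely from taking the simulation's 30% at face value.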

***

Nothing has been proved here, of course. The whole thing is much too complicated to compute. The only thing to do is fight until the end.

Even mathematically, within the assumptions of the model, the 70% figure is sensitive to the assumption of a 3-point margin of error in the polls. Here is the sensitivity analysis:



So my result would agree with the oddsmakers if I had assumed a margin of error of 2 points, not 3. To me that sounds like cutting it fine. The reason I chose 3 points is that many of the polls have about 1000 respondents, which makes the noise level (1 over the square root of the sample size) about 3 percentage points. And systematic errors are bound to be at least this large.
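Both the noise estimate and a sweep over the assumed standard deviation can be sketched in code. As context, 1/sqrt(n) is close to the conventional 95% margin of error for a share near 50%, which is 1.96 × sqrt(0.25/n). The names `sampling_noise`, `win_probability`, and `toy_states` are hypothetical, and the toy map is not real polling data, so the printed numbers will not match the figures in this post.

```python
import math
import random

def sampling_noise(n_respondents):
    """Rule-of-thumb poll noise: 1 over the square root of the sample size."""
    return 1 / math.sqrt(n_respondents)

def win_probability(states, sigma, n_trials=10_000, seed=0):
    """Fraction of simulated elections Obama wins for a given standard deviation."""
    rng = random.Random(seed)
    needed = sum(ev for ev, _ in states.values()) // 2 + 1  # assumed majority rule
    wins = 0
    for _ in range(n_trials):
        obama_ev = sum(ev for ev, poll in states.values()
                       if rng.gauss(poll, sigma) > 0.5)
        if obama_ev >= needed:
            wins += 1
    return wins / n_trials

print(f"noise for 1000 respondents: {sampling_noise(1000):.4f}")

# Hypothetical three-state map, for illustration only (not real polling data).
toy_states = {"A": (10, 0.514), "B": (7, 0.480), "C": (5, 0.530)}
for sigma in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(f"sigma={sigma:.2f}  P(Obama wins)={win_probability(toy_states, sigma):.3f}")
```

Reusing the same seed for every value of sigma keeps the comparison between rows from being dominated by run-to-run randomness.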

3 comments:

tim said...

Jason,
Thanks for this. Very thought provoking and a clear explanation of the stats.

Tim
(one of your former 7A students)

Jilly said...

http://blackboxvoting.org/

danimal said...

Jason,

Nice work there -- it is too complicated for a mathematical model to capture as you say, but at least your working hypothesis (WYSIWYG for black candidates) has some empirical support.

I think there is a possibility that Obama has captured the public imagination in a way different than (most) previous black candidates.

In any case, I am nervous too, as I finish up teaching my 2nd lab period of the day. I did donate some to Obama (although not nearly the commitment you made).

Some students took their lab data very efficiently and went to vote. Here at Berry, I am not sure if students will support Obama or McCain more strongly, as we're in a conservative area of the country (and of Georgia). Of course, that's none of my business really.

I'll keep my eye on the "real battleground" states your model identified, and also whether Obama's percentages track his polling numbers closely (more closely than other candidates?)