The Dim-Post

May 14, 2012

Big Picture

Filed under: polls — danylmc @ 8:58 am

Thanks to Bradluen who came through with the R code I begged for in my previous post. It’s available here. He comments:

Currently the lines don’t go through the actual election results. However, just upweighting the election results in the Loess gives weird results. In fact I kind of think Loess isn’t quite right here. Use another smoother?

Here’s the resulting chart from the Wiki polling data showing aggregated poll results since the 2005 General Election:

The black lines are election dates. Why is Labour trending down, even though most of their post-election polls have been improvements on their election result? My guess is that all those low poll results just before the election are dragging the curve down. Should the polls be weighted by date? Update: changed the parameters of the script to generate a slightly different graph.
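
On the weighting-by-date question, here is one possible approach, sketched in Python rather than taken from the R script. It halves a poll's weight for every `half_life_days` of age. The function name, the 30-day half-life and the poll figures are all made up for illustration.

```python
from datetime import date


def recency_weighted_average(polls, as_of, half_life_days=30.0):
    """Average poll values, halving a poll's weight every half_life_days."""
    total_weight = 0.0
    weighted_sum = 0.0
    for poll_date, value in polls:
        age_days = (as_of - poll_date).days
        if age_days < 0:
            continue  # ignore polls taken after the reference date
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        raise ValueError("no polls on or before the reference date")
    return weighted_sum / total_weight


# Made-up figures for illustration only, not real poll results.
polls = [
    (date(2011, 10, 1), 30.0),   # 60 days old, weight 0.25
    (date(2011, 10, 31), 28.0),  # 30 days old, weight 0.5
    (date(2011, 11, 30), 26.0),  # taken on the reference date, weight 1
]
estimate = recency_weighted_average(polls, as_of=date(2011, 11, 30))
plain_mean = sum(value for _, value in polls) / len(polls)
```

With these numbers the weighted estimate sits below the plain mean, because the most recent (lowest) poll counts the most. Whether that is desirable is exactly the question above.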


  1. Trying to get my head around this. These are actual poll results, but shouldn’t the election be in there somewhere? It is a poll too. If we look at what I presume is NZ First, they never seem to get over 5%, yet we know they did.

    Comment by Ron Wilson — May 14, 2012 @ 9:06 am

  2. Welcome to the world of statistical analysis. The election results are in there, but they’re just unweighted data points.

    Maybe a better way to do it would be to calculate a new curve for each post-election period, starting with the actual results? Or maybe Brad is right and some other regression method is the way to go?

    Comment by danylmc — May 14, 2012 @ 9:27 am

  3. I am not in any way an expert on this, but I believe the Loess filter has an adjustable window-width parameter, and the default in R is to use one based on the data length, which tends to over-smooth.

    Comment by Mark — May 14, 2012 @ 9:37 am

  4. Do the polling questions tend to change just before and after the election? If it goes from “In the election to be held next week…” to “In the election to be held in three years…” or “If an election were to be held tomorrow…”, doesn’t that make each pre-election period a discrete series?

    Comment by James Butler (@j20r) — May 14, 2012 @ 9:46 am

  5. I am beginning to have a lot of sympathy for countries that ban public polling for a period prior to elections. There seems to be a lot of research that says otherwise.

    Comment by Ron Wilson — May 14, 2012 @ 10:02 am

  6. Weighting probably makes sense for pre-election data.
    Calculating the curves intra-period might also make the trend lines more attractively sinusoidal.

    Also, would it be informative to note what significant events other than elections are likely to have triggered clusters (e.g. mid 2007)?

    Comment by Gregor W — May 14, 2012 @ 10:06 am

  7. The question is always, from my experience, “if the election were held tomorrow”.

    Comment by Graeme Edgeler — May 14, 2012 @ 10:16 am

  8. Are the election results percentages of enrolled voters or of voter turnout?

    Comment by NeilM — May 14, 2012 @ 10:19 am

  9. My 2c – If you want loess to be more reactive just lessen the span. I wouldn’t include the election results in the regression at all – polls and elections are different (the number of non-voters is usually greater than the number of undecideds…). The polls are an estimator, you have to use your judgement to know how they’ll differ from election day results.

    (Oh yeah, if you do the plots in ggplot2 it will make it stupidly easy to do, say, a panel plot with different fits for each pollster or change the span of the loess between plots)

    Comment by David Winter — May 14, 2012 @ 10:29 am
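
To make the span point concrete, here is a rough pure-Python sketch of a loess-style smoother. It is a simplification, not R's `loess()`: it fits a tricube-weighted straight line around each point, whereas R's default is, as far as I know, a local quadratic with `span = 0.75`. The function name and the step-shaped data are made up for illustration.

```python
import math


def local_linear_smooth(xs, ys, span=0.75):
    """Loess-style smoother: tricube-weighted straight-line fit around each x."""
    n = len(xs)
    k = max(2, min(n, math.ceil(span * n)))  # number of points per local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def sum_squared_error(ys, fitted):
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))


# Made-up series with a sudden drop halfway through (say, after a scandal).
xs = list(range(20))
ys = [40.0 if x < 10 else 30.0 for x in xs]

wide = local_linear_smooth(xs, ys, span=0.9)    # smooth, slow to react
narrow = local_linear_smooth(xs, ys, span=0.2)  # follows the drop closely
```

The narrow span tracks the drop much more closely, which is the "more reactive" behaviour described above. The price is that it will also chase noise in individual polls.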

  10. A chart that shows the slow but inevitable decline of socialism as the populace gradually wakes up to the fact that they’ve been sold a gold brick by cheap and nasty conmen.

    Comment by Redbaiter — May 14, 2012 @ 10:30 am

  11. Damn, I thought what it was showing was the gradual decline of intelligence by the voting public.

    Comment by Ron Wilson — May 14, 2012 @ 10:36 am

  12. You’re right, Redbaiter!

    Did you notice it also shows a direct correlation between piracy off the horn of Africa and Green party support?

    Comment by Gregor W — May 14, 2012 @ 10:36 am

  13. “If you want loess to be more reactive just lessen the span.”

    I just tried that! The chart is here. It seems more plausible.

    Comment by danylmc — May 14, 2012 @ 10:39 am

  14. @ Redbaiter – That only holds true if you believe Labour are still a socialist party.

    Comment by alex — May 14, 2012 @ 10:39 am

  15. We’re spending $80 billion a year on government and we’re not socialist??

    For fucks sake get an education.

    (don’t ask me where)

    Comment by Redbaiter — May 14, 2012 @ 10:47 am

  16. The really exciting bit is that we are going to see National swallowing a dead rat and working to get Winston on board for the 2014 election; otherwise they are out. Already Key is softening the language.

    Comment by Ron Wilson — May 14, 2012 @ 10:52 am

  17. The Greens at 14% LOLLY LOLLY LOL.

    Comment by merv — May 14, 2012 @ 11:07 am

  18. It’s interesting that during the 2008 election, Labour’s support rose (although not enough) but during the 2011 election, it fell.

    Comment by Hugh — May 14, 2012 @ 11:39 am

  19. “correlation between piracy off the horn of Africa…”
    Maths, pirates AND sex: watch those readership stats rise.

    Comment by Clunking Fist — May 14, 2012 @ 1:14 pm

  20. You’re right RedFrother! We should abandon socialist education forthwith, and school ourselves in the philosophy of freedom at the Ayn Rand Libertarian University of Non-Socialism. Or is that too organised to be truly free?

    Perhaps we should homeschool ourselves through that doctorate in Nuclear Physics (which is what Osama was trying of course, proving conclusively his links to Obama – whose name is still suspiciously similar – and the nanny statists, not to mention the black helicopters in silent hover mode).

    With apologies to homeschoolers, and vegetarians (for throwing the red meat to the troll).

    On topic – why weight ‘election period’ polls, or the election itself? Surely if the election period polls are skewed (vis true public opinion), that will show up as a regular election-time dip (or bump)? Though David W is right that election results are substantively different to the polls…

    Comment by bob — May 14, 2012 @ 1:24 pm

  21. It’s interesting to replace the lines denoting election dates with lines marking major scandals (Winston Peters, Richard Worth, Darren Hughes etc.) and see what happens to the polls. Scandals DO seem to be a major driver of poll change, but only if they play out over a longish time.

    Comment by danylmc — May 14, 2012 @ 2:41 pm

  22. There are a couple of different things we could be trying to represent with the best-fit lines:

    – the average of recent polls
    – what would happen if an election were to be held tomorrow

    These aren’t quite the same thing. For the former, the easiest thing to do is to just average the last poll from each agency, and it’s kind of arbitrary whether you include elections or not.

    For the latter, you really do need the lines to go through the election results. Also, the lines after an election should be heavily influenced by the election results, and not at all by the polls before the election (which are now irrelevant), so fitting separate curves between each pair of elections would seem right. If this is what we’re trying to show, then the confidence intervals are way too narrow.

    Another issue: I don’t think the best amount of smoothing to use is constant — public opinion is more volatile close to elections.

    Comment by bradluen — May 14, 2012 @ 2:49 pm
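
Two of the ideas in the comment above can be sketched in a few lines of Python: averaging the last poll from each agency, and splitting the polls at election dates so that each period can get its own curve. This is only an illustration. The function names, agency names and poll figures are invented; the two election dates are the actual 2008 and 2011 general election dates.

```python
from datetime import date


def latest_poll_average(polls):
    """Average the most recent result from each polling agency."""
    latest = {}
    for agency, poll_date, value in polls:
        if agency not in latest or poll_date > latest[agency][0]:
            latest[agency] = (poll_date, value)
    return sum(value for _, value in latest.values()) / len(latest)


def split_by_election(polls, election_dates):
    """Group polls into between-election periods for separate smoothing."""
    boundaries = sorted(election_dates)
    periods = [[] for _ in range(len(boundaries) + 1)]
    for poll in polls:
        # Count how many elections had already happened when the poll was taken.
        index = sum(1 for boundary in boundaries if poll[1] >= boundary)
        periods[index].append(poll)
    return periods


# Hypothetical agencies and made-up figures, for illustration only.
polls = [
    ("Agency A", date(2012, 4, 1), 48.0),
    ("Agency A", date(2012, 5, 1), 46.0),
    ("Agency B", date(2012, 4, 20), 50.0),
    ("Agency C", date(2011, 11, 20), 52.0),
]
elections = [date(2008, 11, 8), date(2011, 11, 26)]

average = latest_poll_average(polls)         # uses 46.0, 50.0 and 52.0
periods = split_by_election(polls, elections)
```

Each list in `periods` could then be passed to whatever smoother you prefer, with the election result as the starting point of the period if you want the line to go through it.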

  23. Every time you screen-scrape, god kills a kitten. I’ve written a Python script that parses the results from the page source grabbed from the Wikipedia API and outputs them as CSV. Much cleaner, I think. It needs the mwlib Python package to be installed.

    Comment by TrouserPython — May 14, 2012 @ 2:51 pm
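
For anyone curious what the wiki-source-to-CSV step might look like, here is a minimal sketch using only the Python standard library. It is not the script mentioned above and does not use mwlib. It assumes a very simple wikitable (one row per line, no templates, references or cell attributes), and the sample table is made up.

```python
import csv
import io


def wikitable_to_rows(wikitext):
    """Turn a simple wikitable into a list of rows (lists of cell strings)."""
    rows, current = [], []
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("{|"):
            continue  # table opening line, nothing to extract
        if line.startswith("|-") or line.startswith("|}"):
            if current:
                rows.append(current)  # row separator or table end closes a row
            current = []
        elif line.startswith("!"):
            current.extend(cell.strip() for cell in line[1:].split("!!"))
        elif line.startswith("|"):
            current.extend(cell.strip() for cell in line[1:].split("||"))
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample in wikitable markup, not real poll data.
sample = """{| class="wikitable"
! Poll !! Date !! National !! Labour
|-
| Agency A || 1 May 2012 || 47.0 || 30.5
|-
| Agency B || 8 May 2012 || 45.5 || 31.0
|}"""

rows = wikitable_to_rows(sample)
csv_text = rows_to_csv(rows)
```

Real Wikipedia tables are messier than this, which is presumably why a proper parser is the better tool for the actual job.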

  24. “Scandals DO seem to be a major driver of poll change, but only if they play out over a longish time.”

    I wonder if it reflects some form of ‘aggregate loathing index’ inasmuch as people don’t care about the scandal as such, but it contributes to their stored impression of the party affected.

    Also, might the length of the scandal induce some sort of increase in partisan bias? More party supporters irrationally leap to defend the indefensible while opponents get more frothy, both quantitatively, in that polls get more respondents on scandal/hot-button issues, and qualitatively, in that the response is more extreme.

    Comment by Gregor W — May 14, 2012 @ 3:00 pm

  25. sorry for italics chaps

    Comment by Gregor W — May 14, 2012 @ 3:01 pm

  26. A trend line which has National’s current support at below the level of any of the last five polls suggests to me that you may have over-corrected.

    Comment by Graeme Edgeler — May 14, 2012 @ 3:40 pm

  27. “Scandals DO seem to be a major driver of poll change, but only if they play out over a longish time.”

    Evidence from US polling (I’m thinking specifically of the great work done by Nate Silver, who I’m sure would be able to offer some suggestions) indicates that ‘scandals’, as we of beltway disposition think of them, actually have very little effect overall, and what effect there is fades quickly.
    Equally, the ‘bounce’ from good news (e.g. Bin Laden’s death on Obama’s approval) doesn’t last long.

    Comment by Phil — May 14, 2012 @ 4:35 pm

  28. I think I’ll die laughing. This mess proves only that none of you has a clue about anything.

    Comment by Adolf Fiinkensein — May 14, 2012 @ 6:42 pm

  29. TrouserPython @23, God doesn’t exist, so it must be Microsoft killing the kittens.

    Comment by Clunking Fist — May 14, 2012 @ 6:56 pm

  30. Things will become a lot clearer soon once we all have smart phones and update our voter preference minute by minute.

    Comment by NeilM — May 14, 2012 @ 10:46 pm

  31. Great graph – thanks Danyl and Bradluen.

    I think the election results should play no role in this calculation at all. They are a kind of poll that is more methodologically different from the other polls than the other polls are from each other, making them likely to be the outlier. They are also not frequent enough for averaging to cancel out this effect. Therefore I would leave them out of the calculation of the lines, and instead put them on as discrete points, so that you can see how the opinion polls (represented by the lines) relate to the election results they are a prediction for.

    If election results are an outlier and you include them in the calculation, you will not see how opinion polls relate to election results, because the inclusion of the election result in the calculation will make the lines veer towards the election result at election time, disguising the inconsistency between predicted and actual as a veering of the line at election time.

    Comment by kahikatea — May 19, 2012 @ 12:45 pm
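
The argument in the last comment can be illustrated with a toy straight-line fit in Python. This is only a sketch with invented numbers, and it uses a plain weighted least-squares line rather than loess. The point is just that including the election result in the fit pulls the line towards it and shrinks the visible gap between polls and result.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares straight line; returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up numbers: days relative to election day, and a party's support in %.
poll_days = [-40, -30, -20, -10]
poll_values = [30.0, 30.0, 30.0, 30.0]
election_day, election_value = 0, 27.0

# Fit on polls only, then compare the line with the election result.
slope, intercept = weighted_line_fit(poll_days, poll_values, [1.0] * 4)
gap_polls_only = election_value - (intercept + slope * election_day)

# Fit with the election result included as just another data point.
slope2, intercept2 = weighted_line_fit(
    poll_days + [election_day], poll_values + [election_value], [1.0] * 5
)
gap_with_election = election_value - (intercept2 + slope2 * election_day)
```

In this toy case the polls-only line misses the result by 3 points, while the line that includes the result misses by only 1.2, which hides most of the polling error.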
