The Dim-Post

May 11, 2012

Calling all R programmers

Filed under: polls — danylmc @ 8:46 am

Wikipedia has a useful little resource: opinion polls for New Zealand politics dating back to 2005. The poll results are on separate pages according to which election they preceded. They also have charts showing the aggregated results of these polls, generated by R scripts written by the Wiki authors. There’s a sample of one of these scripts here.

What I’d like – what I think a lot of people would like – is a script that compiles all of the polls into one big table, and then lets you specify the date range for the chart you want to generate. I can’t program in R and I don’t have time to learn right now, so if anyone out there has a little R experience and a spare hour or two then I’d really like to hear from you. I’ll put some pseudo-code over the break showing how I’d like the script to work.

If we can get this working I’ll make a permanent page on the site and post the code with some instructions. I think it would be useful to researchers, as well as politics geeks like me.

# date variables that can be modified
daterangestart <- as.Date("2005-01-01")
daterangeend <- Sys.Date()  # today

# vector of Wikipedia page URLs (placeholders); it can grow, so pages
# for future elections can be appended as they are created
html <- c(
  "http://www.wikipedia.com/2005election.html",
  "http://www.wikipedia.com/2006election.html"
  # etc.
)

loop through the array, find the tables as per the current script linked above and aggregate them all into one table (in R, a data frame), checking that the rows at the top and tail of each table aren’t duplicates of rows already collected. Add a column that indicates whether an entry is a general election result rather than a poll.
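A rough R sketch of this aggregation step, assuming the scraped tables have already been cleaned (the messy part) into data frames with a `date` column of class Date, a `pollster` column and one numeric column per party, and that election rows carry the label "Election result". The column names, that label and the helper `combine_polls()` are hypothetical, not taken from the Wikipedia scripts, and the numbers in the example are made up.

```r
# Hypothetical helper: stack cleaned per-election poll tables into one
# data frame. Assumes columns `date` (Date) and `pollster` (character),
# plus one numeric column per party; "Election result" is an assumed
# label for rows that are general election results rather than polls.
combine_polls <- function(tables) {
  # parties come and go between elections, so take the union of columns
  # and pad any column a table lacks with NA before stacking
  all_cols <- unique(unlist(lapply(tables, names)))
  padded <- lapply(tables, function(tab) {
    for (col in setdiff(all_cols, names(tab))) tab[[col]] <- rep(NA, nrow(tab))
    tab[, all_cols, drop = FALSE]
  })
  polls <- do.call(rbind, padded)

  # the election result at the tail of one page usually reappears at the
  # top of the next, so keep only the first row for each date/pollster pair
  polls <- polls[!duplicated(polls[, c("date", "pollster")]), , drop = FALSE]

  # flag general election results so they can be treated differently
  polls$is_election <- polls$pollster == "Election result"

  polls <- polls[order(polls$date), , drop = FALSE]
  rownames(polls) <- NULL
  polls
}

# made-up numbers, for illustration only
page_a <- data.frame(
  date = as.Date(c("2005-09-17", "2007-06-01", "2008-11-08")),
  pollster = c("Election result", "Poll A", "Election result"),
  National = c(40, 45, 45), Labour = c(40, 35, 35)
)
page_b <- data.frame(
  date = as.Date(c("2008-11-08", "2010-02-01")),
  pollster = c("Election result", "Poll B"),
  National = c(45, 50), Labour = c(35, 30), Green = c(NA, 8)
)
combined <- combine_polls(list(page_a, page_b))
```

Treating a row as a duplicate when its date and pollster match an earlier row is a simple heuristic; a stricter check could compare every column.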

create the chart and dynamically allocate tick marks appropriate to the date range variables given above
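One possible way to make the tick spacing follow the date range, sketched in R. The helper `date_ticks()` and its thresholds (yearly ticks beyond six years, six-monthly beyond two, monthly beyond 180 days, otherwise weekly) are arbitrary assumptions, not taken from the existing scripts.

```r
# Hypothetical helper: choose tick positions for a date axis from the
# length of the requested range. The thresholds are arbitrary choices.
date_ticks <- function(start, end) {
  stopifnot(start <= end)
  span_days <- as.numeric(end - start)
  step <- if (span_days > 6 * 365) {
    "1 year"
  } else if (span_days > 2 * 365) {
    "6 months"
  } else if (span_days > 180) {
    "1 month"
  } else {
    "1 week"
  }
  # anchor ticks at the start of a calendar year (yearly ticks) or month
  first <- if (step == "1 year") {
    as.Date(format(start, "%Y-01-01"))
  } else {
    as.Date(format(start, "%Y-%m-01"))
  }
  ticks <- seq(first, end, by = step)
  ticks[ticks >= start]  # drop any tick before the requested start
}

ticks <- date_ticks(as.Date("2005-01-01"), as.Date("2012-05-11"))
# after plot(..., xaxt = "n"), draw the axis with:
# axis.Date(1, at = ticks, format = "%Y")
```

Suppressing the default axis with `xaxt = "n"` and then calling `axis.Date()` keeps the tick choice entirely in the script's hands.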

now populate the chart based on the date range variables, apply smoothing to the data series, and indicate confidence intervals (CI) as per the script linked above.
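A minimal sketch of the smoothing and confidence-band step for a single party. I haven't checked which smoother the Wikipedia scripts use, so this swaps in base R's `loess()` and draws a pointwise 95% band from `predict(..., se = TRUE)`. It assumes the combined table has a Date column `date`, a logical `is_election` column and one numeric column per party; those names, the helper `smooth_party()`, the `span` default and the toy series are all hypothetical.

```r
# Hypothetical helper: loess trend plus pointwise confidence band for one
# party, using only polls (not election results) inside [start, end].
# Assumes columns `date` (Date), `is_election` (logical) and a numeric
# column named by `party`. loess is a stand-in smoother.
smooth_party <- function(polls, party, start, end, span = 0.5, level = 0.95) {
  keep <- polls$date >= start & polls$date <= end &
    !polls$is_election & !is.na(polls[[party]])
  d <- data.frame(x = as.numeric(polls$date[keep]), y = polls[[party]][keep])
  fit <- loess(y ~ x, data = d, span = span)

  # evaluate the smooth on an even grid across the observed dates
  grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 200))
  pred <- predict(fit, newdata = grid, se = TRUE)
  crit <- qt(1 - (1 - level) / 2, pred$df)  # t critical value for the band

  data.frame(
    date = as.Date(grid$x, origin = "1970-01-01"),
    fit = pred$fit,
    lower = pred$fit - crit * pred$se.fit,
    upper = pred$fit + crit * pred$se.fit
  )
}

# toy series with made-up values, for illustration only
set.seed(1)
toy <- data.frame(date = seq(as.Date("2009-01-01"), by = "2 weeks", length.out = 60))
toy$National <- 50 + 3 * sin(seq_len(60) / 10) + rnorm(60)
toy$is_election <- FALSE
band <- smooth_party(toy, "National", as.Date("2009-06-01"), as.Date("2011-01-01"))

# shaded band first, then the points and the trend line on top
plot(toy$date, toy$National, xlab = "", ylab = "Party vote (%)")
polygon(c(band$date, rev(band$date)), c(band$lower, rev(band$upper)),
        col = "grey85", border = NA)
points(toy$date, toy$National)
lines(band$date, band$fit, lwd = 2)
```

The band is pointwise around the smoothed trend, not a margin of error for individual polls; `span` controls how wiggly the trend is and would need tuning on real data.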

add a black vertical bar (parallel to the y axis) at the date of every general election entry
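The election markers could be a single `abline(v = ...)` call on the open plot. A sketch, assuming the same hypothetical `date` and `is_election` columns as above; `mark_elections()` is an illustrative name and the vote shares are made up.

```r
# Hypothetical helper: draw a vertical black line at the date of each
# general election inside [start, end] on the current plot. Assumes
# columns `date` (Date) and `is_election` (logical).
mark_elections <- function(polls, start, end, lwd = 2) {
  in_range <- polls$is_election & polls$date >= start & polls$date <= end
  election_dates <- polls$date[in_range]
  if (length(election_dates) > 0) {
    abline(v = election_dates, col = "black", lwd = lwd)
  }
  invisible(election_dates)  # returned so callers can label the lines
}

# toy table with made-up vote shares, for illustration only
toy <- data.frame(
  date = as.Date(c("2008-11-08", "2009-06-01", "2010-06-01", "2011-11-26")),
  National = c(45, 50, 48, 47),
  is_election = c(TRUE, FALSE, FALSE, TRUE)
)
plot(toy$date, toy$National, xlab = "", ylab = "Party vote (%)")
marked <- mark_elections(toy, as.Date("2008-01-01"), as.Date("2012-01-01"))
```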


26 Comments »

  1. I’ll leave the cleaning up of the table to someone else, but just grabbing the polling data is easy enough:

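    library(XML)
    # Wikipedia page listing the polls that preceded the 2005 election
    polls_url <- "http://en.wikipedia.org/wiki/Opinion_polling_for_the_New_Zealand_general_election,_2005"
    # readHTMLTable() returns a list with one data frame per HTML table; the XML
    # package may not fetch https URLs directly, so if this fails, download the page first
    tabs <- readHTMLTable(polls_url)
    polls_messy <- as.data.frame(tabs[[2]])  # the second table on the page holds the polls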

    Comment by David Winter — May 11, 2012 @ 9:24 am

  2. Are programmers at R already? I lost interest when they got to C.

    Comment by insider — May 11, 2012 @ 10:42 am

  3. #2: If it wasn’t for C, we’d be programming in Basi, Pasal, and Obol. If it wasn’t for R, we’d still be using Fotan and Javascipt.

    Comment by deepred — May 11, 2012 @ 11:20 am

  4. @ DR

    Did you mean FORTRAN or were you making a subtle joke about utilising Chinese coders in Hong Kong?

    Comment by Gregor W — May 11, 2012 @ 11:24 am

  5. I take it back….

    Comment by Gregor W — May 11, 2012 @ 11:26 am

  6. Is it program like a pirate day?

    Comment by Sanctuary — May 11, 2012 @ 12:08 pm

  7. $10 to whoever can do it in brainfuck.

    Comment by Chris Bull — May 11, 2012 @ 12:43 pm

  8. The question is though, are these polls always fair and accurate?

    Comment by alex — May 11, 2012 @ 1:38 pm

  9. @alex

    either way title(main="This is bad for Phil Goff", col.main="552")

    Comment by Gregor W — May 11, 2012 @ 2:10 pm

  10. You know threads like this are a negative readership growth factor.

    Comment by merv — May 11, 2012 @ 4:30 pm

  11. This is a sweet idea and I can’t see why it would be bad for readership.

    Heaven forbid the DimPost starts taking a ratings-based approach to content – just look at what’s happened to TV1.

    Comment by Rob — May 11, 2012 @ 4:54 pm

  12. There’s such a thing as narrowing your target market too much & this is such a thing.

    Comment by merv — May 11, 2012 @ 4:57 pm

  13. On the slim chance that non-geeks are still with us, can I just please point out that:

    1. the software being discussed here was developed in Auckland: http://en.wikipedia.org/wiki/R_(programming_language).

    2. this is open source software, so despite what you might think from the (thin gruel) information about TPP negotiations, lots of normal people create really valuable intellectual property for reasons other than profit. So limits on IP protection need not stop the creation of valuable new IP.

    Comment by jps — May 11, 2012 @ 5:09 pm

  14. Go have a cry about it?

    Comment by Rob — May 11, 2012 @ 5:18 pm

  15. Personally I think my pirate joke has got legs in it yet.

    Comment by Sanctuary — May 11, 2012 @ 7:02 pm

  16. Personally I think my pirate joke has got legs in it yet.

    If you have a day job, Sanc, keep at it

    Comment by Paul Rowe — May 11, 2012 @ 7:53 pm

  17. ‘Job’

    Comment by Wayne Wanker — May 11, 2012 @ 10:53 pm

  18. I appreciated it Sanc.

    But I would say it had peg legs…..

    Comment by Gregor W — May 11, 2012 @ 11:23 pm

  19. Don’t speak code… but shit, this would be great to replicate: http://www.spiegel.de/flash/0,5532,21034,00.html

    Comment by Niko (@nikoelsen) — May 12, 2012 @ 12:26 pm

  20. +1 Niko and cheers for the link; now that’s web 2.0

    Screw the target market (I doubt there ever was one). As disappointingly dry as groundskeeper Danyl’s latest was, much of the ensuing diatribe (excluding sourpuss merv) is gold.

    ..so how’s the progress on that code, peeps? Maybe the crew at statschat.org.nz could be courted to assist.

    Comment by Luke — May 12, 2012 @ 4:17 pm

  21. I actually thought someone else would do this, goddammit.

    First attempt (haven’t tried to make anything look nice yet):

    http://bradluen.posterous.com/using-wikipedias-nz-election-data

    Comment by bradluen — May 13, 2012 @ 9:55 am

  22. and here’s a link to the data since the 2005 election as a Google doc for spreadsheet tragics

    https://docs.google.com/spreadsheet/ccc?key=0AlnSVyi_b7gRdGJ3TTNMMjZLeUtTMUl6bk5LQ19URXc

    Comment by bradluen — May 13, 2012 @ 10:20 am

  23. Nice job Brad. I had started making a spreadsheet too, in a slightly different format since I was just sticking it into Google Visualisations for some testing (nothing squirting it through Refine can’t fix). This visualisation doesn’t have the nice proper stats graphs that everyone wants, just the usual boring line chart, as with the Der Spiegel chart Niko linked to. I’ve only done up to 2008: I realised the trend lines and deviation shadows etc. are probably the interesting bit, and I can’t do those with my noddy web 2.0 widgets, so I was probably wasting my time. I’ll do it up to 2012 if people want it, though.

    Chart: http://www.anonymous.org.nz/poll2.html

    Spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AomjDPsnlEtJdFBIQk1hOWNLTzdkV0dKYUFLbGFqdkE#gid=0 (public view; I’ll give edit permissions to anyone who wants to do stuff)

    -k

    Comment by Kim Shepherd (@kimshepherd) — May 14, 2012 @ 8:36 am

  24. That looks awesome Brad. Thanks.

    Comment by danylmc — May 14, 2012 @ 8:47 am

  25. This must be the occasional post in a series about what we want and don’t want from this blog (the last being where the political divide was united against United’s Mr PG). Could we have more pirate references please? Kitty photos would be tacky, though.

    Comment by Clunking Fist — May 14, 2012 @ 1:10 pm

  26. Calling all R programmers! Unite! You have nothing left to lose except your Command Line Interface!

    (PS I would have used the TLA “CLI”, but I didn’t want to be perceived as being someone who has R on his Linux partition as well as a Win XP partition)

    Comment by theecanmole@gmail.com — May 15, 2012 @ 11:39 pm



The Rubric Theme Blog at WordPress.com.
