New article published with Taha Yasseri in IT – Information Technology. A short piece making the case for theoretically informed social media predictions, which is part of a larger project we are running with support from the Fell Fund over the next year or so. Read it here: http://www.degruyter.com/view/j/itit.2014.56.issue-5/itit-2014-1046/itit-2014-1046.xml?format=INT
I’ve recently been appointed as an editor of the OII’s journal Policy and Internet, which recently celebrated its fifth anniversary. There are major plans underway to develop and expand the journal, so it’s an exciting project to be joining.
September 18, 2014 at 1:39 pm · Filed under Social Web
My colleague Taha Yasseri and I are currently working on a Fell Fund project on social media data and election prediction, looking especially at data from Google and Wikipedia (first paper out soon; will also be presenting on that at IPP 2014 which should be great). As part of that we thought we’d have a bit of fun looking at Scotland’s independence referendum on Wikipedia.
For election prediction the method is relatively straightforward: examine readership stats on the party Wikipedia pages of the country in question, and see which page is read the most (of course that doesn’t correspond straight away to election results – would that life were so simple – and the idea of the project is to see what corrections and biases need to be accounted for to make it work). It isn’t quite so clear how to do that for Scotland, but (just for fun really) we compared the following pages:
First we look at the UK and Scotland – interesting how Scotland has leapfrogged the UK in the last days of the independence campaign. Points to a Yes victory?
In terms of flags, though, the Union Jack is well ahead of the Saltire, peaking in the last few days. Is it a last minute outbreak of unionism?
In terms of national dishes, meanwhile, Haggis has been dominating Fish and Chips for the full period of the campaign, with interest in Haggis especially spiking in the last couple of days.
Well, one of these graphs will predict the winner of the referendum: we just don’t know which one ;-) More seriously, I think it’s interesting how most of these terms are spiking in the days before the vote, showing again how the social web really responds to political events.
UPDATE: Taha has passed me the comparison of the Yes and No campaign pages, as below. Yes for a narrow win following months of No dominance – you heard it here first.
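For illustration only – this is not our actual prediction pipeline, and the numbers below are made up – here’s a sketch of the basic comparison in Python: total up daily pageviews for two pages, see which leads overall, and check whether the trailing page overtakes in the final days of the campaign (the "leapfrog" pattern described above).

```python
# Toy sketch of the pageview comparison (illustrative only, not our
# actual method): given daily view counts for two Wikipedia pages,
# report the overall leader, the leader over the final few days, and
# whether the late leader differs from the overall one.

def compare_pageviews(series_a, series_b, final_days=3):
    """Each series is a list of daily view counts over the same period."""
    overall_leader = "A" if sum(series_a) >= sum(series_b) else "B"
    # Who leads over just the last few days of the campaign?
    late_a = sum(series_a[-final_days:])
    late_b = sum(series_b[-final_days:])
    late_leader = "A" if late_a >= late_b else "B"
    return overall_leader, late_leader, late_leader != overall_leader

# Invented numbers: page B dominates for most of the period, page A
# spikes at the end -- the Yes/No pattern in the update above.
a = [100, 120, 110, 130, 400, 900]
b = [300, 310, 290, 305, 320, 330]
overall, late, leapfrog = compare_pageviews(a, b)
```

With these toy numbers B leads on total views but A leads in the final three days, so the function flags a leapfrog – exactly the kind of late movement that makes raw pageview counts a noisy predictor.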
Last week I was at the VOXPOL conference at King’s College London. The vast majority of researchers were talking about terrorists and extremists, so I was a bit out of my field; though interestingly they were also all talking about big data and computational social science, which seems to be a staple at every social science conference these days. There is an ongoing debate about whether we need more teams of social scientists plus computer scientists, or whether social scientists need to up their computing skills. I think both approaches are fine in the short term, but in the long run social scientists need to skill up, as computer scientists won’t always be interested in our questions (we will want to use automatic content analysis in social science long after it becomes a boring topic in computer science, in the same way that we are still using the t-test).
I gave a presentation on the relationship between ideology and social structure on Twitter, arguing that political groups at the ideological extremes are more likely to exhibit closed and centralising communication patterns than those in the middle, which is an early result from a joint project between myself, Diego Garzia and Alex Trechsel. The main point of the presentation was to discuss different ways of measuring closure and centralisation, which I’m still not sure about. Luckily most of our measures point in a similar direction, so I’m pretty sure there’s an interesting result in there somewhere.
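The measures themselves aren’t spelled out here, but for a flavour of what "centralisation" can mean concretely, here is a sketch of Freeman’s degree centralization – a standard network measure, not necessarily one of the ones we used. It scores 1.0 for a star network, where a single hub mediates all communication, and 0.0 for a ring, where everyone is structurally equal.

```python
# Freeman's degree centralization on a small undirected network,
# illustrating one standard way to quantify "centralising" structure.

def degree_centralization(adjacency):
    """adjacency: dict mapping node -> set of neighbours (undirected)."""
    n = len(adjacency)
    degrees = [len(nbrs) for nbrs in adjacency.values()]
    d_max = max(degrees)
    # Normalise by the star graph, the most centralised topology
    # possible on n nodes, whose raw score is (n-1)*(n-2).
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))

# A star (one hub talking to everyone) versus a ring (everyone equal).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {0: {4, 1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
```

The star scores 1.0 and the ring 0.0; real Twitter communication networks fall somewhere in between, which is where the measurement choices get interesting.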
Recently I have been combining the two packages to create maps of events happening around the world. The size and colour of the marker on the map can be varied, meaning it’s possible to fit quite a lot of information in one graphic.
However, one thing I really struggled with was the legend. If you use differently coloured points, matplotlib makes it easy to add a colour bar with something like:
c = plt.colorbar(orientation='vertical', shrink = 0.5)
The shrink argument gives you a quick way of adjusting the size of the bar relative to the graphic.
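As a self-contained illustration (toy data and a hypothetical filename, not my actual map code), here’s that colour-bar approach end to end: colour the scatter points by some value, then attach a shrunken vertical colour bar.

```python
# Minimal sketch of the colour-bar approach: colour each scatter point
# by a value, then attach a vertical colour bar at half size.
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [2, 3, 1, 4, 2]
magnitude = [10, 40, 20, 80, 50]  # drives the colour of each marker

sc = plt.scatter(x, y, c=magnitude)
c = plt.colorbar(orientation='vertical', shrink=0.5)  # shrink scales the bar
plt.savefig("events_map.png")  # hypothetical filename
```

Because the scatter call sets the current mappable, plt.colorbar() picks it up automatically – no extra wiring needed.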
However, I couldn’t find an equivalent command which would give me a legend for the size of the points (something that ggplot does easily). After fiddling with get_label() for ages, and trying to capture and do something useful with the results of plt.scatter(), I finally came across this useful post, which basically says that this feature doesn’t really exist and if you want such a legend you have to make it yourself. However, the trick to doing it is quite simple: draw three or four points on your plot with empty data ([], []) so they won’t actually show up, each one representing a certain size in your scale. These points can then be passed to plt.legend() with some hand-written labels. Overall it looks something like this:
l1 = plt.scatter([], [], s=10, edgecolors='none')
l2 = plt.scatter([], [], s=50, edgecolors='none')
l3 = plt.scatter([], [], s=100, edgecolors='none')
l4 = plt.scatter([], [], s=200, edgecolors='none')
labels = ["10", "50", "100", "200"]
leg = plt.legend([l1, l2, l3, l4], labels, ncol=4, frameon=True, fontsize=12,
                 handlelength=2, loc=8, borderpad=1.8,
                 handletextpad=1, title='My Title', scatterpoints=1)
Well, I still think that should be easier, but at least it works and it also gives you a lot of flexibility with what goes on the legend.