CAUTION: THIS POST CONTAINS A LOT OF SWEARWORDS. BECAUSE I ANALYSED SWEARWORDS. BUT ALSO JUST BECAUSE.
It’s twenty-four hours since the Conservatives won a remarkable majority in the 2019 General Election. There has already been an avalanche of nuanced (and not so nuanced) debate about quite whether Johnson won, or Corbyn lost, about whether Brexit is The Beginning or The End, and on, and on, and on.
This post is going to deal with exactly none of that stuff. If you want serious business, go stick your face in Politico or the FT. (I like the FT, by the way. This isn’t a dig at them.) If what you’d like right now is something lighter, but still surprisingly informative (er, maybe) then I present here a range of probably useless, but possibly interesting facts about twenty-four hours of Twitter data gathered during the final critical hours of the 2019 UK General Election. This blog post tells you all about the top devices, videos, pictures, links, names, places, issues, emotions, and swearwords, pretty much in that order, so if you came here for all the fucks, just scroll waaaaay down to the bottom. Similarly, if you want to just skip down to the start of the fun stuff, click here. Otherwise, if you appreciate a little data salad and caution sauce with your result reuben, then keep reading.
I used my software, FireAnt, to collect tweets sent from 10pm on Thursday the 12th of December (when the polling stations formally closed) through to 10pm on Friday the 13th of December. Historically, a final exit poll is published directly after the polls close, and this has usually been a reasonable indicator of the probable outcome. Not always, but often enough that many people treat it as a pretty good guide to whether to go to bed already and cry under the covers or stay up and party. Thereafter, the results from the 650 constituencies are announced all through the night, and people tend to wait feverishly for the constituencies held by the big names: party leaders, Cabinet Ministers, and so forth.
In my trial attempt the night before, I tried collecting tweets containing all the various hashtag permutations I could think of: generalelection, generalelection19, ukelection19, ukgeneralelection2019, ge19, ge2019, ukge19, etc. etc. etc. It turns out that (a) I am extremely unimaginative when it comes to how many different hashtags people can come up with independently, (b) accidental and deliberate trending hashtags with typos are a thing (ge2109, generalerection), and (c) holy fuckballs that would have been a lot of data. Twenty-eight minutes into the first attempt, my computer crashed, and I already had something like 177,000 tweets, which translates to over a gigabyte of data. Multiply that by forty-eight (treating those twenty-eight minutes as roughly half an hour) for the whole day and it would have been a dataset of something like 8.5 million tweets. I have absolutely no objection to handling datasets of that size, and have far bigger ones already, but they have usually been collected over weeks, and if the speed of the download is going to crash the software within half an hour, there is just no point running the risk of killing the whole endeavour purely through greed.
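For anyone checking the back-of-an-envelope maths: treating the 28-minute trial as roughly half an hour gives 48 such blocks in a day. This is just the arithmetic from the paragraph above, and it assumes the tweet rate stays constant all day (which it certainly wouldn't):

```latex
\[
177{,}000 \times \frac{24\ \text{h}}{0.5\ \text{h}} = 177{,}000 \times 48 = 8{,}496{,}000 \approx 8.5\ \text{million tweets}
\]
\[
\text{using the exact 28 minutes instead: } 177{,}000 \times \frac{1440}{28} \approx 9.1\ \text{million tweets}
\]
```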
I therefore decided to pick just one hashtag, working on the assumption that people might favour a shorter one to leave more space for the tweet itself. On that basis, I went with just GE19. It probably wasn’t actually the most popular in the end, but I don’t have the data to say which was, and it’s done now, so what’re we gonna do about it.
Anyway, the GE19 collection was a success, and returned a modest little dataset of 66,678 tweets that totalled roughly 1,662,269 words. (I say roughly because do you count hashtags as words? What about URLs? What about emoji? What about “abso fucking lutely”? Is that one word or three? etc.) These tweets were sent by 40,712 unique accounts, so each account sent an average of roughly 1.6 tweets on the subject of GE19 over the course of twenty-four hours.
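The “what counts as a word?” problem is easy to see in a toy example. This is a minimal sketch that assumes plain whitespace tokenisation with optional rules for dropping hashtags and URLs; it is not how FireAnt counts words, and `count_words` is a hypothetical helper. The last lines just reproduce the tweets-per-account average from the totals above.

```python
import re

def count_words(text: str, keep_hashtags: bool = True, keep_urls: bool = True) -> int:
    """Count whitespace-separated tokens, optionally ignoring hashtags and URLs."""
    tokens = text.split()
    if not keep_hashtags:
        tokens = [t for t in tokens if not t.startswith("#")]
    if not keep_urls:
        tokens = [t for t in tokens if not re.match(r"https?://", t)]
    return len(tokens)

tweet = "abso fucking lutely #GE19 https://example.com"
print(count_words(tweet))                                        # every token counts
print(count_words(tweet, keep_hashtags=False, keep_urls=False))  # words only

# Average tweets per account, from the GE19 totals
tweets, accounts = 66_678, 40_712
print(f"{tweets / accounts:.2f} tweets per account")
```

Even in this tiny example, the same tweet counts as five words or three depending on the rules, and “abso fucking lutely” still counts as three either way.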
(Incidentally, if you’d like to use FireAnt to do this kind of thing yourself, you totally can. It was created thanks to the amazing support of the ESRC, it is free to download and use, and you can do all the same things I do in this blog post. All I ask is that you give me and Laurence Anthony credit. 😘)
Right, let’s get stuck into some findings.
What and where
The top three devices for tweeting during the twenty-four hour GE19 data collection window were iPhones (25,799 tweets), Android devices (19,885), and browsers (13,295). And the top ten self-described locations of the people sending those tweets were:
Similarly, when those people were tweeting about places, the most commonly mentioned locations were Scotland (6,669 mentions), the North (4,318), Belfast (3,615), the South (3,262), and Ireland (2,680). Note that Ireland could also be part of the phrase “Northern Ireland”, so that count conflates at least two places.
Here’s a world and a UK map showing where the placenamed GE19 tweets came from (click to embiggen):
Shared media: hashtags, videos, pictures, and links
Aside from the GE19 hashtag which, unsurprisingly, dominates my dataset (funny that), and all the other generalelection-esque variations, the most popular accompanying hashtags were #Brexit (349 instances), #exitpoll (256), and #Labour (192). I should note here that my search is case-sensitive, which is a bit annoying, so these figures will be artificially low: variations like #brexit and #ExitPoll will certainly appear on the list too, but as separate entries much lower down.
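One way round the case-sensitivity problem is to fold case before counting. A minimal Python sketch, assuming hashtags can be pulled out with a simple `#\w+` pattern (real hashtag rules are fussier than this) and that the tweets are already available as plain strings; `count_hashtags` is a hypothetical helper, not a FireAnt feature:

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

def count_hashtags(tweets, case_sensitive=False):
    """Tally hashtags across tweets, folding case unless case_sensitive is True."""
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG.findall(text):
            counts[tag if case_sensitive else tag.casefold()] += 1
    return counts

sample = ["Here we go #Brexit #ExitPoll", "#brexit again", "#exitpoll shock #Labour"]
print(count_hashtags(sample, case_sensitive=True))  # #Brexit and #brexit counted apart
print(count_hashtags(sample))                       # variants merged
```

With case folding on, #Brexit and #brexit land in the same bucket instead of being split across the list.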
The top three shared videos in GE19 were the following:
— Ryan Moffat (@rymoffat) December 12, 2019
(1762 shares in the GE19 dataset)
— chris Jones. 🇬🇧 🇺🇸 🇮🇱 (@spinner2000) October 25, 2019
(96 shares in the GE19 dataset)
— Josh Smith (@Boy1010Tory) December 12, 2019
(75 shares in the GE19 dataset)
Similarly, the top three shared pictures were:
— Richard Chambers (@newschambers) December 12, 2019
(153 shares in the GE19 dataset)
— Jack Horgan-Jones (@JackHoJo) December 12, 2019
(119 shares in the GE19 dataset)
— Sarah Creighton 🍂 (@Saraita101) December 13, 2019
(118 shares in the GE19 dataset)
The dramatis personae
When we look at the raw frequencies of words in the tweets (literally, which words were used most), the results start to sketch the picture that the sections below colour in. The top political party was the Conservatives, including variations such as Conservative and Tory (10,816 mentions), but next in line was the SNP (10,666), and if you didn’t follow the election minutely at the time, you’ll soon begin to see why. In third place (ahem) was Labour (6,387 mentions), then in fourth, the DUP (4,780), and in fifth, the SDLP (2,026). Yes, the Liberal Democrats were nowhere to be seen. They crawl in way down the list with a maximum of 922 mentions, even if you count every possible permutation of their name.
But what of their accompanying cast of characters? What of the big names? Who took the Twitter championships for most mentions?
The major hitter was nicolasturgeon. Yes, all one word. That one version of her name alone appeared 6,349 times, so I didn’t even bother to go and find and count all the individual Nicolas and Sturgeons kicking round the dataset. By contrast, Johnson (just the surname) managed only a little over half that, with 3,627 mentions. Boris (first name only) came in directly afterwards with 3,494 mentions, and then, trailing behind, we have Corbyn with 2,462 hits.
Something that is fun (if your version of fun is computing large amounts of linguistic data and turning it into numbers and then visualising those numbers and then explaining them) is to see how those names played out over the night and into the next day, so I did a search for all permutations of six blockbuster names: Boris Johnson, Jeremy Corbyn, Jo Swinson, Nicola Sturgeon, and then, just for fun, Donald Trump and Nigel Farage. (Again, my version of fun may not look like yours. At. All.)
To understand this graphic it’s useful to have a rough idea of the key events. Just after 10pm, when the dataset starts, the exit poll results come in suggesting not only a Conservative win, but also a massive SNP increase, and accordingly, Sturgeon rockets through the stratosphere into outer space. Relative to her, Johnson gets a far more modest spike, and Corbyn struggles around the bottom with Swinson and the other also-rans.
At about 1:40am, the SNP take their first scalp of the night, and it looks like people begin to jubilantly believe the SNP exit poll predictions. Then, between 3am and about 4:30am, it all kicks off. In quick succession, the DUP’s Westminster leader loses his seat, then Chuka Umunna (ex-Labour) and Zac Goldsmith (Conservative) lose their seats. Then Labour leader Jeremy Corbyn holds his Islington seat, but makes a statement that the night has been very disappointing and that he will not be standing as Labour leader again. Then Johnson arrives at his constituency. Then, minutes later, Liberal Democrat leader Jo Swinson loses her seat to the SNP. Then Johnson wins his seat and gives a speech. Then Trump tweets to say that it’s looking good for Johnson overall. Then Sturgeon gives a speech saying that she has a mandate to give Scotland a choice. And somewhere in all that the DUP lose their deputy too.
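The name timeline boils down to two steps: decide whether a tweet mentions a person, then count those tweets per time slice. Here is a hedged sketch for four of the six names, assuming 10-minute bins and a handful of hand-written regular expressions standing in for “all permutations”. The patterns are illustrative only, not the exact search behind the graph, and the sample tweets are invented.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Illustrative patterns only: a few permutations per person, case-insensitive.
NAME_PATTERNS = {
    "Sturgeon": re.compile(r"nicola\s*sturgeon|\bsturgeon\b", re.IGNORECASE),
    "Johnson": re.compile(r"boris\s*johnson|\bjohnson\b|\bboris\b", re.IGNORECASE),
    "Corbyn": re.compile(r"jeremy\s*corbyn|\bcorbyn\b", re.IGNORECASE),
    "Swinson": re.compile(r"jo\s*swinson|\bswinson\b", re.IGNORECASE),
}

def bin_start(ts: datetime, minutes: int = 10) -> datetime:
    """Round a timestamp down to the start of its bin (minutes should divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)

def mentions_per_bin(tweets, minutes: int = 10) -> dict:
    """For each time bin, count how many tweets mention each name at least once."""
    timeline = defaultdict(Counter)
    for ts, text in tweets:
        for name, pattern in NAME_PATTERNS.items():
            if pattern.search(text):
                timeline[bin_start(ts, minutes)][name] += 1
    return dict(timeline)

sample = [
    (datetime(2019, 12, 12, 22, 1), "Exit poll: huge night for Nicola Sturgeon"),
    (datetime(2019, 12, 12, 22, 7), "#nicolasturgeon beaming, Boris Johnson smiling"),
    (datetime(2019, 12, 13, 3, 15), "Corbyn says he won't lead Labour into another election"),
]
for start, counts in sorted(mentions_per_bin(sample).items()):
    print(start.strftime("%H:%M"), dict(counts))
```

Counting tweets that mention a name (rather than every occurrence) stops a single “Boris Johnson, Johnson, JOHNSON” tweet from spiking the line on its own; the trade-off is that the totals won't match raw mention counts.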
But the players are not the policies, so what were the issues that people were tweeting about?
Wedge issues gonna wedge
The two most frequent issue words in GE19 in raw frequency terms were Brexit (5,860 mentions) and referendum, counting both the full word and variants such as ref, indieref, etc. (2,071 mentions). But there have been a lot of discussions in recent days about the NHS, anti-Semitism, and so forth, so how did those various wedge issues and awkward stories play out across the night? (Click to embiggen.)
Yup. It’s Brexit.
Emojindex: the nation’s minute-by-minute reactions
Therapy couch time. Finally. In light of this onslaught of drama and destruction, how does everyone feel? Fortunately modern technology gives us a nice contemporary way of conveying our feelings online: emoji. Let’s see what the top twenty emoji of the dataset were:
These actually map pretty well onto the top twenty emoji as listed by the Unicode Consortium. There were probably other emoji that were frequent in the dataset, but at the moment emoji are fairly hard to search for because Unicode support in lots of corpus tools isn’t there yet, so this took a frustrating number of workarounds. More to the point, if these are little emotional barometers, how do they map across the night as the results come in? Well, first I had to group them by theme, so I had laughter and smiles (😂🤣😅😊😍😁😉😆), supportive symbols (👏💪👌👍🙏💙❤), and then sadness and cynicism (🤔🙄😭😢💔). Naturally, some of these are simply more popular than others, and some categories contain more emoji than others, so again, take this with a couple of kilos of salt.
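Grouping emoji by theme is mostly a lookup table. A minimal Python sketch using the three groups above. It assumes each emoji is a single code point (true for these twenty), so a variation selector such as the one that often follows ❤ is simply ignored, as is a skin-tone modifier. The helper names are hypothetical, and the sample tweet is invented.

```python
from collections import Counter

def _group(chars: str) -> frozenset:
    """Build a set of emoji, ignoring spaces and variation selectors (U+FE0F)."""
    return frozenset(c for c in chars if not c.isspace() and c != "\ufe0f")

# The three themed groups described above: twenty emoji in total.
EMOJI_GROUPS = {
    "laughter/smiles": _group("😂🤣😅😊😍😁😉😆"),
    "supportive": _group("👏💪👌👍🙏💙\u2764"),
    "sad/cynical": _group("🤔🙄😭😢💔"),
}

def tally_groups(text: str) -> Counter:
    """Count every emoji in text under its themed group, one per code point."""
    counts = Counter()
    for ch in text:
        for group, members in EMOJI_GROUPS.items():
            if ch in members:
                counts[group] += 1
    return counts

print(tally_groups("😂😂 what a night 👍 ... 😭\u2764\ufe0f"))
```

Run over each tweet and summed per time bin, tallies like these are what a chart of the night's emotional heartbeat would be built from. Raw group totals still favour groups with more (and more popular) emoji.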
Anyway, the heartbeat of the nation through the night essentially looked like this (click to embiggen):
Early on there’s a huge surge of grief and sadness as the exit poll comes out. There is also joy, of course, but as midnight approaches and we sink into true darkness, the despair eclipses the celebrations. Then there are spikes where people seem to be praying, hoping, and wishing at around 3am, 4am, and 5am-ish. There’s also a 4am spasm of joy that might correlate with either Corbyn’s announcement that he will stand down or Swinson’s defeat. Then, as the world wakes up and gets online at around 8:30am, there’s another spike of delight, and as the day progresses into the afternoon, a general rhythm of happiness takes over.
Bear in mind that most people simply have to go to bed at some point. So although the spikes in the small hours are lower, they are actually remarkable when you consider how few people will have been awake and actively tweeting at those times: a far smaller pool of people were clearly motivated enough by their feelings to express them so intensely that they still register as substantial spikes on the graph.
One last thing I did was check whether the happies, sads, and hopefuls were talking to each other, so I made a very quick network. Each red dot is a person in the dataset, and each line is a tweet they’ve sent with an emoji in it that is either happy (green), sad (blue), or hopeful (orange). As you can see, whilst the happies and hopefuls seem to have things to say to each other, the happies and sads interact less, and the sads and hopefuls least of all. Click to massively embiggen:
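The network itself was drawn as a picture, but the “who talks to whom” question can also be put into numbers. This is a hedged sketch of one possible approach, not the method behind the graph. It assumes each tweet has been reduced to a (sender, recipient, emoji category) triple, labels each person by the category they use most, and counts edges between category pairs. The function names and the sample data are made up.

```python
from collections import Counter, defaultdict

def dominant_category(tweets):
    """Label each sender with the emoji category they used most often."""
    per_user = defaultdict(Counter)
    for sender, _recipient, category in tweets:
        per_user[sender][category] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}

def category_interactions(tweets):
    """Count sender-to-recipient edges by the (unordered) pair of user categories."""
    label = dominant_category(tweets)
    pairs = Counter()
    for sender, recipient, _category in tweets:
        if recipient in label:  # skip recipients who never tweeted an emoji
            pairs[tuple(sorted((label[sender], label[recipient])))] += 1
    return pairs

sample = [
    ("ann", "bob", "happy"), ("bob", "ann", "hopeful"),
    ("cat", "ann", "sad"), ("ann", "cat", "happy"),
    ("dan", "bob", "hopeful"), ("dan", "eve", "hopeful"),
]
print(category_interactions(sample))
```

Reducing each person to a single dominant category is a blunt instrument (plenty of people will have been happy at 4am and hopeful at 5am), but it turns a hairball into a small table of pair counts.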
Cloudy with a chance of shitstorms
But not everyone is into emoji, and where brightly-coloured little icons fail to convey our innermost reactions to events like these, nature has provided us with that oldest and most cathartic outlet of all: swearing.
The top swearword in the dataset is, well, fucked, and you can probably infer its most common contexts of use. Its frequency is, in its own right, extraordinary: in raw frequency terms, it is the 141st most frequent word in the whole dataset. If you don’t work with large amounts of language, this likely won’t sound all that amazing, so I’ll try to explain it a little. Closed-class, grammatical (supposedly “meaningless” but not really) words like the, in, of, it, to, up, you, and so on tend to dominate most of the first 120 or so spots. After those, you increasingly start to get the really common open-class lexical words. These can be, as in this case, heavily skewed by the topic, so in this dataset the really frequent words are poll, election, vote, and so on, and these topic words will usually eat up many more of the top spots. That fucked should appear within the first thousand words is remarkable, let alone as high as 141st.
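Working out a word's rank is just a matter of sorting the frequency list. A minimal sketch, assuming the text has already been lowercased and split into tokens (the regex tokeniser here is deliberately crude, and ties are ranked in order of first appearance). `frequency_rank` is a hypothetical helper, and the sample text is invented.

```python
import re
from collections import Counter

def frequency_rank(tokens, word):
    """Return the 1-based rank of word in the frequency list, or None if absent."""
    ranked = [w for w, _count in Counter(tokens).most_common()]
    return ranked.index(word) + 1 if word in ranked else None

text = "The poll, the vote, the poll again. Fucked it of the election"
tokens = re.findall(r"[a-z']+", text.lower())
print(frequency_rank(tokens, "the"), frequency_rank(tokens, "poll"))
```

On a real corpus, the same function would show the grammatical words clustering at the top of the list, with topic words like poll and election close behind.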
For the purposes of this analysis, though, I thought it would be interesting to look not just at variations of fuck, but at all the possible swearwords. I decided, in my infinite wisdom, to smash together OFCOM’s offensive language classification system with the Met Office’s severe weather categories, and then grouped the words into three levels of offensiveness. The true fucks are technically Category 3 words, but that aside, examples for each category include…
| Category | Examples |
|---|---|
| CAT 1 (damnation) | arse, arses, bloody, bugger, buggers, crap, damn, damned, etc. |
| CAT 2 (shitstorm) | arsehole, arseholes, bullshit, feck, fecking, fecked, fecker, fecks, shit, bastard, bastards, bellend, bellends, dickhead, dickheads, etc. |
| CAT 3 (clusterfuck) | cunt, cunts, clusterfuck, clusterfucks, fuck, fucking, fucked, fucker, fuckers, fucks, motherfucker, motherfuckers, motherfucking, etc. |
I say “examples” because it’s easy to find all permutations of, say, fuck by just searching for *fuck*. This will find fuck, fucked, fucking, fucker, unfuckingbelievable, absofuckinglutely, etc. etc. Similarly, *shit* will find shit, shite, shitty, and so on. (The asterisks in this case are acting as wildcards, not as indications of actions…)
Aaaanyway, these lists really only show the major variants that FireAnt will have retrieved, but there will have been many more besides. Irritatingly, the Scunthorpe Problem does mean that there will be a small handful of false positives. For example, Pearseon appears in the corpus, and that has arse in the middle of it, but the net total of problematic results should be negligible.
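Under the hood, a wildcard search like *fuck* behaves much like a regular expression. A minimal Python sketch, assuming `*` means “any run of letters” inside a single token, with a small hand-maintained list of known false positives as one crude fix for the Scunthorpe Problem. `wildcard_to_regex`, `find_swears`, and `KNOWN_FALSE_POSITIVES` are hypothetical names, not FireAnt internals, and the sample sentence is invented.

```python
import re

# Hypothetical exclusion list: tokens that match a pattern but aren't swearing.
KNOWN_FALSE_POSITIVES = {"pearseon"}

def wildcard_to_regex(pattern: str):
    """Compile a wildcard like '*fuck*' into a case-insensitive whole-token regex."""
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile(r"\b" + "[A-Za-z]*".join(parts) + r"\b", re.IGNORECASE)

def find_swears(pattern: str, text: str):
    """Return matching tokens, minus the known false positives."""
    hits = wildcard_to_regex(pattern).findall(text)
    return [h for h in hits if h.lower() not in KNOWN_FALSE_POSITIVES]

sample = "Absofuckinglutely. Pearseon says the result is arse, utter arse."
print(wildcard_to_regex("*arse*").findall(sample))  # includes the false positive
print(find_swears("*arse*", sample))
print(find_swears("*fuck*", sample))
```

An exclusion list only fixes the false positives you already know about, which is why spot-checking the matches is still worth doing.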
Another important point is that swearing is not, per se, bad or aggressive. Plenty of people were fucking delighted. They were here for this shit. This was the motherfucking bee’s knees. And so on. So it’s not possible to equate the quantum of expletives with the quality of emotion. All we can really infer is that whatever the emotion was, it probably tended towards being stronger rather than weaker.
So how did the maledictions pour in through the night? (Click to embiggen.)
A trio of little Category 2 Shitstorms squall in at around 10pm, 2am, and 5am, but the night is mainly dominated by Category 3 Clusterfucks. Indeed, after the 10pm exit poll apocalypse, there’s another severe localised outbreak just before 3am. What, you might ask, is that all about? Well, it coincides very closely with Swinson (leader of the Liberal Democrats) losing her seat to the meteorically ascendant SNP, but it is also close in time to Corbyn’s announcement that he will stand down, and news does not spread equally swiftly across all accounts.
But there’s a crucial question that remains unanswered here…
How many fucks does Twitter normally give?
Or in other words, how normal is this level of swearing? Does it stand out from the background norm, either by being peculiarly high or peculiarly low? To answer that question, we need to do some maths.
In the GE19 dataset, there were 40,712 unique accounts that produced 3,926 swearwords in the course of 24 hours. If we wildly extrapolate, then the average user in this dataset has been cruising along giving an average of 35.19 fucks per year about this subject.
(Yes, I am including all the swearwords when I say “fucks given”. And yes, this is silly maths. The GE wasn’t announced till last month. The context is wildly unrepresentative. The extrapolation is bananas. If all this bothers you go read some SlateStarCodex to reset your internal equilibrium.)
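Spelled out, the GE19 extrapolation treats the 24-hour window as one day's worth of swearing per account and scales it up to a 365-day year, assuming (wildly) that the rate holds all year round:

```latex
\[
\frac{3{,}926\ \text{swearwords}}{40{,}712\ \text{accounts}} \times 365\ \text{days}
= \frac{1{,}432{,}990}{40{,}712}
\approx 35.198\ \text{swearwords per account per year}
\]
```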
But how does 35.19 fucks per person per year in the GE19 dataset compare to non-GE/non-bonkers data? Is this a lot, you ask? Or normal? Or actually mild?
Well, to compare, this morning I collected a reference corpus of 85,092 tweets. (It was meant to be a cool 85k but I thought I could run to the toilet and back before it hit the right figure and it turns out I ate a lot more pies this year than I realised.) Anyway, this dataset included 77,512 unique users. The much higher ratio of users to tweets (roughly 0.91 unique users per tweet, against roughly 0.61 in GE19) reflects the fact that the collection wasn’t restricted by topic. When the data is filtered for a specific topic, especially one as highly of-the-moment as the GE results, the same users will tweet about that topic repeatedly, whilst also typically preferring to use the same hashtags.
Back to the point. This generic reference corpus posted 3,114 swearwords in the course of 2 hours and 20 minutes, so if we wildly extrapolate that again, the average person on Twitter is giving just a shade under 151 fucks per year. On whatever subject(s) they give fucks about.
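Here is the same silly extrapolation written as a tiny function, so both corpora go through identical maths. It assumes a 365-day year and takes the reference window as exactly 140 minutes (2 hours 20 minutes). The function name is mine for illustration, not FireAnt's. Computed this way, the reference rate comes out at roughly 150.8 per user per year against roughly 35.2 for GE19, a ratio of about 4.3.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years

def swears_per_user_per_year(swearwords: int, users: int, window_minutes: int) -> float:
    """Scale a collection window's swearwords-per-user linearly up to a full year."""
    return swearwords / users * (MINUTES_PER_YEAR / window_minutes)

ge19 = swears_per_user_per_year(3_926, 40_712, 24 * 60)            # 24-hour GE19 window
reference = swears_per_user_per_year(3_114, 77_512, 2 * 60 + 20)   # 2h20m reference window
print(f"GE19: {ge19:.1f}  reference: {reference:.1f}  ratio: {reference / ge19:.1f}")
```

Because the extrapolation is linear, the ratio between the two corpora doesn't depend on the choice of a year at all; only the per-minute rates matter.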
Why is the average Twitter user giving roughly four times as many fucks about literally anything as the people tweeting about GE19 while the results were unfolding?
There is a surprising array of possible explanations. Firstly, since the Conservatives won more votes than any other party (though not a majority of all votes cast), it makes sense that a large share of users watching the results would be delighted with the outcome. Swearwords are traditionally (though inaccurately) thought of as signals of anger or frustration. This is certainly not always the case, but you are far more likely to hear words like bellend and bastard used as pejoratives than as compliments, so their use in a celebration-dominated corpus would probably be lower.
Another explanation could be that the GE19 dataset involves lots of British people, who are supposedly prone to radical understatement. Since I haven’t lived for any length of time in a variety of other cultures, I can’t fairly comment on this.
Another explanation, and the one that I find most compelling, is that a lot of the GE19 tweets came from media sources. In the list of the most frequent tweeters in that crucial twenty-four-hour period we have @BelTel (the Belfast Telegraph), the @Daily_Record, the @ScottishSun, @newsdirect, @RTUKnews, @BBCNewsNI, and @TheScotsman, all in the top fifteen places. Those seven outlets also account for 694 tweets between them. And scanning further down the list we have universities, gambling businesses, journalists, and more.
Professional accounts (for want of a better name) tend to be very wary about swearing since even innocuous-seeming examples can trigger widespread condemnation. At the same time, all these media outlets are racing to be the first with their news and views, and since they are actually breaking the news, they are producing more content. Overall, then, this is almost certainly having a heavy dampening effect on the data. Essentially, the carefully bloodless, inoffensive tweets of the institutional accounts have probably diluted the emotional intensity of the electorate in the data. I suspect that if I could weed out all the institutional accounts and just leave ordinary users, the rate of fucks given per minute would go supernova.
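Testing that suspicion would mean recomputing the rate with the institutional accounts removed. A minimal sketch, assuming per-account swearword counts are already available (including zeros for accounts that never swore). The seven handles are the outlets named above, while the other accounts and all the counts are invented purely for illustration. In practice, spotting institutional accounts reliably would take more than a hand-made list.

```python
# The seven outlets named above, lower-cased for case-insensitive matching.
INSTITUTIONAL = {
    h.lower()
    for h in ["BelTel", "Daily_Record", "ScottishSun", "newsdirect",
              "RTUKnews", "BBCNewsNI", "TheScotsman"]
}

def swears_per_user(swear_counts: dict, exclude=frozenset()) -> float:
    """Average swearwords per account, skipping any account listed in exclude."""
    kept = [n for user, n in swear_counts.items() if user.lower() not in exclude]
    return sum(kept) / len(kept) if kept else 0.0

# Made-up counts: three outlets that never swear and three ordinary users.
toy = {"BelTel": 0, "Daily_Record": 0, "BBCNewsNI": 0,
       "voter_a": 2, "voter_b": 0, "voter_c": 1}
print(swears_per_user(toy))                 # every account
print(swears_per_user(toy, INSTITUTIONAL))  # institutional accounts removed
```

In this toy case, removing three swear-free outlets doubles the per-user rate; how much it would move the real figure depends entirely on how many such accounts there are.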
Other amusing future studies with this data could separate out all the types of fucks given (delight versus dismay), or tackle a task I was going to do but ran out of steam for: a network map of who was talking to whom, which accounts had the greatest influence at spreading news, and so forth. A further topic I’d contemplated was analysing account bios to pick out political affiliations and then charting their reactions through the night, but again, sanity prevailed, and this day is only so long.
Overall, then, there you go. A collection of fun, silly, and halfway serious facts about twenty-four hours on Twitter during a General Election.