The devil in the data: are women as aggressive online as men?

So, sometimes things grind along veeeeery slowly in the background, and eventually, finally, at long last, I get to them. This has been one of those slow-burn projects. To understand what’s going on, it’s worth briefly casting our eyes back to this post. Alternatively, if you want a very short overview, then here it is…

On the 26th of May, 2016, Demos thinktank’s Centre for the Analysis of Social Media (CASM) presented the results of an investigation into the use of misogynistic terms on Twitter to the House of Commons. This work was a collaboration between the thinktank Demos and the TAG laboratory at the University of Sussex, and the release of the report was timed to coincide with a political, cross-party campaign, Recl@im the Internet. Launched by Labour MP Yvette Cooper, the aim of Recl@im the Internet is described thus:

Missing Melania II: has Trump used the First Lady’s Twitter accounts in the past?

With her non-appearance at Camp David this weekend, Melania Trump’s absence continues into its 24th day, and as time passes, the number of theories about her disappearance is increasing. Twitter is an excellent source for these if you’re interested, but the main ones appear to be that she is secretly giving evidence to Mueller; that she is back in New York with her son; and that she is busy instigating divorce proceedings.

However, whilst all of that foments away in the background, something quite different caught my eye – the highlighted tweet below from @MELANIATRUMP back on the 18th of October 2013:

@realDonaldTrump: I love watching the dishonest writers @NYMag suffer the magazine’s failure.

@DanAmira: Your wife is waiting for you to die (…)

@MELANIATRUMP: @DanAmira @NYMag Only a dumb “animal” would say that! You should be fired from your failing magazine! (Link)

Intriguingly, in 2016, the New Yorker noted of this very tweet that “One couldn’t help but detect Donald’s influence when Melania fired off [this] reply”, and plenty of users replied with similar questions about precisely who might have written it.

Missing Melania: is the First Lady’s Twitter account being used by Trump?

As we speak (1st June 2018), interesting conspiracy theories are breaking across some news networks about Melania Trump. The First Lady has allegedly not been seen in public for twenty days, and according to White House sources, she is recovering from an operation. However, the increasing concern for her welfare was heightened by the latest (as of this moment) tweet from her FLOTUS account on Wednesday 30th May, which reads:

I see the media is working overtime speculating where I am & what I’m doing.  Rest assured, I’m here at the @WhiteHouse w my family, feeling great, & working hard on behalf of children & the American people! (Link)

Some reacted to this tweet with immediate scepticism that she had written it. Some felt that it was authored by Trump, and others demanded to see her holding a current newspaper. So, is that tweet truly unusual for the First Lady? Or is this a case of mass confirmation bias, where many people who already hold Donald Trump in contempt have simply found another possible avenue of attack? I thought I’d have a look at it out of curiosity and see what I could see.

The savage garden of social media: London’s violent crime surge

Over the past few days, the media has been reporting on a “surge” in violent crime in the capital. Figures such as Met Commissioner Cressida Dick and Home Secretary Amber Rudd have framed this fluctuation with a narrative that social media is playing a key role in arguments between young people, particularly those in gangs, by allowing them to react quickly to online grievances with offline violence. For instance, Met Commissioner Dick claims that “sites and apps such as YouTube, Snapchat and Instagram are partially to blame for the bloodshed” (source). As someone who researches online aggression, I find this notion particularly interesting.

The Ghost(writer) Busters: Can machine learning help in the fight against contract cheating?

Yesterday morning (Mon 12 Feb 18) the Times Higher published an article entitled “Caution over Turnitin’s role in fight against essay mills” (tagline: New software to identify ghost-written essays welcomed, but experts say it is not a panacea). To summarise, the article describes how, later this year, Turnitin will be releasing their new tool, Authorship Investigation, which “will use machine learning algorithms and forensic linguistic analysis to detect major differences in students’ writing style between papers”.

PyClaireH versus RyClaireH: which bot wins the imitation game?

I’ve been meaning to write about my newest bot for a while, but finally here we are. Welcome, RyClaireH, to the fold. Yay! In case you’re wondering who the other bot is, you can find out all about PyClaireH in this blog post. For those already familiar with Py, the easiest way to describe Ry is through her differences. Where Py is a Markov chain (more detail on this), Ry is a much more sophisticated pseudo-Markov chain. Py essentially uses word-level probabilities to construct sentences based on the likelihood that one word will occur after another. On the other hand, Ry uses NLP (natural language processing). From this toolkit, she tags each word in the data for its part of speech (e.g. noun, verb, adverb, adjective, etc.) in advance, and also uses dictionaries of nouns and adjectives to help her formulate more syntactically coherent tweets. In a nutshell, Py is a very simple bot that works at the level of the word (lexis), whereas Ry works with both words (lexis) and grammar (syntax). Neither bot has any help at the higher levels of language, such as with the meanings of words (semantics), and certainly not with the meanings communicated beyond the word (pragmatics). Arguably a semantically “aware” bot is possible – semantic tagging by something like the USAS tagger could provide a nice way in, but to incorporate that into the model requires a level of programming competence I don’t possess.
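To make the word-level idea concrete, here is a minimal sketch of the kind of Markov chain described above for Py. It is an illustration only and not PyClaireH’s actual code: the function names `build_chain` and `generate`, and the tiny example corpus, are all hypothetical. Each word is mapped to the list of words that have followed it in the data, so picking at random from that list reproduces the observed likelihood of one word occurring after another.

```python
import random
from collections import defaultdict


def build_chain(corpus):
    """Map each word to the list of words seen directly after it.

    Repeats are kept on purpose: a follower seen twice is twice as
    likely to be picked later on.
    """
    chain = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return dict(chain)


def generate(chain, start, max_words=10, rng=None):
    """Walk the chain from `start` until a dead end or `max_words`."""
    rng = rng or random.Random()
    words = [start]
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical stand-in for a real tweet archive.
    tweets = [
        "I love a good corpus",
        "I love a terrible pun",
        "a good pun is a terrible thing",
    ]
    chain = build_chain(tweets)
    print(generate(chain, "I"))
```

A chain like this knows nothing about parts of speech, which is why its output can drift grammatically; the tagging and dictionaries described for Ry are an attempt to constrain exactly that.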

As I mentioned in my Py post a few months ago, one of my interests was in whether more coding and extra tools would help Ry to be more convincing than Py, or if it would actually hinder her, so this post is a non-serious, barely-scientific, entirely-amusement-driven shoot-out between the two. You can (and indeed should) pick about three hundred holes in the general methods and rigour of this, but I refuse to let that stop me.

So, onwards.

Lies, damned lies, and slippery surveys

On the afternoon of Thursday 16th February, perhaps in light of a week that has been even rockier than usual, current US President Donald Trump held a controversial press conference. Whilst this was, in itself, newsworthy for a variety of reasons, there was an unexpected plot-twist. Trump followed up by sending out a mailshot to his Republican supporters…

In case you can’t read the image for some reason, the email opened with the ongoing narrative that “the mainstream media” (this damning moniker seems to exclude pro-Trump agencies such as Fox News, incidentally) is carrying out hit jobs, attacks, deceptions, and so forth, specifically against Trump and the Republicans. As part of the resistance to this, recipients were encouraged to complete a “Mainstream Media Accountability Survey” (PDF for posterity).

Very quickly, that terribly biased, pesky mainstream media noted that this survey was, itself, rather biased. In fairness to both sides, claiming that a survey is biased is an easy win. Every survey and questionnaire contains bias right from the start – the goal of the survey, the topic choice, the time of asking, the person who is asked, the person doing the asking – all are the product of intentional choice and have an ability to alter the results, but the point is to limit and control for all of these factors as much as possible. More importantly, it’s an easy claim to make because it can be surprisingly difficult to pick out the exact features that are creating larger degrees of bias than we would consider acceptable. You might read the survey and get an intuitive sense that it isn’t playing fairly, but it’s helpful to be able to specifically identify the very methods that are being employed to push you one way or another. And that’s what I do in this post.

Cloning humans with code: building a bot that speaks just like me

Over a few weeks, I’ve been gradually looking into setting up and running my own Twitter bot, so I’d like to introduce you to PyClaireH, my very first digital clone. She may be slightly… erm… sweary. In fact, this is probably a pretty close insight into what I’d be like if I drunk-tweeted.


What’s a bot?

A bot is, usually, simple software doing a simple job. Some bots are nice and some are less so. The good ones tend to be playing simulated characters in video games, producing art, crawling the internet for data, putting bids in on eBay for you, providing answers to customer questions, or holding a (hopefully convincing) conversation with you in some way. These latter “chatty” types are informally called chatbots. Meanwhile, the malicious bots are busily exploiting vulnerabilities in computer systems, crashing servers with artificially high traffic, sending out torrents of spam or abuse, scraping data they’re not allowed to collect, and maliciously impersonating people.

What kind of bot is PyClaireH?

Well, that’s an interesting question. PyClaireH (and, in the future, hopefully, RyClaireH) are chatbots who both produce communication and respond to certain linguistic prompts. However, PyClaireH is also intended to impersonate me. That’s hardly in the realms of the malicious, of course, but it does have malicious applications, and that’s my interest: in the online arms race of fraud and counter-fraud, how well can bots pretend to be us, or more accurately, specific instances of us? In an experimental setting, for instance, could PyClaireH ever fool someone into believing that she really was me? And can we identify the linguistic tells that distinguish the ghosts in the machine from the humans?

Oxford Dictionary’s Word of the Year is “post-truth”, but what makes a newly coined word survive?

From the distribution of letters in a box of Cheez-It Scrabble crackers to the incorporation of new words into the dictionary, it seems that we are constantly fascinated by all aspects of language, and particularly by its newest developments. Today, the Oxford Dictionary Word of the Year was announced: post-truth – relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief.

Interestingly, this has links to ideas that have been around for quite some time. Centuries ago, Jonathan Swift (1667-1745) was lamenting that whilst falsehood flies, the truth comes limping after it. Similarly, for the modern era we have Brandolini’s law, “The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it.” From a political perspective, this lends itself to highly expedient game-playing. Misstatements and convenient omissions are today’s front-page attention-grabbing headlines, and retractions are tomorrow’s tiny, overlooked addendums. The benefits gained from the lie may exponentially outweigh whatever consequences trail along in its wake. This kind of post-truth politics has even driven the rise of sites like FactCheck.org, PolitiFact, and Snopes as audiences increasingly recognise that they may not be getting a fair representation of issues.

To return to the main point of this post, however, I find these Words of the Year/Decade/Century events linguistically interesting for three reasons – how a word is born and flourishes (or not), the social importance of new words, and the method behind identifying them (corpus linguistics). And since I’ve had some media interest in this already today, I thought I’d write out my ideas more fully here.