The Ghost(writer) Busters: Can machine learning help in the fight against contract cheating?

Yesterday morning (Mon 12 Feb 18) the Times Higher published an article entitled “Caution over Turnitin’s role in fight against essay mills” (tagline: New software to identify ghost-written essays welcomed, but experts say it is not a panacea). To summarise, the article describes how, later this year, Turnitin will be releasing their new tool, Authorship Investigation, which “will use machine learning algorithms and forensic linguistic analysis to detect major differences in students’ writing style between papers”. For those unfamiliar with Turnitin, it is text-matching software that is primarily used to investigate cases of alleged plagiarism, though it also offers other services.

Before this blog post can conclude with a sensible opinion on how feasible such a venture is, there are several issues to cover: “standard plagiarism” versus “contract cheating”; deception and base truth; the usefulness of machine learning in such cases; style and variation; potential misuse; and other possible solutions. I’ll try to say a little about each in turn, as briefly as I can.


Plagiarism versus contract cheating

There are plenty of disagreements about precisely where plagiarism starts and ends, especially across different cultures, but for the purposes of this post, it is essentially presenting someone else’s work (words, ideas, music, imagery, computer code, etc.) as though it were your own, i.e. without credit to the original source, or in some cases, with insufficient credit. There are other ways to plagiarise too – collusion and self-plagiarism are two that quickly spring to mind – but such joys should be saved for another day.

The most typical version of “standard” plagiarism is that Stu Dent goes onto Wikipedia, copies a paragraph, pastes it into his essay without any attribution to the source, sometimes without even deleting those tell-tale numbers in square brackets (really), and hands the work in, hoping that the pilfered paragraph will slip by with its true authorship undetected. Academia frowns on this as a form of intellectual theft because Mr Dent has appropriated someone else’s work for his own gain – presumably for a better mark and potentially even a better degree – without giving that person their due credit.

Similarly, “contract cheating”, also occasionally referred to as commissioned plagiarism or ghostwriting, involves Soph O’More presenting the work of another as if it were her own, but the real author wrote the essay at Soph’s request. This ghostwriter might be someone close to Ms O’More such as a parent concerned about their sinking investment in university fees or a romantic partner or friend who undertook the work out of misguided kindness. Alternatively, Soph might have gone online and found herself one of the countless commercial services where one can submit an assignment brief, pay, and receive in return an essay fitting her requirements within a timeframe suited to her needs.

How, you might wonder, can there be so many of these services if they are selling methods of cheating? Surely they would be quickly shut down? And if you’ve ever seen any of these sites, you might be even more surprised to note that many of them boast plagiarism-free-essays-or-your-money-back style guarantees. The explanation for both puzzles is found in the small print: these sites typically position themselves as purveyors of highly tailored examples of academic practice, designed purely for the student to read and learn from. These services stress that the essays they provide are absolutely not for students to submit. Oh no. And if Soph should be feckless enough to submit the essay she bought as though it were hers, well, then the consequences would rest entirely with her…

Quite how an entire industry could positively flourish based on people buying bespoke essays purely to read as exemplars before scrupulously setting them aside and writing something different is, well, the non-mystery of the century.

Anyway, back to the point: unlike “standard” plagiarism which involves intellectual theft from an unwitting source, in contract cheating, the ghostwriter almost certainly knows that Ms O’More will be handing that essay in as if it were her own work, and they instead receive a different form of credit, usually in the form of shiny gold coins of the realm.


Divining deception

Both “standard” plagiarism and contract cheating involve deception – Stu and Soph would each like you to believe that they wrote all of their own essays – but there are differences that fundamentally affect how we can investigate and draw conclusions about each kind of malpractice.

In “standard” plagiarism, the essay’s patchwork nature is often its downfall. With experience, markers usually become sensitive to sudden internal shifts in an essay’s style, sophistication, and yes, even just the font. More crucially, once a problematic section has been found, the original textual source for it is out there… somewhere… Sometimes a ten-second search will produce a literal embarrassment of hits. Other times, it’s easier to let a product such as Turnitin do the heavy lifting. In whatever way the source is found, its existence is key. It stands as external evidence of possible misconduct, and the longer and more unchanged the matched section, the higher the probability that this is indeed plagiarism rather than some wild, statistically implausible fluke of identical wording. In research on deception, we could call this external evidence the base or ground truth, and we have access to it.
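The heavy lifting that text-matching tools do can be illustrated with a toy sketch. This is a deliberately simplified, hypothetical illustration (emphatically not Turnitin’s actual algorithm): score the proportion of a submitted essay’s word five-grams that also appear verbatim in a candidate source, so that the longer and more unchanged a matched section is, the higher the score.

```python
# Toy illustration of text matching: how much of an essay's wording
# also appears, verbatim, in a candidate source? (Not any real tool's
# algorithm - just a sketch of the underlying idea.)

def word_ngrams(text, n=5):
    """Return the set of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(essay, source, n=5):
    """Fraction of the essay's n-grams that also appear in the source.

    Long, unchanged copied passages contribute many shared n-grams,
    so they push the score up; independently written text shares
    essentially no five-word runs with the source."""
    essay_grams = word_ngrams(essay, n)
    if not essay_grams:
        return 0.0
    return len(essay_grams & word_ngrams(source, n)) / len(essay_grams)

source = "the cat sat on the mat while the dog slept by the fire"
copied = "my essay begins the cat sat on the mat while the dog slept by the fire"
original = "felines often choose warm resting spots near household hearths"

print(overlap_score(copied, source))    # high: long verbatim copy
print(overlap_score(original, source))  # 0.0: no shared five-word runs
```

The key property, as noted above, is that the statistic is grounded in an external, checkable source text: anyone can look at the matched passages and judge the overlap for themselves.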

By contrast, rather than being a patchwork of paragraphs by different authors knitted together, a bespoke essay should be a stylistically complete whole authored entirely by one person. As a result, any sudden shift in style would only become apparent if Soph’s essay were compared with another that she had supposedly written. Additionally, to uphold any guarantee of being plagiarism-free, the essay should also be entirely original (appropriately referenced quotes and paraphrases notwithstanding) and therefore immune to even the most tenacious web searcher or text-matching software. So unless I could somehow winkle a confession out of Ms O’More or, less likely still, her ghostwriter, I would not have any external evidence of misconduct; in other words, I lack access to the base or ground truth.

Why does having access to the base truth matter? Because in a “standard” plagiarism case, there is an objective, external, non-partisan body of evidence that we can weigh up in the form of the source text and its relationship to the student’s essay. No matter how much we may like Mr Dent, if 90% of his work came from Wikipedia without attribution, the case is pretty incontrovertible. But what of Ms O’More’s work? We may have suspicions, but we have no objective, external, non-partisan source to guide us. The gut may say “cheater”, but how does the brain ever get to the actual truth of the matter? Opinion is not fact, and any conviction for academic malpractice that we can feel comfortable with should rest on more than mere distrust.

This is where Turnitin’s Authorship Investigation tool suggests that it can help, by using “machine learning algorithms and forensic linguistic analysis to detect major differences in students’ writing style between papers”. The student’s other work therefore becomes an external benchmark to measure against. (Forensic linguists have a host of terms about Known Texts and Disputed Texts, but for the non-specialist audience this is aimed at, I’m trying to keep the technical jargon to a minimum.) Anyway, why wouldn’t I be thrilled with such a tool as a possible way forward? Well, for several reasons. Just one is the assumption that the student actually wrote any of their other essays; they might all be paid for, after all. But there are other concerns besides, starting with the proposed method.


Machine learning: the blackest box

My first problem is that the code and algorithms powering Turnitin’s new tool are highly unlikely to be open to public scrutiny, now or ever. Businesses are strangely reluctant to give away their proprietary work, and therefore their market advantage and profit margins, even in the spirit of academic rigour.

Secondly, even if Turnitin were suddenly minded to let us peep under the hood, as I’ve said in the past, most machine learning algorithms that analyse language in this way are black box. By this, I mean that data goes into the black box, highly complex abstracted analyses happen in the darkness, and then results come out of the box. Quite what leads us from X features to Y results is a mystery. If Turnitin’s tool is like this, that could be very problematic. In a plagiarism case we want a clear picture of the relationship between the questioned elements of the essay and the other texts. If the computer claims that Soph’s latest essay is very different in style to the one she submitted last week, we would want to know what those differences are, how they are being weighted (will using a wider vocabulary be ranked as more tell-tale than using simpler sentences, for instance?), and, most importantly, whether they are differences that we could reasonably expect given that this is, after all, a different essay. That takes us neatly onto the next point…
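To make the transparency point concrete, here is a toy sketch, again hypothetical and not a description of Turnitin’s actual method, of what an inspectable style comparison could look like. Every feature is explicitly named (average sentence length, vocabulary richness, a handful of function-word frequencies), so a human can look at each reported difference and ask whether it is plausible variation or a genuine red flag.

```python
# Sketch of a transparent (non-black-box) style comparison: named,
# inspectable features only, so any "different author?" verdict can
# be traced back to specific, arguable differences.

import re

# A tiny illustrative set; real stylometric work uses far more.
FUNCTION_WORDS = ["the", "of", "and", "to", "that", "which", "however"]

def style_features(text):
    """Compute a small dictionary of human-readable style features."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    feats = {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }
    for fw in FUNCTION_WORDS:
        feats[f"freq_{fw}"] = words.count(fw) / max(len(words), 1)
    return feats

def compare(essay_a, essay_b):
    """Return per-feature absolute differences, largest first.

    Unlike a black box, the output names each difference, so a marker
    can judge whether it reflects topic, genre, or something else."""
    fa, fb = style_features(essay_a), style_features(essay_b)
    diffs = {k: abs(fa[k] - fb[k]) for k in fa}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)

for feature, delta in compare(
    "The model is trained on the data. The results are shown below.",
    "However, one might argue that such claims, which rest on little "
    "evidence, deserve far more scrutiny than they typically receive.",
):
    print(f"{feature:20s} {delta:.3f}")
```

Even this crude sketch makes the weighting question visible: sentence length differences dwarf function-word differences on the raw scale, so any tool must decide, and ideally disclose, how features are normalised and ranked.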


Style and variation

Perhaps my gravest concern is that this software may produce endless false positives, and even have a chilling effect on students developing their writing styles. Essays should differ, in some cases markedly. After all, the same student will be expected to write on widely different topics, across different modules, and even across degree programmes. A joint honours student studying maths and literature should certainly not be producing stylistically interchangeable essays for both The Gothic Novel and Differential Equations. Even within the same subject, the topic will range widely. Students of linguistics might be producing a hard-science report on phonological variation in children with speech disorders one day, and a stylistic analysis of the poetry of Brian Bilston the next. (You should follow him. He is rather amazing.)

But this is far from the only factor. Students will write different essays, and even different parts of the same essay, at different times of day, in very different moods and states. Think about any extended piece you have ever written. This blog post has so far been written, edited, and tweaked whilst half-asleep in bed using the pillow as a very poor micro-keyboard-rest; on my tablet during a self-pitying Netflix binge whilst suffering from shivery, sweaty, sickly flu [edited 18 Feb 18 to add: turns out it was possibly mild pneumonia – who knew?!]; in bed again later whilst high as a kite on anti-flu meds; in the car on my smartphone whilst also trying to navigate (don’t worry, I wasn’t driving); and right now in the guest bedroom on the laptop with the news on in the background…

Whilst that’s far more detail than you wanted, the point is that all of these factors – illness, tiredness, enthusiasm, distractions, location, device, etc. – affect style. So too does the purpose of the work. A lengthy literature review of a seminal social science theory calls for a student to extensively outline and critique the views of others, whereas a short exploratory empirical STEM investigation demands that the student concisely explain their own actions and tentative results. Similarly, as I hinted above, academic experience and ongoing development affect style. What a student produces in their final dissertation should (one hopes) be quite different from what they handed in for their very first essay. We don’t want to create a chilling effect whereby students become wary of changing their academic practice for fear of being dragged into an investigation in which they literally cannot produce any evidence that they did not purchase their work.


Exorcising the ghostwriters

It’s very easy (flu pneumonia notwithstanding) to sit here in my lovely bed, under the duvet, watching the latest political nonsense unfold, and explain the reasons why I think Turnitin’s tool may face challenges, but I try to make it a habit to follow up with suggestions on how the problem – in this case, contract cheating – could otherwise be approached. By singular good luck, Prof Tim Grant has saved me an enormous amount of work here by tweeting an excellent thread on exactly this, which will give you a timely break from me. Here are his much more concise thoughts on the matter:

I’ve had a string of people trying to interest me in a project on this and I’ve always said ‘no’. The solution is educational practice not technology – plagiarism of all varieties will be solved by 3 things:

(i) good teaching – which always has and always will require enough time for the teachers to learn and understand their students’ abilities including their writing abilities;

(ii) good assessment practice – this is easy and leads to a positive student experience (and a more interesting marking experience). For example, getting students to collect and analyse their own data makes plagiarism difficult;

(iii) good regulations – the submission of a piece of work is about a student providing evidence that they’ve achieved certain learning outcomes. Regulations should be couched in this and in terms of academic honesty.

This means a marker should need only a reasonable suspicion of malpractice to invite the student for a discussion. (This doesn’t mean marking up a plagiarised work with every copied citation). This discussion should look at the needs and intent of the student: in a viva it’s fairly easy to tell if a student is struggling academically, whether they wrote the piece, and whether there is intended dishonesty. At the end of the viva the student may have provided evidence that they have achieved the learning objectives (yay!) – or that disciplinary action may be required. (Prof Tim Grant, Twitter thread, 12 Feb 18)

Yep. What he said. And I would add to (ii) by noting that essays are only one form of assessment. There are many ways to test a student’s learning progress. Essays have their place but so too do presentations, exams, posters, and so on.


Tabloid terrors

One other concern that I have is the uncritical misuse of this tool beyond its intended scope. There are plenty of “journalists” out there who will delight in running celebrity autobiographies, children’s-books-by-actors-turned-philanthropists, tweets from allegedly hacked accounts, and more through it, all in quest of an outrage-generating headline of supposed cheating by their current hate figure. It’s very convenient to throw about accusations that can be attributed to software, so I anticipate at least some nonsense in that vein.


In conclusion…

It’s important that I finish on the limitations of my own work. This blog post is derived from what I know about plagiarism, language, deception, style, variation, and the bare-bones descriptions given of Turnitin’s new tool in press releases and the media. Crucially, I haven’t seen it in action yet, so it may well turn out that some, most, or even all of my concerns have been thoroughly taken into account. Whatever the case, I am eagerly anticipating its release and getting stuck in with early tests, so feel free to watch this space for future updates.


Originally posted by DrClaireH at 11:20, Tue 13 Feb 18, followed by an hour of typo bashing and clarity tweaks.

Edited at 19:40, Sun 18 Feb 18 to add exciting pneumonia details.
