Popoola – “It’s the story, stupid!” How MARV (Multivariate Analysis of Register Variation) can save the world from fake news.

The FORGE is delighted to announce our third external speaker: Olu Popoola (University of Birmingham). Details of his talk are below:

TITLE
“It’s the story, stupid!” How MARV (Multivariate Analysis of Register Variation) can save the world from fake news.

ABSTRACT
Computer-aided fake news detection can be a useful complement to human efforts. On its own, fact-checking is often too slow to prevent the viral spread of disinformation; debunking news stories and communicating corrections can also have a backfire effect that reinforces the false belief (Lazer et al., 2018). Most computational methods frame fake news detection as a text classification task (Shu et al., 2017) and so require data pre-labelled for veracity. However, the complexities of defining fake news (e.g. fabricated facts or undisclosed advertising?), the different types of fake news (imposter news vs. low-quality news vs. inaccurate news), the difficulty in establishing objective ground truth, and the weaponization and dilution of ‘fake news’ as a concept leave the collection of pre-labelled data fraught with epistemological issues.

Semi-supervised multivariate statistical techniques may overcome these limitations by modelling news veracity as a latent variable whose value can be estimated from the presence of deception clues via a novel deception scoring approach. This study tested the hypotheses that (i) there is significant linguistic variation within the online news genre and that (ii) this variation is correlated with deceptive situational parameters of communication. Multivariate register analysis was conducted on 5,000 stories from the political sections of 15 online news sources selected as representative of the online news ecosystem (i.e. a mix of UK and US legacy and new online media from across the full political spectrum). Linguistic parameters were defined from a feature set combining lexico-grammatical and cohesion-based features; situational parameters were drawn from expert-defined fake news detection heuristics and used to calculate a deception score. Visualisation techniques (Diwersy, Evert and Neumann, 2014) were used to assess whether this situational analysis revealed any dimensions of deception and deceptive text clusters.
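The deception-scoring step can be illustrated with a minimal sketch: a weighted checklist over situational parameters, whose total stands in for the unobservable veracity label. This is not the study's actual model, and every heuristic name and weight below is invented purely for illustration.

```python
# Hypothetical situational parameters in the spirit of expert-defined
# fake news detection heuristics (names and weights are invented).
HEURISTICS = {
    "anonymous_authorship": 2.0,
    "imposter_domain": 3.0,
    "no_corrections_policy": 1.5,
    "undisclosed_advertising": 1.0,
}

def deception_score(source_profile, heuristics=HEURISTICS):
    """Sum the weights of every heuristic this source triggers."""
    return sum(weight for name, weight in heuristics.items()
               if source_profile.get(name, False))

# A source with an anonymous byline on an imposter domain:
profile = {"anonymous_authorship": True, "imposter_domain": True}
print(deception_score(profile))  # 5.0
```

In a semi-supervised setting, a score like this would label only a subset of texts, with the latent veracity of the remainder estimated from their linguistic similarity to the scored ones.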

The study found that linguistic variation in the online news genre is highly correlated with the probability of veracity, with absence of narrative the main indicator of potential deception. This result was unexpected as storytelling is generally associated with deception. However, in the context of a profession which places supreme value on the news story it makes sense that narrative register is a key veracity indicator. Semi-supervised multivariate analysis with deception scoring emerges as a viable alternative to text classification for automated deception detection in epistemologically challenging genres.

REFERENCES
Diwersy, S., Evert, S. and Neumann, S., 2014. A weakly supervised multivariate approach to the study of language variation. In Aggregating Dialectology, Typology, and Register Analysis: Linguistic Variation in Text and Speech, pp. 174-204.

Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D. and Schudson, M., 2018. The science of fake news. Science, 359(6380), pp.1094-1096.

Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), pp.22-36.

BIO
Olu Popoola is a PhD candidate researching methods for cross-domain deception detection at the University of Birmingham, and moonlights as a deception detection trainer and OSINT investigator. By day, Olu is a Teaching Fellow at Aston University where he teaches information integrity to future health professionals (a third career, following ten years in advertising and consumer research and another ten in English language teaching). Olu is married with two canal boats and a cat.

TIME & PLACE
1100-1200, Wed 20th Mar, County South B89

Dearden – Alternative fakes

FORGE is delighted to announce a talk by our upcoming internal speaker: Ed Dearden (Computing & Communications). Details of his talk are below:

TITLE
Alternative fakes

ABSTRACT
Lies have always been told to try to influence the opinions of others. But the ease of information-propagation allowed by the web and social media has made it an increasing problem. False information, both intentional (“disinformation”) and unintentional (“misinformation”), propagates like wildfire in this environment. Much research is (rightly!) concerned with characterising disinformation in this social media and online news landscape. Though this focus is understandable, there is much to learn by looking at other forms of false information, as the spreading of lies is, sadly, not a new phenomenon. This talk will discuss some of the challenges of looking at different forms of false information and how the concepts of belief and deceptive intent affect the language of false information. The talk will then discuss a couple of case studies of false information: April Fools hoaxes and the Flat Earth Society forum.

TIME & PLACE
1100-1200, Wed 13th Mar, County South B89

All are welcome to attend.

Dance – Linguistics and disinformation: motivations and solutions for sharing fake news

FORGE is delighted to announce a talk by our upcoming internal speaker: William Dance (Linguistics & English Language). Details of his talk are below:

TITLE
Linguistics and disinformation: motivations and solutions for sharing fake news

ABSTRACT
Fake news, intentionally factually incorrect news that is published to deceive and misinform its reader, has become a very prominent issue in the public arena in recent years. It has been estimated that factually untrue stories were shared more than 30 million times during the 2016 U.S. presidential election (Allcott and Gentzkow, 2017), and already in 2019 the British government has published a white paper to tackle the spread of disinformation. However, at its core fake news is a contentious issue: should we even use the term ‘fake news’? Is fake news as damaging as people claim it to be? Can anything be done to stop it?

Fake news is a modern name for a very old phenomenon, and it has been an issue for centuries, as shown by Charles II’s 17th-century proclamation “to restrain the spreading of false news, and licentious talking of matters of state and government” (Early English Books Online, 2017). Organisations such as the Department for Agitation and Propaganda in Soviet Russia and the Ministry of Popular Culture in Italy created fake news under different names during the 20th century, and under Hitler’s rule of Germany parts of the press were referred to as the Lügenpresse (literally: lying press). However, it is only in the last five years that the term ‘fake news’ has entered our daily lives.

This talk will be a complete beginner’s guide to researching fake news. It will give a history of fake news that will discuss how old the phenomenon is, provide definitions of fake news and will explain why fake news exists. Then, recent seminal works exploring fake news will be discussed as well as the various government reports that are currently being published across the world to tackle fake news. I’ll then go on to give a work-in-progress report of my current research into fake news and give some examples of cursory data analysis that looks at social media users’ motivations and rationales for disseminating fake news online.

References
Allcott, H. & Gentzkow, M. (2017). Social Media and Fake News in the 2016 Election. The Journal of Economic Perspectives, 31(2), 211-236.

Donath, J. (1999). Identity and deception in the virtual community. In Communities in Cyberspace. London: Routledge. (pp. 343-359).

Guess, A., Nyhan, B., & Reifler, J. (2018). Selective exposure to misinformation: Evidence from the consumption of fake news during the 2016 US presidential campaign. European Research Council, 9.

Hardaker, C. (2013). “Uh…..not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug.”: an overview of trolling strategies. Journal of Language Aggression and Conflict, 1(1), 57-86.

Rayson, P. (2008). From key words to key semantic domains. International Journal of Corpus Linguistics, 13(4), 519-549.

TIME & PLACE
1100-1200, Wed 06th Mar, County South B89

All are welcome to attend.

Wright – “You need to speak in English, you’re in f***ing England”: how the British press fan the flames of linguistic discrimination

The FORGE is delighted to announce our second external guest speaker: Dr David Wright (NTU). Details of his talk are below:

TITLE
“You need to speak in English, you’re in f***ing England”: how the British press fan the flames of linguistic discrimination

ABSTRACT
Every so often a story of linguistic discrimination makes the national news in Britain. Whether it’s offensive graffiti in an East London borough, tourists being verbally abused in the street, or acts of physical violence towards people on the tube, the motivation for these attacks is the same – the victims aren’t native English speakers.

In this talk, I demonstrate the ways in which such criminal behaviours have been at best legitimised, and at worst incited, by some sections of the British national press. I examine the ways in which non-native English speakers living in Britain are framed as a ‘problem’ for the native English majority, and how discriminatory, exclusionary and prejudiced ideologies about race, ethnicity and nationality are packaged in discourse about ‘language’.

Using a 5-million word corpus of British press reporting from 2005-2017, I explore the various ways in which non-native English speakers are vilified and demonised by the press. I also trace the development of certain discourses over time, and the means by which particular ideologies and arguments are ushered into the public debate, before being escalated and amplified. Most specifically, I observe the impact that the results of the 2011 Census had on the nature of such reporting, when it was revealed that 138,000 people (or 0.26% of the British population) do not speak English.

BIO
Dr David Wright is a forensic linguist at Nottingham Trent University. His research applies methods of corpus linguistics and discourse analysis in forensic contexts and aims to use language analysis to help improve the delivery of justice. His research spans across a range of intersections between language and the law, language in crime and evidence, and discourses of abuse, harassment and discrimination. He is co-author of An Introduction to Forensic Linguistics: Language in Evidence (with Malcolm Coulthard and Alison Johnson) and has published in international journals in forensic linguistics, corpus linguistics and critical discourse studies.

TIME & PLACE
1100-1200, Wed 27th Feb, County South B89

Gillings – Building a corpus of deception: methodological and analytical considerations

FORGE is delighted to announce a talk by our upcoming internal speaker: Mathew Gillings (Linguistics & English Language). Details of his talk are below:

TITLE
Building a corpus of deception: methodological and analytical considerations

ABSTRACT
The field of deception detection has been largely dominated by researchers from the field of psychology, and it is clear that those in linguistics have a lot more to offer than they have done thus far. Previous work in this area has primarily been carried out using LIWC (Pennebaker et al., 2001), which identifies what percentage of a given text can be attributed to particular personalities and mental states. In more recent years, Archer and Lansley (2015) and McQuaid et al. (2015) have applied corpus linguistic methods to the field, using Wmatrix to investigate differences between truthful and deceptive corpora. However, there has never been a large-scale and systematic corpus study using deceptive spoken data. Similarly, up until now, the sociolinguistic nature of deception has never been investigated.

In this talk, I will discuss the state-of-the-art in deception detection and identify a series of issues with that early work. In particular, I will discuss how certain methodological decisions can impact on the quality and validity of results that arise from the data, and how a different method of analysis can lead to more intuitive and nuanced findings. I will explain how I created a corpus of deception by following best practice in increasing motivation, cognitive load, and ecological validity. I will then discuss how more traditional corpus linguistic methods (such as keyword analysis, effect size measures, dispersion, and concordance analysis), combined with a more flexible, user-friendly analysis tool, can provide further insight into the sociolinguistic nature of deceptive discourse.
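Of the corpus methods mentioned, keyword analysis is the most mechanical, so a minimal sketch may help: Dunning's log-likelihood keyness statistic for a single word's frequency across two corpora, say a truthful and a deceptive one. The word and all counts below are invented for illustration.

```python
import math

def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Dunning's log-likelihood keyness for one word, given its
    frequency and the total token count in corpus A vs. corpus B."""
    expected_a = total_a * (freq_a + freq_b) / (total_a + total_b)
    expected_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

# Invented counts: a word occurring 5 times per 1,000 tokens in the
# truthful corpus but 20 times per 1,000 in the deceptive one.
print(round(log_likelihood(5, 1000, 20, 1000), 2))  # 9.64
```

A score above 3.84 corresponds to p < 0.05 on one degree of freedom, the conventional keyness cut-off; in practice this statistic is combined with effect size and dispersion measures, as the abstract notes.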

References
Archer, D. and Lansley, C. (2015). Public appeals, news interviews and crocodile tears: an argument for multi-channel analysis. Corpora, 10(2), 231-258.

McQuaid, S., M. Woodworth, E. Hutton, S. Porter, and L. ten Brinke. (2015). ‘Automated insights: verbal cues to deception in real-life high-stakes lies’. Psychology, Crime and Law. Pp. 1-

Pennebaker, J.W., Francis, M.E. and Booth, R.J. (2001). Linguistic Inquiry and Word Count (LIWC): LIWC2001. Mahwah, NJ: Lawrence Erlbaum Associates.

TIME & PLACE
1100-1200, Wed 06th Feb, County South B89

All are welcome to attend.

Rayson – Wmatrix for forensic linguistics: a practical hands-on demo

UCREL CRS and FORGE are pleased to announce the next speaker for this year’s seminar series: Dr Paul Rayson (Lancaster). Details of his talk can be found below:

TITLE
Wmatrix for forensic linguistics: a practical hands-on demo

ABSTRACT
Wmatrix was originally conceived in the REVERE project (1998-2001) as a web interface to facilitate the availability of Natural Language Processing (NLP) and Corpus Linguistics (CL) tools and methods to software engineers who were studying legacy systems through document archaeology alone (Rayson et al 2001, 2005). Since then, its web interface has been extended to expose more underlying details of the language analysis rather than hiding them away, and it has supported applications of NLP and CL methods in many other areas such as political discourse analysis, tracing facework, corpus stylistics, metaphor analysis, topic modelling, evaluating problem-based learning and the language of illness. In the short talk at the beginning of this session, I will highlight applications in forensic, legal, and policing settings, for example: online child protection (Rashid et al 2013), predicting collective action (Charitonidis et al 2017), scientific fraud (Markowitz and Hancock 2014), and studies of the language of international criminal tribunals (Potts and Kjær 2015), sex offenders (Lord et al 2008), extremism and counter-extremism (Prentice et al 2012), and psychopaths (Hancock et al 2013). In the remainder of the two-hour session, participants will follow the online tutorials which introduce the key semantic domains method. We will use the new version 4 of Wmatrix, running on a dedicated server with secure HTTPS access, which went public in December 2018. Users will be provided with existing manifesto datasets, but you are welcome to bring your own English corpora to upload.

REFERENCES

Charitonidis C., Rashid A., Taylor P.J. (2017) Predicting Collective Action from Micro-Blog Data. In: Kawash J., Agarwal N., Özyer T. (eds) Prediction and Inference from Social Networks and Social Media. Lecture Notes in Social Networks. Springer, Cham

Jeffrey T. Hancock, Michael T. Woodworth and Stephen Porter (2013) Hungry like the wolf: A word-pattern analysis of the language of psychopaths. Legal and Criminological Psychology. Volume 18, Issue 1, pages 102-114. http://dx.doi.org/10.1111/j.2044-8333.2011.02025.x

Lord V, Davis B, Mason P. 2008. Stance-shifting in language used by sex offenders. Psychology, Crime & Law 14, 357-379.

Markowitz DM, Hancock JT (2014) Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel. PLoS ONE 9(8): e105937. doi:10.1371/journal.pone.0105937

Potts, A. and Kjær, A.L. (2015) Constructing Achievement in the International Criminal Tribunal for the Former Yugoslavia (ICTY): A Corpus-Based Critical Discourse Analysis. International Journal for the Semiotics of Law. doi: 10.1007/s11196-015-9440-y

Prentice, S, Rayson, P & Taylor, P 2012, ‘The language of Islamic extremism: towards an automated identification of beliefs, motivations and justifications’ International Journal of Corpus Linguistics, vol. 17, no. 2, pp. 259-286. DOI: 10.1075/ijcl.17.2.05pre

Rashid, A, Baron, A, Rayson, P, May-Chahal, C, Greenwood, P & Walkerdine, J 2013, ‘Who am I? Analysing Digital Personas in Cybercrime Investigations’ Computer, vol. 46, no. 4, pp. 54-61. DOI: 10.1109/MC.2013.68

Rayson, P., Emmet, L., Garside, R., & Sawyer, P. (2001). The REVERE project: Experiments with the application of probabilistic NLP to systems engineering. In Natural Language Processing and Information Systems – 5th International Conference on Applications of Natural Language to Information Systems, NLDB 2000, Revised Papers (pp. 288-300).

Sawyer, P., Rayson, P., & Cosh, K. (2005). Shallow Knowledge as an Aid to Deep Understanding in Early-Phase Requirements Engineering. IEEE Transactions on Software Engineering. DOI: 10.1109/TSE.2005.12

TIME & PLACE
1100-1300, Wed 16th Jan, Management School A001c (PC/Learning Lab)

Larner – How children and young people disclose sexual abuse

The FORGE is delighted to announce our first external guest speaker: Dr Sam Larner (MMU). Details of his talk are below:

TITLE
How Children and Young People Disclose Sexual Abuse: A linguistic analysis of NSPCC ChildLine online chat transcripts

NOTES
THIS TALK IS ON A TOPIC, AND WILL CONTAIN EXTRACTS OF DATA, THAT SOME MAY FIND DISTRESSING.

DISCRETION IS STRONGLY ADVISED.

ABSTRACT
Research indicates that when children and young people make the difficult decision to disclose that they have been sexually abused, their linguistic capabilities may limit the extent to which they can make a full and clear disclosure. This may be problematic from a safeguarding perspective, since the recipient of the disclosure may not realize or fully appreciate what the child or young person is trying to disclose, or even that an attempt at disclosure is being made. Whilst the process of, and barriers to, disclosure have been extensively researched, the linguistic strategies used to communicate disclosure have received relatively little attention. In order to provide a novel perspective, this research addresses the question ‘How do children and young people disclose that they have been sexually abused?’ Online chat conversations (n = 40) between children and young people (aged 10–18 years old) and ChildLine counsellors in which sexual abuse was disclosed were analysed. Whilst some children and young people do use explicit terms to describe sexual abuse, these are predominantly used to seek definitions and clarification. Furthermore, counsellors play an instrumental role in recognising that a disclosure is being made, and then eliciting and reframing the disclosure as sexual abuse. The findings provide insight into why some victims of sexual abuse report having attempted to tell an adult but feel like they were not heard. This raises questions about how disclosures are made in other contexts and whether institutional safeguarding policies are fit for purpose.

BIO
Dr Sam Larner holds a BA (Hons.) in Linguistics from Lancaster University, an MA (Distinction) in Forensic Linguistics from Cardiff University, and a Ph.D. in Forensic Linguistics from Aston University. He is a Fellow of the Higher Education Academy, a member of the International Association of Forensic Linguists, and a member of the British Association for Applied Linguistics. Dr Larner’s experience in forensic linguistics spans over ten years. He joined Manchester Metropolitan University in 2015, and he has also held lectureships at the University of Central Lancashire and Newman University as well as giving guest lectures in the Czech Republic and Germany.

TIME & PLACE
1100-1200, Wed 12th Dec, County South B89

Luther – Nudging Eyewitnesses: The Effect of Social Influence on Recalling Witnessed Events

The FORGE is pleased to announce the next speaker for this year’s seminar series: Dr Kirk Luther (Lancaster). Details of his talk can be found below:

TITLE
Nudging Eyewitnesses: The Effect of Social Influence on Recalling Witnessed Events

ABSTRACT
Interviewing witnesses and victims (i.e., interviewees) is a core component of policing. Interviewers were likely not present when a crime was committed, and therefore must obtain information about what happened from interviewees. Due to the importance of interviews for solving crimes, researchers continue to explore ways to enhance interviewee recall. One promising area that has received relatively limited attention as an interviewing tool is social influence. The goals of the current experiment are to determine the extent to which various social influence techniques are able to enhance witness recall beyond what can be achieved when such techniques are absent, and to compare the relative performance of the social influence strategies.

TIME & PLACE
1100-1200, Wed 21st Nov, County South B89

Warmelink – “If you go down in the woods today…”

The FORGE is pleased to announce the next speaker for this year’s seminar series: Dr Lara Warmelink (Lancaster). Details of her talk can be found below:

TITLE
“If you go down in the woods today…”

ABSTRACT
Psychologists use different types of automatic language tagging to help analyse participants’ statements in a quick and low-cost way. Erik Mac Giolla, Sofia Calderon, Kalle Ask, Timothy Luke and I (all psychologists) were studying the effect of veracity on people’s concreteness when speaking about future actions. We hypothesised that liars would be less concrete than truth tellers. We received data from six studies in which participants were interviewed about their future plans, with instructions to either lie or tell the truth. The statements’ concreteness was measured using two automatic language taggers: one based on a 40,000-word dictionary of words rated for concreteness (Brysbaert, Warriner, & Kuperman, 2014) and one based on the Linguistic Category Model (Seih, Beier & Pennebaker, 2017), which uses TreeTagger and WordSmith. Both analyses showed that there was no difference between liars and truth tellers in their levels of concreteness. We also found no correlation between the two measures, which led to some concerns about the validity of one (or both?) of the measures. This talk will discuss the problems we encountered and invite your thoughts about the usefulness of operationalizing psychological concepts by language tagging.
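The dictionary-based measure is simple to sketch: average the concreteness rating of every word a statement shares with the norms. The mini-dictionary below is invented for illustration; the actual Brysbaert et al. norms rate some 40,000 words on a 1-5 scale (1 = highly abstract, 5 = highly concrete).

```python
# Hypothetical mini-dictionary standing in for the Brysbaert et al. norms.
RATINGS = {"airport": 4.7, "walk": 4.1, "taxi": 4.8, "plan": 2.4, "hope": 1.6}

def concreteness(statement, ratings=RATINGS):
    """Mean concreteness of the statement's dictionary-covered words,
    or None if no word in the statement is covered."""
    scores = [ratings[word] for word in statement.lower().split()
              if word in ratings]
    return sum(scores) / len(scores) if scores else None

# "plan" (2.4), "walk" (4.1) and "airport" (4.7) are covered; the rest
# of the tokens are ignored, so the statement scores their mean.
print(concreteness("I plan to walk to the airport"))
```

The talk's validity worry is visible even in this toy: the score depends entirely on which tokens the dictionary happens to cover, so two taggers with different coverage can disagree on the same statement.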

TIME & PLACE
1100-1200, Wed 31st Oct, County South B89