Gillings – Building a corpus of deception: methodological and analytical considerations

FORGE is delighted to announce a talk by our upcoming internal speaker: Mathew Gillings (Linguistics & English Language). Details of his talk are below:

Building a corpus of deception: methodological and analytical considerations

The field of deception detection has been largely dominated by researchers from psychology, and it is clear that those in linguistics have a lot more to offer than they have done thus far. Previous work in this area has primarily been carried out using LIWC (Pennebaker et al, 2001), which identifies what percentage of a given text can be attributed to particular personalities and mental states. In more recent years, Archer and Lansley (2015) and McQuaid et al (2015) have applied corpus linguistic methods to the field, using Wmatrix to investigate differences between truthful and deceptive corpora. However, there has never been a large-scale, systematic corpus study using deceptive spoken data. Similarly, the sociolinguistic nature of deception has not been investigated until now.

In this talk, I will discuss the state of the art in deception detection and identify a series of issues with that early work. In particular, I will discuss how certain methodological decisions can affect the quality and validity of results that arise from the data, and how a different method of analysis can lead to more intuitive and nuanced findings. I will explain how I created a corpus of deception by following best practice in increasing motivation, cognitive load, and ecological validity. I will then discuss how more traditional corpus linguistic methods (such as keyword analysis, effect size measures, dispersion, and concordance analysis), combined with a more flexible, user-friendly analysis tool, can provide further insight into the sociolinguistic nature of deceptive discourse.
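For readers unfamiliar with keyword analysis and effect size measures, the following is a minimal sketch of the general idea, not the speaker's method or tool. It assumes log-likelihood as the keyness statistic and a log ratio of relative frequencies as the effect size, both common choices in corpus linguistics; the function name `keyness`, the 0.5 smoothing for zero frequencies, and the toy token lists are illustrative assumptions.

```python
import math
from collections import Counter


def keyness(target_tokens, reference_tokens):
    """Rank words by log-likelihood keyness (target vs. reference corpus).

    Returns tuples of (word, target_freq, reference_freq, log_likelihood,
    log_ratio), sorted by log-likelihood, highest first.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_t = sum(target.values())
    n_r = sum(reference.values())

    rows = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]
        # Expected frequencies if the word were equally likely in both corpora.
        e_t = n_t * (a + b) / (n_t + n_r)
        e_r = n_r * (a + b) / (n_t + n_r)
        # Log-likelihood: a zero observed frequency contributes nothing.
        ll = 2 * ((a * math.log(a / e_t) if a else 0.0)
                  + (b * math.log(b / e_r) if b else 0.0))
        # Effect size: binary log of the ratio of relative frequencies.
        # Assumption: zero counts are smoothed to 0.5 to avoid division by zero.
        rel_t = (a if a else 0.5) / n_t
        rel_r = (b if b else 0.5) / n_r
        rows.append((word, a, b, ll, math.log2(rel_t / rel_r)))

    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Invented toy data, for illustration only.
deceptive = ["lie", "lie", "lie", "the"]
truthful = ["the", "the", "the", "the"]
for word, a, b, ll, log_ratio in keyness(deceptive, truthful):
    print(f"{word}: target={a} reference={b} LL={ll:.2f} LogRatio={log_ratio:.2f}")
```

The significance statistic and the effect size answer different questions: the first indicates how much evidence there is for a frequency difference, the second how large that difference is. Dispersion and concordance analysis would be needed on top of this to check that a keyword is not driven by a single speaker or context.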

Archer, D. and C. Lansley. (2015). Public appeals, news interviews and crocodile tears: an argument for multi-channel analysis. Corpora. Vol. 10(2): 231-258.

McQuaid, S., M. Woodworth, E. Hutton, S. Porter, and L. ten Brinke. (2015). ‘Automated insights: verbal cues to deception in real-life high-stakes lies’. Psychology, Crime and Law. Pp. 1-

Pennebaker, J. W., M. E. Francis, and R. J. Booth. (2001). Linguistic Inquiry and Word Count (LIWC): LIWC2001. Mahwah: Lawrence Erlbaum Associates.

1100-1200, Wed 6th Feb, County South B89

All are welcome to attend.

Rayson – Wmatrix for forensic linguistics: a practical hands-on demo

UCREL CRS and FORGE are pleased to announce the next speaker for this year’s seminar series: Dr Paul Rayson (Lancaster). Details of his talk can be found below:

Wmatrix for forensic linguistics: a practical hands-on demo

Wmatrix was originally conceived in the REVERE project (1998-2001) as a web interface to facilitate the availability of Natural Language Processing (NLP) and Corpus Linguistics (CL) tools and methods to software engineers who were studying legacy systems through document archaeology alone (Rayson et al 2001, Sawyer et al 2005). Since then, its web interface has been extended to expose more underlying details of the language analysis rather than hiding them away, and it has supported applications of NLP and CL methods in many other areas such as political discourse analysis, tracing facework, corpus stylistics, metaphor analysis, topic modelling, evaluating problem-based learning and the language of illness. In the short talk at the beginning of this session, I will highlight applications in forensic, legal, and policing settings, for example: online child protection (Rashid et al 2013), predicting collective action (Charitonidis et al 2017), scientific fraud (Markowitz and Hancock 2014), and studies of the language of international criminal tribunals (Potts and Kjær 2015), sex offenders (Lord et al 2008), extremism and counter-extremism (Prentice et al 2012), and psychopaths (Hancock et al 2013). In the remainder of the two-hour session, participants will follow the online tutorials which introduce the key semantic domains method. We will use the new version 4 of Wmatrix running on a dedicated server with secure HTTPS access, which went public in December 2018. Participants will be provided with existing manifesto datasets, but are welcome to bring their own English corpora to upload.

Charitonidis C., Rashid A., Taylor P.J. (2017) Predicting Collective Action from Micro-Blog Data. In: Kawash J., Agarwal N., Özyer T. (eds) Prediction and Inference from Social Networks and Social Media. Lecture Notes in Social Networks. Springer, Cham

Jeffrey T. Hancock, Michael T. Woodworth and Stephen Porter (2013) Hungry like the wolf: A word-pattern analysis of the language of psychopaths. Legal and Criminological Psychology. Volume 18, Issue 1, pages 102-114.

Lord V, Davis B, Mason P. 2008. Stance-shifting in language used by sex offenders. Psychology, Crime & Law 14, 357-379.

Markowitz DM, Hancock JT (2014) Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel. PLoS ONE 9(8): e105937. doi:10.1371/journal.pone.0105937

Potts, A. and Kjær, A.L. (2015) Constructing Achievement in the International Criminal Tribunal for the Former Yugoslavia (ICTY): A Corpus-Based Critical Discourse Analysis. International Journal for the Semiotics of Law. doi: 10.1007/s11196-015-9440-y

Prentice, S, Rayson, P & Taylor, P 2012, ‘The language of Islamic extremism: towards an automated identification of beliefs, motivations and justifications’ International Journal of Corpus Linguistics, vol. 17, no. 2, pp. 259-286. DOI: 10.1075/ijcl.17.2.05pre

Rashid, A, Baron, A, Rayson, P, May-Chahal, C, Greenwood, P & Walkerdine, J 2013, ‘Who am I? Analysing Digital Personas in Cybercrime Investigations’ Computer, vol. 46, no. 4, pp. 54-61. DOI: 10.1109/MC.2013.68

Rayson, P., Emmet, L., Garside, R., & Sawyer, P. (2001). The REVERE project: Experiments with the application of probabilistic NLP to systems engineering. In Natural Language Processing and Information Systems – 5th International Conference on Applications of Natural Language to Information Systems, NLDB 2000, Revised Papers (pp. 288-300).

Sawyer, P., Rayson, P., & Cosh, K. (2005). Shallow Knowledge as an Aid to Deep Understanding in Early-Phase Requirements Engineering. DOI: 10.1109/TSE.2005.12

1100-1300, Wed 16th Jan, Management School A001c (PC/Learning Lab)