TITLE
“It’s the story, stupid!” How MARV (Multivariate Analysis of Register Variation) can save the world from fake news.
ABSTRACT
Computer-aided fake news detection can be a useful complement to human efforts. On its own, fact-checking is often too slow to prevent the viral spread of disinformation; debunking news stories and communicating corrections can also backfire by reinforcing the false belief (Lazer et al. 2018). Most computational methods frame fake news detection as a text classification task (Shu et al. 2017) and so require data pre-labelled for veracity. However, the complexities of defining fake news (e.g. fabricated facts or undisclosed advertising?), the different types of fake news (imposter news vs. low-quality news vs. inaccurate news), the difficulty in establishing objective ground truth, and the weaponization and dilution of ‘fake news’ as a concept leave the collection of pre-labelled data fraught with epistemological issues.
Semi-supervised multivariate statistical techniques may overcome these limitations by modelling news veracity as a latent variable whose value can be estimated from the presence of deception clues using a novel deception scoring approach. This study tested two hypotheses: i) that there is significant linguistic variation within the online news genre and ii) that this variation is correlated with deceptive situational parameters of communication. Multivariate register analysis was conducted on 5000 stories from the political sections of 15 online news sources selected as representative of the online news ecosystem (i.e. a mix of UK and US legacy and new online media from across the full political spectrum). Linguistic parameters were defined from a feature set combining lexico-grammatical and cohesion-based features; situational parameters were drawn from expert-defined fake news detection heuristics and used to calculate a deception score. Visualisation techniques (Diwersy, Evert and Neumann, 2014) were used to assess whether this situational analysis revealed any dimensions of deception and deceptive text clusters.
The study found that linguistic variation in the online news genre is highly correlated with the probability of veracity, with the absence of narrative being the main indicator of potential deception. This result was unexpected, as storytelling is generally associated with deception. However, in the context of a profession which places supreme value on the news story, it makes sense that narrative register is a key veracity indicator. Semi-supervised multivariate analysis with deception scoring emerges as a viable alternative to text classification for automated deception detection in epistemologically challenging genres.
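For readers who want a concrete picture of the general idea, the sketch below shows one way linguistic dimensions and a heuristic-based deception score could be related. It is an illustration under stated assumptions, not the procedure used in the study: the data are synthetic, the feature and heuristic columns are hypothetical, plain principal component analysis stands in for the multivariate register analysis, and the deception score is assumed to be the fraction of heuristic flags raised per text.

```python
import numpy as np


def deception_score(heuristics: np.ndarray) -> np.ndarray:
    """Fraction of heuristic flags raised per text (rows = texts, columns = 0/1 flags).

    Assumption: an unweighted mean; the abstract does not specify the scoring formula.
    """
    return heuristics.mean(axis=1)


def principal_dimensions(features: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Project z-scored linguistic features onto the first n_dims principal components."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_dims].T


def dimension_correlations(dims: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pearson correlation between each dimension and the deception score."""
    return np.array(
        [np.corrcoef(dims[:, k], scores)[0, 1] for k in range(dims.shape[1])]
    )


# Synthetic data only: a hypothetical latent "narrativity" drives two features.
rng = np.random.default_rng(0)
n_texts = 200
narrative = rng.normal(size=n_texts)
features = np.column_stack([
    narrative + 0.3 * rng.normal(size=n_texts),  # hypothetical: past-tense verb rate
    narrative + 0.3 * rng.normal(size=n_texts),  # hypothetical: third-person pronoun rate
    rng.normal(size=n_texts),                    # hypothetical: unrelated feature
])

# Four hypothetical heuristic flags, constructed so that less narrative texts
# raise more flags. The relationship is built in here; it is not a finding.
heuristics = (rng.normal(size=(n_texts, 4)) - narrative[:, None] > 0).astype(float)

scores = deception_score(heuristics)
dims = principal_dimensions(features)
corr = dimension_correlations(dims, scores)
print("Correlation of each dimension with the deception score:", corr.round(2))
```

Because the sign of a principal component is arbitrary, only the magnitude of each correlation is meaningful in this sketch. A real analysis would replace the synthetic arrays with feature counts extracted from the news stories and with flags from the expert-defined heuristics.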
REFERENCES
Diwersy, S., Evert, S. and Neumann, S., 2014. A weakly supervised multivariate approach to the study of language variation. In Aggregating Dialectology, Typology, and Register Analysis: Linguistic Variation in Text and Speech, pp.174-204.
Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D. and Schudson, M., 2018. The science of fake news. Science, 359(6380), pp.1094-1096.
Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), pp.22-36.
BIO
Olu Popoola is a PhD candidate researching methods for cross-domain deception detection at the University of Birmingham, and moonlights as a deception detection trainer and OSINT investigator. By day, Olu is a Teaching Fellow at Aston University where he teaches information integrity to future health professionals (a third career, following ten years in advertising and consumer research and another ten in English language teaching). Olu is married with two canal boats and a cat.
TIME & PLACE
1100-1200, Wed 20th Mar, County South B89