Vanroose-Peer – The Risk of Data Contamination in (Forensic) Voice Identification: a Perception Experiment using Voice-mixed Speech Samples

FACTOR is pleased to announce our next talk by Scarlett Vanroose-Peer (FACTOR, LAEL), in partnership with PhonLab:

TITLE

The Risk of Data Contamination in (Forensic) Voice Identification: a Perception Experiment using Voice-mixed Speech Samples

ABSTRACT

Data contamination is a widely acknowledged phenomenon across many forensic disciplines. However, in an apparent single-speaker sample obtained from a multi-speaker recording, how could we establish whether that sample has been ‘mixed’ with other voices present in that original recording? This talk introduces ‘voice-mixing’ – a form of data contamination that specifically impacts voice identification analyses. In doing so, this talk details the design and results of a perception experiment in which 121 lay listeners were exposed to 21 audio samples created using the West Yorkshire Regional English Dataset (WYRED; Gold, Ross & Earnshaw, 2018): 6 were ‘authentic’ (i.e., non-voice-mixed) and 15 were ‘voice-mixed’. The voice-mixed samples were built from speaker pairs ranging from low to high similarity. Using a 5-point Likert scale, listeners reported the extent to which they believed each sample to be voice-mixed or authentic. The results revealed no significant relationship between actual exposure to a voice-mixed sample and perceiving it as voice-mixed. However, modelling the degree of similarity within the voice-mixed samples did reveal a significant relationship: listeners were more likely to rate high-similarity voice-mixed samples as ‘authentic’ than lower-similarity ones. The results also revealed that several participant-intrinsic and participant-extrinsic factors impacted the perception of voice-mixed versus authentic samples. Taken together, I argue that data contamination, and specifically voice-mixing, poses a very real risk in (forensic) voice identification. In a climate of striving for trust in intelligence operations and the criminal justice system, it is time we start to address these risks.

TIME & PLACE

W07, 1500-1550, Tue 18th Nov 2025

In-person: County South B89

Online: Teams

Find information on how to get to campus here, and how to navigate campus buildings here.