{"id":12,"date":"2024-01-01T00:00:28","date_gmt":"2024-01-01T00:00:28","guid":{"rendered":"https:\/\/wp.lancs.ac.uk\/botornot\/?page_id=12"},"modified":"2026-02-22T12:14:36","modified_gmt":"2026-02-22T12:14:36","slug":"speech-editions","status":"publish","type":"page","link":"https:\/\/wp.lancs.ac.uk\/botornot\/home\/speech-editions\/","title":{"rendered":"Speech editions"},"content":{"rendered":"<p>The <a href=\"https:\/\/lancasteruni.eu.qualtrics.com\/jfe\/form\/SV_bCOFSkGXuzd1C9U\"><em>Speech Edition<\/em><\/a> (v2) of <strong>Bot or Not?<\/strong> (est. 2024) investigates a question with clear security, forensic, and societal implications:<\/p>\n<blockquote><p><strong>how accurately can listeners distinguish between human and AI-generated speech \u2014 and how well calibrated is their confidence?<\/strong><\/p><\/blockquote>\n<div id=\"attachment_70\" style=\"width: 260px\" class=\"wp-caption alignright\"><a href=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonse_v2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-70\" class=\"wp-image-70 size-full\" src=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonse_v2.png\" alt=\"BoNSE QR code\" width=\"250\" height=\"250\" srcset=\"https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonse_v2.png 250w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonse_v2-150x150.png 150w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/a><p id=\"caption-attachment-70\" class=\"wp-caption-text\">Want to share this quiz?<br \/>Feel free to use this QR code<\/p><\/div>\n<p>Led by\u00a0<a href=\"http:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/claire-hardaker\">Prof Claire Hardaker<\/a>,\u00a0<a href=\"http:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/georgina-brown\">Dr Georgina Brown<\/a>, and\u00a0<a href=\"http:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/hope-mcvean\">Hope McVean<\/a>, this Edition examines perceptual 
discrimination in the context of rapidly advancing voice synthesis. Synthetic speech systems now produce highly naturalistic prosody, intonation, and timbre. While this technology has legitimate applications, it is also routinely exploited in fraud, defamation, impersonation, disinformation, and harassment. The evidential stakes therefore extend well beyond mere curiosity.<\/p>\n<h2>Design overview (v2)<\/h2>\n<p>Participants are presented with <strong>15 speech samples<\/strong> of around 3-5 seconds in length (longer, we would note, than the phrase &#8220;My voice is my password&#8221;), randomly drawn from a larger curated bank of <strong>400 recordings<\/strong>. This bank comprises:<\/p>\n<ul>\n<li>200 authentic human speech samples<\/li>\n<li>200 AI-generated speech samples<\/li>\n<\/ul>\n<p>All materials are drawn from <a href=\"https:\/\/www.asvspoof.org\/index2019.html\">ASVspoof 2019<\/a>, a widely used benchmark dataset in automatic speaker verification and anti-spoofing research. This allows the perceptual findings to sit alongside computational work conducted internationally.<\/p>\n<p>Just as the Text and Music Editions have varying quality levels in the AI-generated content, so too does the Speech Edition. 
Samples are stratified across both <strong>voice quality<\/strong> (how good we felt they were, ranging from very poor\/obvious AI voices through to extremely convincing, human-like voices) and <strong>recording quality<\/strong> (phone quality, internet quality, studio quality):<\/p>\n<table style=\"border-collapse: collapse;width: 100%;height: 192px\">\n<tbody>\n<tr style=\"height: 96px\">\n<td style=\"width: 20%;height: 96px\">\n<p style=\"text-align: right\"><strong>Line Quality<\/strong><\/p>\n<p style=\"text-align: left\"><strong>Voice quality<\/strong><\/p>\n<\/td>\n<td style=\"width: 20%;text-align: center;height: 96px\"><strong>Phone<\/strong><br \/>(poor)<\/td>\n<td style=\"width: 20%;text-align: center;height: 96px\"><strong>Internet<\/strong><br \/>(medium)<\/td>\n<td style=\"width: 20%;text-align: center;height: 96px\"><strong>Studio<\/strong><br \/>(excellent)<\/td>\n<td style=\"width: 20%;text-align: center;height: 96px\"><strong>Total<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 20%;height: 24px;text-align: left\"><strong>Poor<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">23<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 20%;height: 24px;text-align: left\"><strong>Medium<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">23<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 20%;height: 24px;text-align: left\"><strong>Excellent<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">23<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\">22<\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 20%;height: 24px;text-align: left\"><strong>Total<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>67<\/strong><\/td>\n<td style=\"width: 20%;height: 24px;text-align: center\"><strong>201<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This tests whether, for instance, a poor-quality line such as a phone recording may mask more artefacts of AI-generated speech, whereas a studio-quality recording may allow only the most convincing instances of AI-generated speech to pass as human.<\/p>\n<p>As with other editions, any given participant may receive all human samples,\u00a0all AI samples,\u00a0or a mixture. For each sample, participants:<\/p>\n<ol>\n<li>Make a <strong>binary judgement<\/strong> (human or bot)<\/li>\n<li>Provide <strong>confidence ratings<\/strong> (pre- and post-test)<\/li>\n<li>Submit a <strong>free-text explanation<\/strong> describing the cues or reasoning underlying their decision (before receiving their score)<\/li>\n<\/ol>\n<p>This structure enables analysis of:<\/p>\n<ul>\n<li>overall detection accuracy<\/li>\n<li>human vs. AI hit rates<\/li>\n<li>confidence calibration<\/li>\n<li>metacognitive shifts across the task<\/li>\n<li>reported cue salience versus statistically predictive features<\/li>\n<\/ul>\n<p>As with the other tests, separating classification from confidence allows cleaner modelling of judgement under uncertainty.<\/p>\n<h2>Current performance snapshot (v2)<\/h2>\n<p>How well do people do at this particular quiz? 
Our latest summary statistics (Feb 2026) are as follows:<\/p>\n<table style=\"border-collapse: collapse;width: 100%;height: 48px\">\n<tbody>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\"><strong>Responses<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Lowest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Highest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Mean<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>SD<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Variance<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\">1,161<\/td>\n<td style=\"width: 16.6667%;height: 24px\">4<\/td>\n<td style=\"width: 16.6667%;height: 24px\">15<\/td>\n<td style=\"width: 16.6667%;height: 24px\">10.46<\/td>\n<td style=\"width: 16.6667%;height: 24px\">2.10<\/td>\n<td style=\"width: 16.6667%;height: 24px\">4.43<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><a href=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b8563274-7142-414d-b30e-370d6d6eeb2b.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-107\" src=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b8563274-7142-414d-b30e-370d6d6eeb2b.jpg\" alt=\"\" width=\"720\" height=\"800\" srcset=\"https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b8563274-7142-414d-b30e-370d6d6eeb2b.jpg 720w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b8563274-7142-414d-b30e-370d6d6eeb2b-270x300.jpg 270w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b8563274-7142-414d-b30e-370d6d6eeb2b-676x751.jpg 676w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/a><\/p>\n<h2>Analytical scope<\/h2>\n<p>The Speech Edition (v2) supports work in:<\/p>\n<ul>\n<li>forensic phonetics and speaker comparison<\/li>\n<li>deception and impersonation detection<\/li>\n<li>human-AI perceptual modelling<\/li>\n<li>risk assessment in 
fraud and social engineering<\/li>\n<li>public resilience to synthetic media<\/li>\n<\/ul>\n<p>Of particular interest is the mismatch &#8211; frequently observed in related work &#8211; between the cues listeners <em>believe<\/em> are diagnostic (e.g., &#8220;accents&#8221;, &#8220;flat intonation&#8221;, &#8220;breaths\/breathing&#8221;, &#8220;odd pacing&#8221;) and those that empirically predict correct classification. Understanding this gap is essential if training, regulation, or technical safeguards are to be evidence-based rather than intuition-driven.<\/p>\n<h2>What about version 1?<\/h2>\n<p>The initial Speech Edition (v1) was swiftly implemented for a short-notice event as a modest online form comprising <strong>12 speech samples<\/strong>. You can even still play it <a href=\"https:\/\/docs.google.com\/forms\/d\/e\/1FAIpQLSc6557bcuDNlO8Nz-0BHOyq6EkmtstKbqZSIGZKBp2rPJg3yQ\/viewform\">here<\/a>. Participants received a score at the end, but this version didn&#8217;t collect:<\/p>\n<ul>\n<li>confidence measures<\/li>\n<li>qualitative explanations<\/li>\n<li>scores (displayed to participants, but not stored)<\/li>\n<\/ul>\n<p>Although several hundred individuals completed the task, and we learned a lot from being present whilst people undertook the quiz, the absence of stored response data limited its research utility. The simplicity of the instrument was a win for our needs in that moment, but we quickly missed the methodological depth. For these reasons, the project was redeveloped into the current Qualtrics-based v2, massively expanding the stimulus bank, capturing richer response data, and aligning the Speech Edition methodologically with the Text and Music Editions.<\/p>\n<p>(But we do love a scrappy quick fix, which is why we&#8217;ve never had the heart to take it down.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Speech Edition (v2) of Bot or Not? (est. 
2024) investigates a question with clear security, forensic, and societal implications: how accurately can listeners distinguish between human and AI-generated speech \u2014 and how well calibrated is their confidence? Led by\u00a0Prof Claire Hardaker,\u00a0Dr Georgina Brown, and\u00a0Hope McVean, this Edition examines perceptual discrimination in the context of [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":87,"parent":2,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-12","page","type-page","status-publish","has-post-thumbnail","hentry"],"_links":{"self":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/12","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/comments?post=12"}],"version-history":[{"count":11,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/12\/revisions"}],"predecessor-version":[{"id":131,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/12\/revisions\/131"}],"up":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/2"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/media\/87"}],"wp:attachment":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/media?parent=12"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}