{"id":10,"date":"2023-01-01T00:00:12","date_gmt":"2023-01-01T00:00:12","guid":{"rendered":"https:\/\/wp.lancs.ac.uk\/botornot\/?page_id=10"},"modified":"2026-04-16T16:56:44","modified_gmt":"2026-04-16T16:56:44","slug":"text-editions","status":"publish","type":"page","link":"https:\/\/wp.lancs.ac.uk\/botornot\/home\/text-editions\/","title":{"rendered":"Text editions"},"content":{"rendered":"<p>The <a href=\"https:\/\/lancasteruni.eu.qualtrics.com\/jfe\/form\/SV_77BzuG0OQBlwX7E\"><em>Text Edition<\/em><\/a> (v2) of <strong>Bot or Not?<\/strong> (est. 2023) examines a question that looks relatively mundane on the surface, but that is in fact criminologically and societally significant:<\/p>\n<blockquote><p><strong>how well can readers distinguish between authentic and AI-generated online reviews, and how confident are they in doing so?<\/strong><\/p><\/blockquote>\n<div id=\"attachment_59\" style=\"width: 260px\" class=\"wp-caption alignright\"><a href=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonte_v2.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-59\" class=\"wp-image-59 size-full\" src=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonte_v2.png\" alt=\"BoNTE QR code\" width=\"250\" height=\"250\" srcset=\"https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonte_v2.png 250w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/qr_bonte_v2-150x150.png 150w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/a><p id=\"caption-attachment-59\" class=\"wp-caption-text\">Want to share this quiz?<br \/>Feel free to use this QR code<\/p><\/div>\n<p>Led by\u00a0<a href=\"http:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/claire-hardaker\">Prof Claire Hardaker<\/a> and\u00a0<a href=\"http:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/georgina-brown\">Dr Georgina Brown<\/a>\u00a0with major research assistance from <a href=\"https:\/\/www.linkedin.com\/in\/amy-dixon-9a7a2a212\/\">Amy 
Dixon<\/a>, this strand of the overall Bot or Not suite focuses specifically on hotel reviews. Fake reviews are a long-standing problem in platform economies, and generative AI has merely industrialised the practice. Some estimates suggest that fake reviews cost the global economy <a href=\"https:\/\/www.weforum.org\/agenda\/2021\/08\/fake-online-reviews-are-a-152-billion-problem-heres-how-to-silence-them\/\">$152 billion annually<\/a>, and figures like this have prompted some countries to pass legislation requiring platforms to proactively detect and remove such content (e.g. the <a href=\"https:\/\/www.legislation.gov.uk\/ukpga\/2024\/13\/contents\">UK&#8217;s Digital Markets, Competition and Consumers Act 2024<\/a>). The problem extends beyond fraud, however. Fake positive reviews of enticingly cheap technology present a national security concern as compromised goods flood into homes, schools, and workplaces.<\/p>\n<p>Large language models such as ChatGPT make it trivial to produce high-volume, stylistically fluent endorsements for products, services, or even entirely fictitious businesses, allowing the execution of widescale fraud and fraud-adjacent behaviour built on deceptive, persuasive content.<\/p>\n<p>The central research questions for the Text Editions are:<\/p>\n<ul>\n<li>What is the overall detection accuracy for AI-generated vs. human-written reviews?<\/li>\n<li>How well calibrated are participants\u2019 confidence judgements?<\/li>\n<li>What linguistic or stylistic cues do readers report relying on?<\/li>\n<li>Are these cues diagnostic, or illusory?<\/li>\n<\/ul>\n<h2>Design overview (v2)<\/h2>\n<p>Participants are presented with <strong>15 reviews<\/strong> randomly drawn from a large bank of 400 reviews overall. 
This bank consists of&#8230;<\/p>\n<ul>\n<li>200 human-authored positive hotel reviews from the <a href=\"https:\/\/www.kaggle.com\/datasets\/rtatman\/deceptive-opinion-spam-corpus\/data\">Deceptive Opinion Spam Corpus<\/a> (<a href=\"https:\/\/www.cs.cornell.edu\/courses\/cs4740\/2012sp\/lectures\/op_spamACL2011.pdf\">Ott et al 2011<\/a>), and<\/li>\n<li>200 AI-generated positive hotel reviews created by our intern, Amy Dixon.<\/li>\n<\/ul>\n<p>As with other editions, a participant\u2019s set may be:<\/p>\n<ul>\n<li>entirely human-authored<\/li>\n<li>entirely AI-generated<\/li>\n<li>or, more likely, a mixture<\/li>\n<\/ul>\n<p>Moreover, the AI-generated reviews fall into three categories:<\/p>\n<ul>\n<li>minimal prompt engineering, e.g. &#8220;write a positive review of Hotel XYZ&#8221;<\/li>\n<li>moderate prompt engineering, e.g. &#8220;write a positive review of Hotel XYZ, include specific details&#8221;<\/li>\n<li>extensive prompt engineering, e.g. &#8220;write a positive review of Hotel XYZ, include an array of specific details such as A, B, and C, and include some typos&#8221;<\/li>\n<\/ul>\n<p>All hotels are anonymised as <em>Hotel XYZ<\/em> to prevent brand familiarity effects.<\/p>\n<p>For each review, participants:<\/p>\n<ol>\n<li>Classify the review as <strong>human<\/strong>\u00a0or\u00a0<strong>AI-generated<\/strong><\/li>\n<li>Provide pre- and post-task <strong>confidence ratings<\/strong><\/li>\n<li>Offer\u00a0<strong>qualitative explanations<\/strong>\u00a0of the cues or reasoning underlying their decisions (submitted prior to score feedback)<\/li>\n<\/ol>\n<p>A key methodological refinement in v2 is the <strong>disaggregation of accuracy and confidence<\/strong> (see below for more on this). Participants\u2019 scores are based solely on the correctness of their binary classifications. Confidence is measured independently to enable analysis of:<\/p>\n<ul>\n<li>calibration (confidence vs. 
accuracy alignment)<\/li>\n<li>overconfidence or underconfidence trends<\/li>\n<li>shifts in metacognitive awareness across the task<\/li>\n<\/ul>\n<p>This design brings the Text Edition into alignment with the Music and Speech editions, facilitating cross-modal comparison.<\/p>\n<h2>Current performance snapshot (v2)<\/h2>\n<p>How well do people do at this particular quiz? Our latest summary statistics (Feb 2026) are as follows:<\/p>\n<table style=\"border-collapse: collapse;width: 100%;height: 48px\">\n<tbody>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\"><strong>Responses<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Lowest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Highest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Mean<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>SD<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Variance<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\">589<\/td>\n<td style=\"width: 16.6667%;height: 24px\">1<\/td>\n<td style=\"width: 16.6667%;height: 24px\">15<\/td>\n<td style=\"width: 16.6667%;height: 24px\">9.10<\/td>\n<td style=\"width: 16.6667%;height: 24px\">2.28<\/td>\n<td style=\"width: 16.6667%;height: 24px\">5.19<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-52 aligncenter\" src=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b6df0670-39eb-488a-9bb4-30c93de804d2.jpg\" alt=\"Feb 2026: BoNTE results\" width=\"720\" height=\"800\" srcset=\"https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b6df0670-39eb-488a-9bb4-30c93de804d2.jpg 720w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b6df0670-39eb-488a-9bb4-30c93de804d2-270x300.jpg 270w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/b6df0670-39eb-488a-9bb4-30c93de804d2-676x751.jpg 676w\" sizes=\"auto, 
(max-width: 720px) 100vw, 720px\" \/><\/p>\n<h2>Scope (v2)<\/h2>\n<p>The dataset supports work in:<\/p>\n<ul>\n<li>metacognition and judgement under uncertainty<\/li>\n<li>platform governance and consumer protection<\/li>\n<li>AI literacy and public resilience<\/li>\n<li>adversarial text generation modelling<\/li>\n<\/ul>\n<p>Of particular interest is the gap between <strong>perceived cues<\/strong> (as reported in free-text explanations) and <strong>statistically predictive cues<\/strong>. For instance, participants articulate rationales that do not always correspond to reliable discriminators &#8211; a finding with implications well beyond hotel reviews. Confidence and competence measures are also particularly interesting.<\/p>\n<h2>What about version 1?<\/h2>\n<p>The first iteration of the Text Edition closed on 13 May 2025 with <strong>957 completed responses<\/strong>. In v1 (the very first quiz in the entire Bot or Not suite), rather than being collected separately, confidence was embedded directly within each response option using a five-point categorical scale:<\/p>\n<ul>\n<li>Definitely human<\/li>\n<li>Maybe human<\/li>\n<li>Not sure<\/li>\n<li>Maybe bot<\/li>\n<li>Definitely bot<\/li>\n<\/ul>\n<p>Scoring was weighted:<\/p>\n<ul>\n<li>1 point for a correct \u201cdefinitely\u201d judgement<\/li>\n<li>0.5 points for a correct \u201cmaybe\u201d judgement<\/li>\n<\/ul>\n<p>While this gamified structure provided useful gradience data and a measure per sample, it also introduced a potential behavioural distortion. Because stronger responses yielded higher scores, participants may have been incentivised to overstate certainty to maximise performance. 
In other words, the measurement instrument risked shaping the phenomenon it aimed to observe.<\/p>\n<p>For this reason, v2 separates binary classification from confidence reporting, removing score-based incentives for overclaiming certainty and enabling cleaner calibration analysis.<\/p>\n<h2>Final scores snapshot (v1)<\/h2>\n<table style=\"border-collapse: collapse;width: 100%;height: 48px\">\n<tbody>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\"><strong>Responses<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Lowest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Highest score<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Mean<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>SD<\/strong><\/td>\n<td style=\"width: 16.6667%;height: 24px\"><strong>Variance<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 24px\">\n<td style=\"width: 16.6667%;height: 24px\">957<\/td>\n<td style=\"width: 16.6667%;height: 24px\">1<\/td>\n<td style=\"width: 16.6667%;height: 24px\">15<\/td>\n<td style=\"width: 16.6667%;height: 24px\">7.19<\/td>\n<td style=\"width: 16.6667%;height: 24px\">2.34<\/td>\n<td style=\"width: 16.6667%;height: 24px\">5.48<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-53 aligncenter\" src=\"http:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/e7778244-f02a-43d2-ace4-1e5259f1d273.jpg\" alt=\"Feb 2026: BoNTE v1 results\" width=\"720\" height=\"800\" srcset=\"https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/e7778244-f02a-43d2-ace4-1e5259f1d273.jpg 720w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/e7778244-f02a-43d2-ace4-1e5259f1d273-270x300.jpg 270w, https:\/\/wp.lancs.ac.uk\/botornot\/files\/2026\/02\/e7778244-f02a-43d2-ace4-1e5259f1d273-676x751.jpg 676w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><br \/>\nAcross both versions, the Text Edition continues to ask a simple but 
increasingly consequential question: when reading persuasive online prose, what convinces us that a human is behind it, and how often are we mistaken?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Text Edition (v2) of Bot or Not? (est. 2023) examines a question that looks relatively mundane on the surface, but that is in fact criminologically and societally significant: how well can readers distinguish between authentic and AI-generated online reviews, and how confident are they in doing so? Led by\u00a0Prof Claire Hardaker and\u00a0Dr Georgina Brown\u00a0with [&hellip;]<\/p>\n","protected":false},"author":77,"featured_media":50,"parent":2,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-10","page","type-page","status-publish","has-post-thumbnail","hentry"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/10","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/users\/77"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/comments?post=10"}],"version-history":[{"count":20,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/10\/revisions"}],"predecessor-version":[{"id":163,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/10\/revisions\/163"}],"up":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/pages\/2"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/media\/50"}],"wp:attachment":[{"href":"https:\/\/wp.lancs.ac.uk\/botornot\/wp-json\/wp\/v2\/media?parent=10"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}",
"templated":true}]}}