Byte

Sir Paul McCartney has used AI to complete a Beatles song that was never finished.

Using machine learning, Sir Paul said they managed to “lift” the late John Lennon’s voice and complete the piece of work. By extracting elements of his voice from a “ropey little bit of a cassette”, the team were able to finish the 1978 song “Now and Then”, which will hopefully be released later this year. “We had John’s voice and a piano and he could separate them with AI. They tell the machine, ‘That’s the voice. This is a guitar. Lose the guitar’.” This was not a “hard day’s night” and was certainly faster than “eight days a week”; it will be interesting to hear the finished result, and we may be “glad all over” that they did not “let it be”.
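The article does not say which tools were involved, but freely available source-separation models can do something broadly similar. As a minimal sketch – assuming the open-source Spleeter library and a hypothetical cassette transfer saved as demo.wav – a two-stem split pulls the vocal away from the accompaniment:

    # Minimal sketch: isolate a voice from a mixed recording with Spleeter.
    # "demo.wav" is a hypothetical stand-in for the cassette transfer; this
    # illustrates the idea, not the production process used on the song.
    from spleeter.separator import Separator

    separator = Separator("spleeter:2stems")  # pretrained vocals/accompaniment model
    separator.separate_to_file("demo.wav", "output/")
    # Writes output/demo/vocals.wav ("that's the voice") and
    # output/demo/accompaniment.wav ("lose the guitar").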

Link here: https://www.bbc.co.uk/news/entertainment-arts-65881813

Literature Reviews

Review of paper: “Fooled twice: People cannot detect deep fakes but think they can”

– Nils C. Köbis, Barbora Doležalová & Ivan Soraperra

In this study the authors show that people cannot reliably detect deep fakes: even when their awareness was raised and they were offered a financial incentive, their detection accuracy remained poor. People appear to be biased towards mistaking deep fakes for authentic videos rather than the other way around, and they also overestimate their own detection abilities. Is seeing really believing?

These manipulated images, whilst entertaining, can have a dark side. Facial images are being used at scale to create fake pornographic videos of both men and women, which can damage reputations; a faked voice can be used to strip someone of their life savings. Caldwell et al. (2020) ranked the malicious use of deep fakes as the number one emerging AI threat to consider.

This is an issue because the ability to create a deep fake using Generative Adversarial Networks (GANs) is no longer the preserve of experts; it is accessible to anyone, and no specialist knowledge is required. Extensive research into judgement and decision making shows that people often use mental shortcuts (heuristics) when establishing the veracity of items online. This could, the authors posit, lead to people becoming oversensitive to online content and failing to believe anything – even genuine, authentic announcements by politicians. The counter-argument is that fake videos are the exception to the rule and that “seeing is believing” remains the dominant heuristic. This study tested both of these competing biases – the “liar’s dividend” versus “seeing is believing”.
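For readers unfamiliar with the architecture, the sketch below shows the adversarial set-up in miniature: a generator learns to produce images that a discriminator cannot tell apart from real ones. It is illustrative only – the sizes, the random stand-in data and the hyper-parameters are placeholders, and real deep-fake systems train far larger networks on face datasets:

    # Minimal sketch of a GAN training loop (illustrative; PyTorch assumed).
    import torch
    import torch.nn as nn

    latent_dim, image_dim, batch = 64, 28 * 28, 32  # placeholder sizes

    generator = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, image_dim), nn.Tanh(),
    )
    discriminator = nn.Sequential(
        nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    loss_fn = nn.BCELoss()
    real_images = torch.rand(batch, image_dim)  # stand-in for real photographs

    for step in range(100):
        # Train the discriminator to tell real images from generated ones.
        fake_images = generator(torch.randn(batch, latent_dim)).detach()
        d_loss = (loss_fn(discriminator(real_images), torch.ones(batch, 1)) +
                  loss_fn(discriminator(fake_images), torch.zeros(batch, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Train the generator to fool the discriminator.
        g_loss = loss_fn(discriminator(generator(torch.randn(batch, latent_dim))),
                         torch.ones(batch, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()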

The results of the study showed that people struggle to identify deep fake videos because of a genuine inability to do so, not just a lack of motivation. The authors also found that people were overly optimistic, with a systematic bias towards guessing that the videos were authentic.

It could be argued that humans process moving visual information more effectively than other sensory data, yet the results here were only slightly better than chance – and worse than has been reported for static images. Could this be due to inattention? More research is needed in this area.

The authors also found two related biases in human deep fake detection. First, although participants were told that 50% of the videos were fake, 67.4% of the videos were judged to be authentic; this was not deliberate guessing – participants were using their judgement. The second bias relates to the “Dunning-Kruger”* effect: people overestimated their ability to detect deep fakes, with low performers being particularly overconfident. Overall, people really did think that “seeing is believing”.

Conclusion – Deep fakes threaten to undermine knowledge acquisition, because our failure to detect them stems not from a lack of motivation but from an inability to do so. The videos used in this study had no emotional content; emotionally charged material may have yielded different results. More work is definitely needed in this area.

Link to the paper here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602050/
Reference: Caldwell, M., Andrews, J.T., Tanay, T. & Griffin, L.D. (2020). AI-enabled future crime. Crime Science, 9, 1–13.

*The Dunning-Kruger effect occurs when a person’s lack of knowledge and skills in a certain area causes them to overestimate their own competence.

Articles

Deep fakes – a cause for concern?

In this issue we wanted to take a look at deep fakes and how easy it is to detect them. Image manipulation and editing are nothing new, and deep fakes are the latest in a long line of techniques used for manipulation. Joseph Stalin, for example, had people removed from photographs so that he was not seen to be associating with the “wrong type of people”.

What is a deep fake? Deep fakes are audio, images, text or video that have been automatically synthesised by an AI or machine learning system. Deep fake technology can be used to create highly realistic images or videos that depict people saying or doing things they never said or did. For example, images recently circulated of the Pope wearing a large white “puffer” coat, something he never actually wore. Link here: https://www.bloomberg.com/news/newsletters/2023-04-06/pope-francis-white-puffer-coat-ai-image-sparks-deep-fake-concerns

  • Public concern: The public are concerned about the misuse of deep fakes; they are hard to detect and the technology is advancing rapidly. Public understanding is limited, and the risk of misinformation grows as deep fakes become more sophisticated. When trying to decide whether an image is fake, it helps to look for inconsistencies such as mismatched earrings or unnatural eye blinking.
  • Worries and considerations: Deep fakes are increasingly being used for malicious purposes, such as the creation of pornography, and modern tools for creating them are readily available and increasing in sophistication, yielding better and better results. Even though public awareness is increasing, the ability to detect a deep fake is not. However, some recent research has shone a light on who might be better at detecting them.
  • Research by Ganna Pogrebna: Ganna is a decision theorist and behavioural scientist working at the Turing Institute. She recently gave a talk by Zoom on her empirical study into the “Temporal Evolution of Human Perceptions and Detection of Deep fakes”. Ganna identified a range of 37 personality traits that can be measured using psychological measurement scales (e.g. anxiety, extraversion, self-esteem). Based on the descriptions of these traits she then developed an algorithm, with the hypotheses grounded in the “big five” personality traits (openness, conscientiousness, extraversion, neuroticism and agreeableness).

The study commenced with a small group of 200 people and has now grown to 3,000 people in each of five Anglophone countries: the UK, US, Canada, Australia and New Zealand. Because Ganna has a large dataset of deep fakes, she can test using images of many different people, not just actors and politicians as in some studies. This has yielded copious amounts of data, including cross-sectional data from representative samples. Each participant was exposed to six deep fake algorithm variations in a between-subjects design.

  • Results: People’s ability to detect deep fakes gradually declines as the quality of the deep fakes improves. However, people who show high emotional intelligence and conscientiousness, and who are prevention focused, are better at detecting deep fakes. Neuroticism, resilience, empathy, impulsivity and risk aversion came a close second, with people high on these traits also performing better. Only 2% of participants were very good at detecting deep fakes (although no exact definition of “very good” was presented); these participants scored statistically higher than others on three traits: conscientiousness, emotional intelligence and prevention focus. General intelligence and knowledge of technology do not make you better at detecting deep fakes, so testing for general versus emotional intelligence could be an interesting addition to the data. It will be good to see the full results, in terms of exact performance and effect sizes, when published.
  • So What: We are becoming familiar with deep fakes and with talk of “hallucinations” in content created by ChatGPT – assertions confidently made by algorithms even though they are far removed from the truth. The future of the technology is exciting and the possibilities seem endless, with new tools emerging at an exponential rate, but we need to question more than ever what we see and what we read.

Let us know what work you are doing in the deep fake arena – we’d love to hear from you – CAISS@lancaster.ac.uk

Literature Reviews

Paper review: A Replication Study: Machine Learning (ML) Models Are Capable of Predicting Sexual Orientation From Facial Images

– John Leuner

Objectives: The aim of this paper was to replicate previous studies that used ML to predict sexual orientation from facial images. It also introduced a new ML model based on highly blurred images, to investigate whether the information present in the colours of the face and the immediate background was predictive of sexual orientation. Head pose and the presence of facial hair or eyewear were also investigated.

Results:
Replicating previous studies, but with a new dataset not limited by country or race, both deep learning classifiers and facial morphology classifiers performed better than humans on photographs from dating profiles. A new ML model that tests whether a blurred image can be used to predict sexual orientation is introduced; using the predominant colour information present in the face and background, the author found this to be predictive of sexual orientation.
The author states that the study demonstrates that if someone intentionally alters their appearance to fit gay or straight stereotypes, the models do not change the predicted sexual orientation label. Models are still able to predict sexual orientation even when controlling for the presence or absence of facial hair.
So What: A Chinese study (physiognomy) claims to be able to detect criminality from identity photographs; this type of research has serious legal and ethical implications.
Link to paper: https://arxiv.org/pdf/1902.10739.pdf

Literature Reviews, Uncategorized

Review: Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Justin Grimmer and Brandon M. Stewart (Human Review)


This paper, although published nearly ten years ago, still makes valid points in today’s world, arguing that “language is the medium for politics” and policy, whether spoken or written. In our quest to understand politics, from terrorist manifestos to peace treaties, we need to know what political actors are actually saying. The authors caution against the uncritical use of automated methods, on the premise that careful human thought and robust validation are needed to ensure rigour. But with today’s ever-evolving technology, is this still the case?

To understand politics we need to ascertain what is actually being said and by whom, in whatever medium it is delivered. However, the volume of material is massive; hiring people to read and code it is expensive, and scholars cannot do it all themselves. Automated content analysis methods can make this type of analysis possible. The authors state that automated methods “amplify and augment” careful reading and thoughtful analysis, and their paper takes the reader through all the steps needed for content analysis: first acquiring the documents, pre-processing them and checking that they meet the research objective, followed by classification, categorisation and then unpacking the content further (a minimal sketch of such a pipeline appears below, after the four principles). Automated content analysis methods can make the previously impossible possible. Despite their initial reservations the authors offer guidelines on this “exciting area of research”, minimising misconceptions and errors, and describe “best practice validations across diverse research objectives and models”. Four principles of automated text analysis are identified, and the authors encourage revisiting them often during research; they are as follows:

1. All quantitative models of language are wrong – but some are useful (for example, a complicated dependency structure in a sentence can change the meaning).
2. Quantitative methods for text amplify resources and augment humans.
3. There is no globally best method for text analysis (there are many different packages available, and one may suit a particular dataset better than another).
4. Validate, validate, validate – avoid the blind use of any one method without validation.

The authors point out that automated content analysis methods provide many tools for measuring what is of interest; there is no one-size-fits-all solution, and whichever tool is chosen needs to be content specific. New texts probably need new methods, and ten years ago the authors identified that commonalities would allow “scholars to share creative solutions to common problems”. Important questions can be answered by the analysis of large collections of texts, but if the methods are applied without rigour then few relevant answers will be forthcoming. When undertaking text analysis it is important to recognise the limits of statistical models, even as the field of political science is revolutionised by the application of automated models.
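As a rough illustration of the acquire / pre-process / classify / validate workflow described above – a sketch only, not the authors’ own methods, with invented example documents and labels – a simple supervised pipeline might look like this:

    # Sketch of an automated content analysis pipeline (scikit-learn assumed).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # 1. Acquire documents (invented stand-ins for speeches or manifestos).
    documents = [
        "We will lower taxes and shrink the state",
        "Public investment in health must increase",
        "Cut regulation to free our businesses",
        "Fund our schools and protect public services",
    ]
    labels = ["right", "left", "right", "left"]  # hypothetical hand-coded labels

    # 2. Pre-process and classify: bag-of-words features plus a simple classifier.
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LogisticRegression(max_iter=1000),
    )

    # 3. Validate, validate, validate: cross-validate rather than trusting blindly.
    scores = cross_val_score(model, documents, labels, cv=2)
    print("Cross-validated accuracy:", scores.mean())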

The overwhelming message of this paper is that textual measurement, the discovery of new methods and new points of inference allow us to build upon scientific interpretation and theory, and the journey does indeed continue at pace. Machine learning techniques have revolutionised our ability to analyse vast quantities of text, data and images rapidly and cheaply.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf
UK Defence Science and Technology Laboratory

Literature Reviews

Review: “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” (Automatic Review)

The paper “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” by Justin Grimmer and Brandon M. Stewart, published in the Political Analysis journal in 2013, addresses the increasing use of automatic content analysis methods in political science research. The authors argue that these methods have the potential to offer significant advantages over traditional manual content analysis, but also pose important challenges that must be addressed.

The authors begin by outlining the benefits of automatic content analysis methods, including the ability to analyze large amounts of text quickly and accurately, the potential to detect patterns and relationships that would be difficult or impossible for human analysts to discern, and the ability to replicate findings across multiple studies. They also acknowledge, however, that automatic methods are not without limitations, such as difficulties in capturing the nuances of language, the potential for errors in coding, and the need for careful attention to issues of measurement and validity.

To address these challenges, the authors propose a framework for evaluating the quality of automatic content analysis methods, based on three key criteria: validity, reliability, and generalizability. They argue that these criteria should be used to assess the quality of automated methods in political science research, and provide a detailed discussion of how each criterion can be operationalized.

The authors also provide examples of how automated content analysis methods can be used in political science research, including the analysis of presidential speeches and legislative texts, the identification of ideological or partisan biases in news coverage, and the detection of patterns in social media data. They demonstrate how automated methods can be used to generate insights that would be difficult or impossible to obtain using manual methods, such as identifying the specific rhetorical strategies used by politicians to appeal to different audiences.

Finally, the authors acknowledge that the use of automated content analysis methods in political science research is still in its infancy, and that there is much work to be done to refine and improve these methods. They conclude by calling for continued research in this area, with a focus on developing more sophisticated and accurate methods for analyzing political texts, as well as exploring the potential for integrating automated content analysis with other data sources, such as survey data or experimental data.

In summary, Grimmer and Stewart’s paper argues that automated content analysis methods offer great promise for political science research, but also pose important challenges that must be addressed. The authors provide a framework for evaluating the quality of automated methods, as well as examples of how these methods can be used to generate insights in political science research. They call for continued research in this area, with a focus on refining and improving these methods, and exploring their potential for integration with other data sources.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf

Uncategorized

CAISS goes to AI UK, London March 2023

Around 3,000 delegates attended the QE2 Centre for AI UK. One of the most popular sessions dealt with the much-hyped ChatGPT and was delivered by Gary Marcus, Emeritus Professor of Psychology and Neural Science at New York University. He began by stating that although we have a lot of individual AI solutions (for example, GPS), so far there is no general purpose system that will do everything for us. ChatGPT is the most advanced and reliable system to date; it takes in massive amounts of data and has good guardrails, so it will not, for example, write an article on the benefits of eating glass! But is it the universal panacea?

Problems:

  • It will make things up, and it can even give references for fake information; there is an illusion that adding more information will mitigate the incorrect outputs.
  • After eight million chess games, it still does not understand the rules.
  • Driverless cars rely on deep learning; this is not AI. The technology is just memorising situations and is unable to cope with unusual events, and the system cannot reason in the same way that a human being does.
  • If a circumstance is not in the training set it won’t know what to do, and for GPT-4 (the latest version) we do not yet know what that training data set is.

Positives:

  • It can help with debugging: it can write pieces of code that are about 30% correct which humans can then fix, and this is easier than starting from scratch – the “best use case” (see the short illustration after this list).
  • It can write letters, stories, songs and prose; it is fun, fluent and good with grammar.
  • Large Language Models (LLMs) can be used to write articles that look good but contain errors. Someone who does not know the facts could still believe them – but if it is a story, fiction, does this matter?
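As a hedged illustration of that “draft, then fix” workflow – the function, its bug and the numbers are invented for this example, not taken from the talk – an LLM might produce the first version below, which a human then corrects:

    # Hypothetical example of the "LLM drafts it, a human fixes it" workflow.
    # First attempt, as a model might draft it: plausible, but the final
    # windows are too short, so the trailing averages are wrong.
    def moving_average_draft(values, window):
        return [sum(values[i:i + window]) / window
                for i in range(len(values))]

    # Human-corrected version: only average complete windows.
    def moving_average(values, window):
        return [sum(values[i:i + window]) / window
                for i in range(len(values) - window + 1)]

    print(moving_average_draft([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5, 2.0] – wrong tail
    print(moving_average([1, 2, 3, 4], 2))        # [1.5, 2.5, 3.5]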

Worries and considerations:

ChatGPT is being used at scale, leading to misinformation and potentially polluting democracy; it creates opportunities for fake information and for discriminatory, stereotypical or even offensive responses. The 2024 US Presidential Election could be a concern, as the technology could be used by state actors or as an advertising tool, spreading misinformation that appears plausible. It can write fictitious news reports and describe data (e.g. Covid-19 versus vaccines) in a way that looks authoritative. This could result in millions of fake tweets and posts a day, output via “troll farms”. Large Language Models (LLMs) without guardrails are already being used on the dark web. ChatGPT has been used in a programme to solve CAPTCHAs – when challenged, the bot said it was a person with a visual impairment! It is already being used in credit card scams and phishing attacks.

Classical AI is about facts; LLMs do not know how to fact-check (e.g. the claim “Elon Musk has died in a car crash” is something we can check as humans). As this is such a wide and fast-moving area, should we be looking at LLMs in the same way that we would look at a new drug, with controlled releases and a pause in place for a “safety check”?

AI literacy is important for future generations – understanding the limits is crucial, and people still need to think in a critical way. Is a coordinated campaign needed to fully understand and warn about the limits of such technology?

Other presentations included Professor Lynn Gladden on integrating AI for science and government, public perceptions of AI, how we can “do better in data science and AI”, the Online Safety Bill, creating economic and societal impact, what data science can do for policy makers, and individual skills for global impact. Overall it was a fascinating two days, with many opinions and high-profile speakers under the overarching banner of open research, collaboration and inclusion.

Link: https://ai-uk.turing.ac.uk/


Uncategorized

CREST Conference BASS22 – Lancaster, July 2022

The Behavioural and Social Sciences in Security conference (BASS22) was held in July at Lancaster University. An international audience was brought together to enhance their understanding of the psychological and social drivers of threats to national security, with an exploration of skills, technologies and protective security measures, and two workshops on addressing bias in computational social science.

Highlights from the workshops included:

  • Can bias ever be removed in the long term?
  • Problems can be introduced when there is human interaction with the machine.
  • Do social science theories introduce bias?
  • How can we use models to our advantage without introducing bias?
  • One key takeaway was: “It is not how the model works, but how can we use the model to our advantage?”

Uncategorized

And we’re off………

The Computational Social Science Hub (CSS Hub) is a collaboration between The Alan Turing Institute (Turing), Lancaster University and Dstl, sitting under the new Defence Centre for AI Research (DCAR). CSS largely focusses on computational approaches to social science – often using large-scale data to investigate and model human behaviour and activity. The Hub is made up of social and computer scientists collaborating on defence and security related problems. After a successful first year, August allowed for a sunny kick-off at the Turing Institute in the British Library. The team discussed a range of research options to be conducted over the coming years. Our upcoming work covers a range of topics such as Bias, Virtual Reality and Facial Recognition. Additionally, we hope to recruit a PhD student to encourage early-career researchers into this emerging field.

A primary goal of the CSS Hub is to foster a collaborative community, encouraging work across disciplines. To achieve this we are launching a community group with the Turing, along with this regular newsletter focussed on highlighting new CSS work and how it impacts Defence.