Articles, Literature Reviews

“In the Artificial Intelligence (AI) Science boom, beware: your results are only as good as your data.”

Hunter Moseley shines a light on how we can make experimental results more trustworthy: thoroughly vetting them before and after publication helps ensure that huge, complex data sets are both accurate and valid. We need to question results and papers; just because something has been published does not mean it is accurate or even correct, whoever the author may be and whatever their credentials.

The key to ensuring the accuracy of these results is reproducibility: careful examination of the data, with peers and other research groups investigating the outcomes. This is vitally important when a data set is reused in new applications. Moseley and his colleagues found something unexpected when they investigated some recent research papers: duplicates appeared in the data sets used by three papers, meaning those data sets were corrupt.

In machine learning it is usual to split a data set in two, using one subset to train a model and the other to evaluate its performance. With no overlap between the training and testing subsets, performance in the testing phase reflects how well the model has learned. However, the examination found what the authors described as a “catastrophic data leakage” problem: the two subsets were cross-contaminated, undermining that ideal separation. About one quarter of the data set in question was represented more than once, corrupting the cross-validation steps. After cleaning up the data sets and applying the published methods again, the observed performance was far less impressive, with the accuracy score dropping from 0.94 to 0.82. A score of 0.94 is reasonably high and “indicates that the algorithm is usable in many scientific applications”; at 0.82 the algorithm is still useful, but with limitations and “only if handled appropriately”.
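
As a loose illustration of the problem (and not the authors’ actual pipeline), the sketch below removes duplicate records before splitting the data, so that nothing leaks from the training subset into the testing subset; the file name, feature layout and choice of model are assumptions made for the example.

```python
# A minimal sketch (assumed file name, features and model, not the published
# pipeline): drop duplicate records *before* splitting so that no sample can
# appear in both the training and testing subsets.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

df = pd.read_csv("samples.csv")                      # hypothetical feature table with a "label" column
features = [c for c in df.columns if c != "label"]

# Duplicates that straddle the split would inflate the apparent test accuracy.
df = df.drop_duplicates(subset=features)

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["label"], test_size=0.2, random_state=0, stratify=df["label"]
)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Cross-validation on the deduplicated data gives a more honest overall estimate.
print("5-fold CV accuracy:", cross_val_score(model, df[features], df["label"], cv=5).mean())
```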

So what?

Studies published with flawed results obviously call the research into question. When researchers do not make their code and methods fully available, this type of error can go undetected. If inflated performance is reported, other researchers may not attempt to improve on the results, feeling that “their algorithms are lacking in comparison.” Some journals prefer to publish successful results, so this could prevent progress in research, as subsequent work is not considered valid or even worth publishing.

Encouraging reproducibility:

Moseley argues that a measured approach is needed. Where transparency is demonstrated, with data, code and full results made available, a thorough evaluation and identification of the problematic data set would allow an author to correct their work. Another of his solutions is to retract studies with highly flawed results and little or no support for reproducible research. Scientific reproducibility should not be optional.

Researchers at all levels will need to learn to treat published data with a degree of scepticism; the research community does not want to repeat others’ mistakes. Data sets are complex, especially when AI is involved. Making these data sets, and the code used to analyse them, openly available will benefit the original authors, help validate the research and ensure rigour across the research community.

Link to full article in Nature.

Articles, Literature Reviews

Hypotheses devised by AI could find “blind spots” in research

Could Artificial Intelligence (AI) “have a creative role in the scientific process”? That question was posed in 2023 by a group of researchers in Stockholm. AI is already being used in literature searches, to automate data collection, to run statistical analyses and even to draft parts of industry and academic papers. Sendhil Mullainathan, an economist at the University of Chicago Booth School of Business in Illinois, has suggested using AI to generate hypotheses and stated that “it’s probably been the single most exhilarating kind of research I’ve ever done in my life”.

AI could help with creativity: using large language models (LLMs) to create new text, even inaccurate text, can produce statements of the form “here’s a kind of thing that looks true”; when you think about it, that is exactly what a hypothesis is. These “hallucinations” are often things a human would not come up with, and could aid thinking outside the box.

Hypotheses sit on a spectrum from the concrete and specific to the abstract and general, and using AI in areas where the fundamentals remain hidden could generate insights. For example, we may know that a behaviour is happening but not why; could AI identify some rules that might apply to the situation? James Evans, a sociologist at the University of Chicago, says that AI systems which generate hypotheses based purely on machine learning require a lot of data. Should we be looking to build AI that goes beyond “matching patterns” and can also be guided by known laws? Rose Yu, a computer scientist at the University of California, San Diego, suggests that this would be a powerful way to include scientific knowledge in AI systems, though understanding their limits is crucial and people’s own scientific knowledge is still needed.

Ross King, a computer scientist at Chalmers University of Technology in Gothenburg, is building robotic systems that perform experiments. Factors are adjusted subtly in his “Genesis” systems, allowing these robot scientists to be “more consistent, unbiased, cheap, efficient and transparent than humans”.

Hypothesis generation by AI is not new. In the 1980s Don Swanson pioneered “literature-based discovery” with software he created called “Arrowsmith”, which searched for indirect connections between papers and proposed, for example, that fish oil might help treat Raynaud’s syndrome, a condition in which circulation to the hands is restricted. When taken forward, this hypothesis proved correct: fish oil decreases the blood’s viscosity, leading to improved circulation.
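
The underlying idea can be illustrated with a toy version of the A–B–C pattern behind literature-based discovery: if term A co-occurs with B, and B with C, but A and C never appear together, then A–C becomes a candidate hypothesis. This is only a sketch of the principle with an invented corpus, not Arrowsmith’s actual implementation.

```python
# Toy sketch of Swanson-style A-B-C literature-based discovery: propose A-C
# links that are only connected indirectly through shared bridging terms B.
# The "papers" below are invented term sets, not real literature.
from itertools import combinations

papers = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynauds syndrome"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynauds syndrome"},
    {"aspirin", "platelet aggregation"},
]

def cooccurs(a, b):
    """True if any paper mentions both terms directly."""
    return any(a in paper and b in paper for paper in papers)

terms = set().union(*papers)
for a, c in combinations(sorted(terms), 2):
    if cooccurs(a, c):
        continue  # already directly linked, so nothing new to propose
    bridges = {b for b in terms - {a, c} if cooccurs(a, b) and cooccurs(b, c)}
    if bridges:
        print(f"candidate hypothesis: {a} <-> {c} via {sorted(bridges)}")
```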

Data gathering is becoming more automated, and automating hypothesis generation could become important as more data is being generated than humans can handle. Scaling up “intelligent, adaptive questions” will ensure that this capacity is not wasted.

So what?

This approach could lead to valid hypotheses being developed, clear and broad, in areas where the underlying principles are poorly understood. A panacea, perhaps, for “researcher’s block”, unlocking blind spots? For Defence this could mean helping to avoid groupthink, encouraging more innovation outside the chain of command and enabling things to be done differently in an often slow-to-change organisation. AI could prove to be a lot more useful than merely performing literature reviews.

Full article: Nature magazine

Literature Reviews

Social Media Algorithms warp how people learn from each other

Social Media Algorithms warp how people learn from each other, research shows.

William Brady, Assistant Professor of Management and Organisations at Northwestern University.

Interactions, especially on social media, are influenced by the flow of information controlled by algorithms. These algorithms amplify the information that sustains engagement – what could be described as “clickbait”. Brady suggests that a side effect of this clicking and returning to the platforms is that “algorithms amplify information that people are strongly biased to learn from”. He has called this “PRIME” information – prestigious, in-group, moral and emotional. This type of learning is not new and would have served a purpose from an evolutionary perspective: learning from prestigious individuals is efficient because we can copy their successful behaviour, and sanctioning those who violate moral norms helps a community maintain cooperation.

With social media, however, this PRIME information gives a poor signal: prestige can be faked, and our feeds can be full of negative and moral content, which leads to conflict rather than cooperation. This can foster dysfunction, because social learning should support cooperation and problem solving, whereas the algorithms are designed only to increase engagement. Brady calls this mismatch “functional misalignment”.

So what, why does this matter?

People can start to form incorrect perceptions of their social world, which can lead to a polarisation of their political views, with the “in-group” and “out-group” seen as more sharply divided than they really are. The author also found that the more a post is shared, the more outrage it generates. So when these algorithms amplify moral and emotional information, misinformation is included and is itself then amplified.

What next?

Research in this area is new, and whether this amplified online polarisation spills over into the offline world is still debated. More research is needed to understand the outcomes that occur “when humans and algorithms interact in feedback loops of social learning”. For research to continue, ethical concerns such as privacy need to be considered. Brady would like to see “what can be done to make algorithms foster accurate human social learning rather than exploit social learning biases”. He suggests we need an algorithm that “increases engagement while also penalising PRIME information”.
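
As a rough illustration of that last suggestion, the toy re-ranking below scores each post by its predicted engagement minus a penalty on a PRIME score. The field names, weights and example posts are invented for this sketch; they are not drawn from Brady’s work or any real platform.

```python
# Toy feed re-ranking: reward predicted engagement but penalise PRIME
# (prestigious, in-group, moral, emotional) content. All field names,
# weights and example posts are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    predicted_engagement: float   # e.g. a model-estimated click/share probability
    prime_score: float            # 0..1, how strongly the post is PRIME-flavoured

def rank(posts, prime_penalty=0.5):
    """Sort posts by engagement minus a tunable penalty on the PRIME score."""
    return sorted(
        posts,
        key=lambda p: p.predicted_engagement - prime_penalty * p.prime_score,
        reverse=True,
    )

feed = [
    Post("calm explainer on a local policy change", 0.40, 0.10),
    Post("outrage bait about the out-group", 0.70, 0.95),
    Post("celebrity endorsement of a product", 0.55, 0.60),
]
for post in rank(feed):
    print(f"{post.predicted_engagement - 0.5 * post.prime_score:+.2f}  {post.text}")
```

With the penalty applied, the high-engagement outrage post drops to the bottom of the toy feed, which is the trade-off Brady describes.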

Link: https://www.scientificamerican.com/article/social-media-algorithms-warp-how-people-learn-from-each-other/

Literature Reviews

Review of paper: “Fooled twice: People cannot detect deep fakes but think they can”

– Nils C Kobis, Barbora Dolezalova & Ivan Soraperra

In this study the authors show that people cannot reliably detect deep fakes: even when their awareness was raised and they received a financial incentive, their detection accuracy was still poor. People appear to be biased towards mistaking deep fakes for authentic videos rather than the other way around, and they also overestimate their detection abilities. Is seeing really believing?

These manipulated images, whilst entertaining, can have a dark side. Facial images are being used at scale to create fake pornographic videos of both men and women, which can damage reputations; a faked voice can be used to empty someone’s life savings. Caldwell et al. (2020) ranked the malicious use of deep fakes as the number one emerging AI threat to consider.

This is an issue because the ability to create a deep fake using Generative Adversarial Networks (GANs) is no longer the preserve of experts; it is accessible to anyone, and expert knowledge is not required. Extensive research on judgement and decision making shows that people often use mental shortcuts (heuristics) when establishing the veracity of items online. This could, the authors posit, lead to people becoming oversensitive to online content and failing to believe anything – even genuine, authentic announcements by politicians. The counter-argument is that fake videos are the exception to the rule and that “seeing is believing” remains the dominant heuristic. This study tested both of these competing biases – the “liar’s dividend” versus “seeing is believing”.

The results showed that people struggled to identify deep fake videos because they were unable to, not merely because they lacked motivation. The authors also found that people were overly optimistic, with a systematic bias towards guessing that the videos were authentic.

It could be argued that humans process moving visual information more effectively than other sensory data, yet performance was only slightly better than chance – worse than with static images. Could this be due to inattention? More research is needed in this area.

The authors also found two related biases in human deep fake detection. First, although participants were told that 50% of the videos were fake, 67.4% of the videos were judged to be authentic; this was not deliberate guessing – participants were using their judgement. The second bias relates to the “Dunning-Kruger”* effect: people overestimated their ability to detect deep fakes, with low performers particularly overconfident. Overall, people really did think that “seeing is believing”.
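
A small simulation helps separate the two effects: a rater with no real detection ability who simply leans towards answering “authentic” reproduces the lopsided judgement rate while still scoring around chance. The numbers below are illustrative only, not the study’s data.

```python
# Illustrative simulation (not the study's data): a rater with no detection
# ability but a bias towards answering "authentic" calls roughly 67% of clips
# authentic even when exactly half are fake - while staying at chance accuracy.
import random

random.seed(0)
N = 10_000
p_say_authentic = 0.674                      # bias matching the reported judgement rate

videos = ["authentic"] * (N // 2) + ["fake"] * (N // 2)    # 50% fakes, as participants were told
answers = ["authentic" if random.random() < p_say_authentic else "fake" for _ in videos]

judged_authentic = answers.count("authentic") / N
accuracy = sum(answer == video for answer, video in zip(answers, videos)) / N
print(f"judged authentic: {judged_authentic:.1%}, accuracy: {accuracy:.1%}")
# The bias changes *which* errors are made, not how many: accuracy stays near 50%.
```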

Conclusion – Deep fakes will undermine knowledge acquisition, since our failure to detect them stems not from a lack of motivation but from an inability to do so. The videos used in this study had no emotional content, which may have affected the results. More work is definitely needed in this area.

Link to the paper here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8602050/

Reference: Caldwell M., Andrews J.T., Tanay T., Griffin L.D. AI-enabled future crime. Crime Sci. 2020;9:1–13.

*The Dunning-Kruger effect occurs when a person’s lack of knowledge and skills in a certain area cause them to overestimate their own competence.

Literature Reviews

Paper review: A Replication Study: Machine Learning (ML) Models Are Capable of Predicting Sexual Orientation From Facial Images

– John Leuner

Objectives: The aim of this paper was to replicate previous studies that used ML to predict sexual orientation from facial images. It also introduced a new ML model based on highly blurred images, to investigate whether the colour information in the face and immediate background is predictive of sexual orientation. Head pose and the presence of facial hair or eyewear were also investigated.

Results:
Replicating previous studies, but with a new dataset not limited by country or race, both deep learning classifiers and facial morphology classifiers performed better than humans on photographs from dating profiles. A new ML model that tests whether a blurred image can be used to predict sexual orientation is introduced: using the predominant colour information present in the face and background, the author found this to be predictive of sexual orientation.

The author states that the study demonstrates that if someone intentionally alters their appearance to fit gay or straight stereotypes, the model’s predicted label does not change. Models are still able to predict sexual orientation even when controlling for the presence or absence of facial hair.
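
As a rough sketch of the blurred-image approach described above (and not the author’s code), the snippet below reduces each photo to a few pixels so that only coarse colour information survives, then fits a simple classifier; the folder layout, label encoding and model choice are assumptions.

```python
# Rough sketch of the blurred-image idea (not the author's code): shrink each
# photo to a handful of pixels so only coarse colour information from the face
# and background survives, then fit a simple classifier on those colours.
# The folder layout, "label_xxx.jpg" naming and model choice are assumptions.
import glob
import os

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def colour_features(path, size=(8, 8)):
    """Heavily downscale the image and flatten its RGB values into a vector."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=float).ravel() / 255.0

paths = sorted(glob.glob("photos/*.jpg"))                      # hypothetical image folder
labels = [os.path.basename(p).split("_")[0] for p in paths]    # hypothetical label-in-filename scheme

X = np.stack([colour_features(p) for p in paths])
y = np.asarray(labels)

clf = LogisticRegression(max_iter=1000)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```
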
So what?

A Chinese study on physiognomy claims to be able to detect criminality from identity photographs; this type of research has serious legal and ethical implications.

Link: https://arxiv.org/pdf/1902.10739.pdf

Literature Reviews, Uncategorized

Review: Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Justin Grimmer and Brandon M. Stewart (Human Review)

This paper, although published nearly ten years ago, still makes valid points today, arguing that “language is the medium for politics” and policy, whether spoken or written. In our quest to understand politics, from terrorist manifestos to peace treaties, we need to know what political actors are actually saying. The authors caution that automated methods must be paired with careful human thought and robust validation to ensure rigour. But with today’s ever-evolving technology, is this still the case?

To understand politics we need to ascertain what is actually being said, and by whom, in whatever medium it is delivered. However, the volume of material is massive: hiring people to read and code texts is expensive, and scholars cannot do it all themselves. Automated content analysis methods make this type of analysis possible, and the authors state that automated methods “amplify and augment” careful reading and thoughtful analysis. Their paper takes the reader through all the steps of such an analysis: first acquiring the documents, pre-processing them and checking that they meet the research objective, followed by classification, categorisation and then unpacking the content further. Automated content analysis methods can make the previously impossible possible. Despite their initial reservations, the authors offer guidelines on this “exciting area of research”, minimising misconceptions and errors, and describe “best practice validations across diverse research objectives and models”. Four principles of automated text analysis are identified, and the authors encourage revisiting them often during research; they are as follows:

1. All quantitative models of language are wrong – but some are useful (e.g. a complicated dependency structure in a sentence can change the meaning).
2. Quantitative methods for text amplify resources and augment humans.
3. There is no globally best method for text analysis (e.g. many different packages are available, and one may suit a particular dataset better than another).
4. Validate, validate, validate: avoid the blind use of any one method without validation.

The authors point out that automated content analysis methods provide many tools for measuring what is of interest; there is no one-size-fits-all solution, and whichever tool is chosen needs to be specific to the content. New texts probably need new methods, and ten years ago the authors identified that commonalities would allow “scholars to share creative solutions to common problems”. Important questions can be answered by analysing large collections of texts, but if the methods are applied without rigour then few relevant answers will be forthcoming. When undertaking text analysis it is important to recognise the limits of statistical models, and the field of political science will be revolutionised by the application of automated models.
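
To make the workflow concrete, here is a minimal sketch of such a pipeline built with scikit-learn on an invented toy corpus; it illustrates the steps and the “validate, validate, validate” principle, not the authors’ own tooling.

```python
# Minimal sketch of an automated content analysis pipeline in the spirit of
# the paper: pre-process the documents, fit a supervised classifier, and
# validate with cross-validation rather than trusting in-sample fit.
# The toy corpus and hand-coded labels below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

documents = [
    "we must cut taxes and reduce government spending",
    "we must invest in public healthcare and protect workers",
    "cut taxes to free business from government red tape",
    "protect workers and fund public healthcare properly",
]
labels = ["right", "left", "right", "left"]     # placeholder hand-coded categories

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english"),      # pre-processing: tokenise and weight terms
    LogisticRegression(max_iter=1000),          # classification into the coded categories
)

# "Validate, validate, validate": estimate performance on held-out folds.
scores = cross_val_score(pipeline, documents, labels, cv=2)
print("cross-validated accuracy:", scores.mean())
```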

The overwhelming message of this paper is that textual measurement, the discovery of new methods and points of inference allow us to build upon scientific interpretation and theory, and the journey does indeed continue at pace. Machine learning techniques have revolutionised our ability to analyse vast quantities of text, data and images rapidly and cheaply.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf

Literature Reviews

Review: “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” (Automatic Review)

The paper “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” by Justin Grimmer and Brandon M. Stewart, published in the Political Analysis journal in 2013, addresses the increasing use of automatic content analysis methods in political science research. The authors argue that these methods have the potential to offer significant advantages over traditional manual content analysis, but also pose important challenges that must be addressed.

The authors begin by outlining the benefits of automatic content analysis methods, including the ability to analyze large amounts of text quickly and accurately, the potential to detect patterns and relationships that would be difficult or impossible for human analysts to discern, and the ability to replicate findings across multiple studies. They also acknowledge, however, that automatic methods are not without limitations, such as difficulties in capturing the nuances of language, the potential for errors in coding, and the need for careful attention to issues of measurement and validity.

To address these challenges, the authors propose a framework for evaluating the quality of automatic content analysis methods, based on three key criteria: validity, reliability, and generalizability. They argue that these criteria should be used to assess the quality of automated methods in political science research, and provide a detailed discussion of how each criterion can be operationalized.

The authors also provide examples of how automated content analysis methods can be used in political science research, including the analysis of presidential speeches and legislative texts, the identification of ideological or partisan biases in news coverage, and the detection of patterns in social media data. They demonstrate how automated methods can be used to generate insights that would be difficult or impossible to obtain using manual methods, such as identifying the specific rhetorical strategies used by politicians to appeal to different audiences.

Finally, the authors acknowledge that the use of automated content analysis methods in political science research is still in its infancy, and that there is much work to be done to refine and improve these methods. They conclude by calling for continued research in this area, with a focus on developing more sophisticated and accurate methods for analyzing political texts, as well as exploring the potential for integrating automated content analysis with other data sources, such as survey data or experimental data.

In summary, Grimmer and Stewart’s paper argues that automated content analysis methods offer great promise for political science research, but also pose important challenges that must be addressed. The authors provide a framework for evaluating the quality of automated methods, as well as examples of how these methods can be used to generate insights in political science research. They call for continued research in this area, with a focus on refining and improving these methods, and exploring their potential for integration with other data sources.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf