Literature Reviews

Review: “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” (Automatic Review)

The paper “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts” by Justin Grimmer and Brandon M. Stewart, published in Political Analysis in 2013, addresses the increasing use of automatic content analysis methods in political science research. The authors argue that these methods can offer significant advantages over traditional manual content analysis, but also pose important challenges that must be addressed.
The authors begin by outlining the benefits of automatic content analysis methods, including the ability to analyze large amounts of text quickly and accurately, the potential to detect patterns and relationships that would be difficult or impossible for human analysts to discern, and the ability to replicate findings across multiple studies. They acknowledge, however, that automatic methods have limitations, such as difficulty in capturing the nuances of language, the potential for errors in coding, and the need for careful attention to issues of measurement and validity.
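
To make the point about linguistic nuance concrete, here is a minimal sketch (not from the paper; the word lists are invented) of a dictionary-based score in Python. A simple count of positive and negative words ignores word order, so negation changes the meaning of a sentence without changing its score.

    # Minimal illustrative sketch, not from the paper: the word lists are invented.
    POSITIVE = {"support", "praise", "good"}
    NEGATIVE = {"oppose", "attack", "bad"}

    def dictionary_score(text: str) -> int:
        """Positive minus negative word count, ignoring word order."""
        tokens = text.lower().split()
        return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

    # Negation flips the meaning, but the naive score is unchanged:
    print(dictionary_score("I support the bill"))         # 1
    print(dictionary_score("I do not support the bill"))  # 1 as well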

To address these challenges, the authors propose a framework for evaluating the quality of automatic content analysis methods, based on three key criteria: validity, reliability, and generalizability. They argue that these criteria should be used to assess the quality of automated methods in political science research, and provide a detailed discussion of how each criterion can be operationalized.
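
As a rough illustration of how such criteria are often operationalised in practice (an assumption about typical workflow, not a procedure taken from the paper), automated labels can be compared against a hand-coded gold standard using agreement statistics. The labels below are invented and scikit-learn is assumed to be available.

    # Hedged sketch: compare automated coding with hand-coded labels.
    # The labels are invented; scikit-learn is an assumed dependency.
    from sklearn.metrics import accuracy_score, cohen_kappa_score

    human_labels = ["economy", "health", "economy", "defence", "health", "economy"]
    model_labels = ["economy", "health", "defence", "defence", "health", "economy"]

    # Raw agreement with the trusted human coding (a simple validity check).
    print("accuracy:", accuracy_score(human_labels, model_labels))
    # Chance-corrected agreement, often reported alongside raw accuracy.
    print("kappa:", cohen_kappa_score(human_labels, model_labels))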

The authors also provide examples of how automated content analysis methods can be used in political science research, including the analysis of presidential speeches and legislative texts, the identification of ideological or partisan biases in news coverage, and the detection of patterns in social media data. They demonstrate how automated methods can be used to generate insights that would be difficult or impossible to obtain using manual methods, such as identifying the specific rhetorical strategies used by politicians to appeal to different audiences.
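
As a small, purely illustrative example of this kind of unsupervised pattern discovery (the “speeches” are invented toy data, and scikit-learn’s topic model is an assumed stand-in rather than the authors’ own method), a topic model can surface recurring themes across a collection of texts:

    # Minimal topic-model sketch; the "speeches" are invented toy data.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    speeches = [
        "we will cut taxes and grow the economy",
        "the economy needs investment and new jobs",
        "our healthcare system must protect every family",
        "we will fund hospitals and protect the health service",
    ]

    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(speeches)

    # Two topics is an arbitrary choice for this toy corpus.
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[-4:][::-1]]
        print(f"topic {k}: {top}")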

Finally, the authors acknowledge that the use of automated content analysis methods in political science research is still in its infancy, and that there is much work to be done to refine and improve these methods. They conclude by calling for continued research in this area, with a focus on developing more sophisticated and accurate methods for analyzing political texts, as well as exploring the potential for integrating automated content analysis with other data sources, such as survey data or experimental data.

In summary, Grimmer and Stewart’s paper argues that automated content analysis methods offer great promise for political science research, but also pose important challenges that must be addressed. The authors provide a framework for evaluating the quality of automated methods, as well as examples of how these methods can be used to generate insights in political science research. They call for continued research in this area, with a focus on refining and improving these methods, and exploring their potential for integration with other data sources.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf

Uncategorized

CAISS goes to AI UK, London March 2023

Around 3,000 delegates attended the QE2 Centre for AI UK. One of the most popular sessions dealt with the much-hyped ChatGPT and was delivered by Gary Marcus, Emeritus Professor of Psychology and Neural Science at New York University. He began by stating that although we have a lot of individual AI solutions (for example, GPS), there is so far no general-purpose system that will do everything for us. ChatGPT is the most advanced and reliable system to date: it takes in massive amounts of data and has good guardrails, so it will not, for example, write an article on the benefits of eating glass! But is it the universal panacea?

Problems:

  • It will make things up and can even give references for fake information; there is an illusion that adding more information will mitigate the incorrect outputs.
  • After training on eight million chess games, it still does not understand the rules.
  • Driverless cars involve deep learning, and this is not AI: the technology is just memorising situations and is unable to cope with unusual events. The system cannot reason in the same way that a human being does.
  • If a circumstance is not in the training set, it will not know what to do, and for GPT-4 (the latest version) we do not yet know what that training data set is.

Positives:

  • It can help with debugging: it can write pieces of code that are around 30% correct and humans can then fix them, which is easier than starting from scratch; this is the “best use case”.
  • It can write letters, stories, songs and prose; it is fun, fluent and good with grammar.
  • Large Language Models (LLMs) can be used to write articles that look good but contain errors. Someone who does not know the facts could still believe them, but if the output is a story or fiction, does this matter?

Worries and considerations:

ChatGPT is being used at scale, leading to misinformation and a possible polluting of democracy: there is an opportunity for fake information and for discriminatory, stereotypical or even offensive responses. The 2024 US Presidential Election could be a concern, as the technology could be used by state actors or as an advertising tool, leading to a spread of misinformation that appears plausible. It can write fictitious news reports and describe data (for example, Covid-19 versus vaccines), and the results will look authoritative. This could result in millions of fake tweets and posts a day, output via “troll farms”. Large Language Models (LLMs) without guardrails are already being used on the dark web. ChatGPT has been used in a programme to solve CAPTCHAs; when challenged, the bot said it was a person with a visual disability! It is already being used in credit card scams and phishing attacks.

Classical AI is about facts; LLMs do not know how to fact-check. Take a claim such as “Elon Musk has died in a car crash”: we can check this as humans. Since this is such a wide and fast-moving area, should we be looking at LLMs in the same way that we would look at a new drug, with controlled releases and a pause in place for a “safety check”?

AI literacy is important for future generations: understanding the limits is crucial, and people still need to think critically. Is a coordinated campaign needed to fully understand and warn about the limits of such technology?

Other presentations included Professor Lynn Gladden on Integrating AI for Science and Government, Public Perceptions of AI, how we can “do better in data science and AI”, the Online Safety Bill, creating economic and societal impact, what data science can do for policy makers, and individual skills for global impact. Overall it was a fascinating two days with many opinions and high-profile speakers under the overarching banner of open research, collaboration and inclusion.

Link: https://ai-uk.turing.ac.uk/