Literature Reviews

Paper review: A Replication Study: Machine Learning (ML) Models Are Capable of Predicting Sexual Orientation From Facial Images, by John Leuner

Objectives: The aim of this paper was to replicate previous studies that used ML to predict sexual orientation from facial images. It adds a new ML model based on highly blurred images, to investigate whether the colours of the face and the immediate background alone are predictive of sexual orientation. The effects of head pose and the presence of facial hair or eyewear were also investigated.

Results:
Replicating the earlier studies on a new dataset not limited by country or race, both deep learning classifiers and facial morphology classifiers performed better than humans at classifying photographs from dating profiles. The paper also introduces a new ML model that tests whether a highly blurred image can be used to predict sexual orientation; using only the predominant colour information in the face and immediate background, the author found this to be predictive of sexual orientation.
The author states that this study demonstrates that if someone intentionally alters their appearance to fit gay or straight stereotypes, the models’ sexual orientation predictions do not change. Models are still able to predict sexual orientation even whilst controlling for the presence or absence of facial hair.
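To make the blurred-image model concrete, here is one hypothetical way such a colour-only classifier could be built: blur each photograph until only coarse colour survives, then train a simple linear classifier on the remaining pixel colours. This is an illustrative sketch, not the author’s code; the data loader, labels, and parameters are all assumptions.

```python
# A hypothetical sketch of a colour-only classifier in the spirit of the
# paper's blurred-image model. The loader below is assumed, not real code
# from the study.
import numpy as np
from PIL import Image, ImageFilter
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def colour_features(path, radius=10, size=(8, 8)):
    """Blur the photo, shrink it, and flatten the surviving colour grid."""
    img = Image.open(path).convert("RGB")
    img = img.filter(ImageFilter.GaussianBlur(radius))  # destroy fine structure
    img = img.resize(size)                              # keep only coarse colours
    return np.asarray(img, dtype=float).ravel() / 255.0

paths, labels = load_labelled_photos()  # hypothetical: file paths and 0/1 labels
X = np.stack([colour_features(p) for p in paths])
y = np.array(labels)

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.5 would mean no colour signal
```

At 8 × 8 resolution no facial structure survives, so any accuracy above chance must come from the colours of the face and background alone, which is precisely the signal the author set out to isolate.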
So What: A Chinese study on physiognomy claims to be able to detect criminality from identity photographs; this type of research has serious legal and ethical implications.

Link to paper: https://arxiv.org/pdf/1902.10739.pdf

Literature Reviews

Review: Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts, by Justin Grimmer and Brandon M. Stewart (Human Review)

Although published nearly ten years ago, this paper still makes valid points today, arguing that “language is the medium for politics” and policy, whether spoken or written. In our quest to understand politics, from terrorist manifestos to peace treaties, we need to know what political actors are actually saying. The authors caution that automated methods only deliver rigour when paired with careful human thought and robust validation. But with today’s ever-evolving technology, is this still the case?

To understand politics we need to ascertain what is actually being said, by whom, and in whatever medium it is delivered. However, the volume of material is massive: hiring people to read and code it is expensive, and scholars cannot do it all themselves. Automated content analysis methods make this type of analysis possible. The authors do state that automated methods “amplify and augment” careful reading and thoughtful analysis, and their paper takes the reader through all the steps of such an analysis: first acquiring the documents, then pre-processing them and checking that they meet the research objective, followed by classification, categorisation, and unpacking the content further (a minimal sketch of this workflow appears after the list below). Automated content analysis methods can make the previously impossible possible. Despite their initial reservations, the authors offer guidelines on this “exciting area of research”, minimising misconceptions and errors, and describe “best practice validations across diverse research objectives and models”. Four principles of automated text analysis are identified, and the authors encourage revisiting them often during research. They are as follows:

1. All quantitative models of language are wrong – but some are useful: e.g. a complicated dependency structure within a sentence can change its meaning.
2. Quantitative methods for text amplify resources and augment humans.
3. There is no globally best method for text analysis: many different packages are available, and one may suit a particular dataset better than another.
4. Validate, validate, validate: avoid the blind use of any one method without validation (see the validation sketch further below).
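To make the workflow mentioned above concrete, here is a minimal sketch of an acquire, pre-process, classify pipeline in Python using scikit-learn. The documents, categories, and parameters are illustrative assumptions, not materials from Grimmer and Stewart’s paper.

```python
# A hedged sketch of the acquire -> pre-process -> classify workflow the
# authors describe, using scikit-learn. The documents, categories, and
# parameters are illustrative assumptions, not materials from the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Step 1: acquire the documents (short stand-ins for real political texts).
docs = [
    "We demand the immediate overthrow of the corrupt government.",
    "Rise up against the oppressors and reclaim the nation.",
    "Both parties agree to a lasting ceasefire along the border.",
    "The signatories commit to the peaceful resolution of disputes.",
]
labels = ["manifesto", "manifesto", "treaty", "treaty"]

# Step 2: pre-process: lowercase the text, strip stop words, and reduce
# each document to a bag of word counts, discarding word order entirely.
vectoriser = CountVectorizer(lowercase=True, stop_words="english")
X = vectoriser.fit_transform(docs)

# Step 3: classify new documents into the categories of interest.
clf = MultinomialNB().fit(X, labels)
new_doc = vectoriser.transform(["The accord establishes a ceasefire."])
print(clf.predict(new_doc))  # expected: ['treaty']
```

Principle 1 is visible in step 2: the bag-of-words representation deliberately throws away the dependency structure of each sentence, which is exactly why such models are wrong but useful.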
The authors point out that automated content analysis methods provide many tools for measuring what is of interest; there is no one-size-fits-all solution, and whichever tool is chosen needs to be specific to the content at hand. New texts will probably need new methods, and ten years ago the authors already identified that commonalities would allow “scholars to share creative solutions to common problems”. Important questions can be answered by analysing large collections of texts, but if the methods are applied without rigour then few relevant answers will be forthcoming. When undertaking text analysis it is important to recognise the limits of statistical models; with that caveat, the authors predict that the field of political science will be revolutionised by the application of automated models.
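In the spirit of principle 4, the sketch below shows one simple form of validation: hold out a sample of hand-coded documents and measure the classifier’s agreement with the human coding before trusting it at scale. The documents and codes are hypothetical stand-ins, not the paper’s own procedure.

```python
# A minimal sketch of principle 4: hold out a hand-coded sample and measure
# the classifier's agreement with the human coding before trusting it at
# scale. The documents and codes below are hypothetical stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

docs = [
    "The minister announced new funding for rural schools.",
    "Parliament passed the domestic housing reform bill.",
    "The budget expands childcare support for working families.",
    "Lawmakers debated changes to the national health service.",
    "The president signed a trade agreement with neighbouring states.",
    "Diplomats negotiated the terms of the ceasefire abroad.",
    "The summit addressed sanctions against the foreign regime.",
    "Ambassadors discussed the border dispute at the United Nations.",
]
codes = ["domestic"] * 4 + ["foreign"] * 4  # the human-assigned gold labels

# Keep a quarter of the hand-coded sample unseen during training.
train_docs, test_docs, train_codes, test_codes = train_test_split(
    docs, codes, test_size=0.25, random_state=0
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_docs, train_codes)

# Agreement with the held-out human coding is the validation signal.
print(accuracy_score(test_codes, model.predict(test_docs)))
```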

The overwhelming message of this paper is that textual measurement, the discovery of new methods, and careful inference allow us to build upon scientific interpretation and theory, and the journey does indeed continue at pace. Machine learning techniques have revolutionised our ability to analyse vast quantities of text, data, and images rapidly and cheaply.

Link to paper: https://web.stanford.edu/~jgrimmer/tad2.pdf