Talk Review

CAISS TALK: Dr Lewys Brace – Biases when exploring the “incelosphere”

The November talk was a great event, with Dr Lewys Brace from Exeter University discussing “Biases when exploring online extremist sub-cultures and the ‘incelosphere’: examples from the ConCel project”. Incels, short for “involuntary celibates”, form an online sub-culture whose members define themselves by their inability to form sexual relationships with women. Recent years have seen an increase in work using large-scale, data-driven analysis methods to understand such ecosystems. This work typically uses text data acquired from online spaces, which is then analysed using Natural Language Processing (NLP) techniques. However, there are several points along the road from data collection through to interpretation of results where biases can emerge when using such methods on these online sub-cultures.

Lewys talked to us about how bias can appear by:

  • Not selecting the “right” online spaces for gathering data
  • The data collected may not be representative of the extremist ecosystem
  • The initial “seed” list may be biased – (manual checking attempts to reduce this)
  • Data cleaning – as these sites have a specific sub cultural language in use – interrogation of the data in depth helps with this
  • Edgy humour can be a euphemism for racist, misogynistic and homophobic views, is it irony or genuine?
  • Deciding on the measure to use can be problematic – using multiple measures helps with a “sanity check” and can offer additional insights.

The team used the Fisher-Jenks algorithm, which uses an iterative approach to find the best groupings of numbers: values within a group should be as close together as possible (low variance around the group’s mean), while the groups themselves should be as distinct as possible (high variance between groups). Analyses were carried out at two levels: the micro-level, which adopts a context-based approach (i.e. how ideology is integrated with personal life experiences), and the macro-level, where the use of hateful language can cause issues. This was mitigated by using violent language and out-group terms in the analysis.
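The grouping idea behind Fisher-Jenks can be sketched in a few lines of Python. This is only an illustration, not the ConCel team's implementation: the function names and the sample scores are hypothetical, and the sketch uses a simple dynamic-programming search that minimises the total within-group squared deviation (which, for a fixed number of groups, also maximises the variance between groups).

```python
from typing import List, Sequence


def within_group_ss(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Sum of squared deviations from the mean for sorted values[i:j]."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def natural_breaks(values: Sequence[float], k: int) -> List[List[float]]:
    """Split values into k contiguous groups (after sorting) so that the
    total within-group sum of squared deviations is as small as possible."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums let us compute any group's squared deviation in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # cost[g][j]: best total cost for splitting data[:j] into g groups.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # split[g][j]: start index of the last group in that best split.
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + within_group_ss(prefix, prefix_sq, i, j)
                if c < cost[g][j]:
                    cost[g][j] = c
                    split[g][j] = i

    # Walk back through the recorded split points to recover the groups.
    groups: List[List[float]] = []
    j = n
    for g in range(k, 0, -1):
        i = split[g][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical scores, e.g. how often a term appears per user.
scores = [1, 2, 3, 10, 11, 12, 50, 52]
print(natural_breaks(scores, 3))
```

With these made-up scores and three groups, the low, middle and high clusters are separated at the obvious gaps. The exhaustive search is fine for small inputs; for larger datasets an established library implementation would be the more sensible choice.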
A very engaging question and answer session followed, covering many aspects of Lewys’s work, such as: group isolation, whether Incels use the dark web (they tend not to), whether the groups can be infiltrated (no, people doing this are spotted, ridiculed and driven out), cross-cultural spread (groups are emerging in Japan and Russia), Incel demographics (young, white males in general) and how to track individuals over time. Further work in this area is ongoing, using topic modelling and exploring the idea of potential hybrid ideologies.
Link to accompanying report

Talk Review

CAISS TALK: Assistant Professor Xiao Hui Tao – Monitoring internal displacement

CAISS were privileged to have Assistant Professor Xiao Hui Tao from the University of California, Davis deliver our December talk.

Xiao Hui talked to us about how mobile phone data is used to monitor internal displacement within a country, in this case Afghanistan. This is especially relevant at present due to current world events. The forced displacement of people is a key cost of violence, and Internally Displaced Persons (IDPs) are hard to keep track of. This matters in terms of targeting aid more effectively and understanding likely locations of future instability in order to allocate forces or target specific programmes.
The “vast untapped resource” of mobile phone data was used to estimate violence-induced displacement in a granular manner. The aims were both methodological and substantive: what was the overall effect of violence on displacement in Afghanistan, what factors affected the choice of destination, and could the team test and confirm hypotheses from qualitative work gathered from surveys? A large amount of mobile phone data was used: 20 billion transactions from the anonymised records of 10 million subscribers of Afghanistan’s largest mobile phone operator, covering April 2013 to March 2017. The analysis covered 398 districts, 5,984 violent events and 13,000 cell towers grouped by proximity into 1,439 tower groups. The results showed that, for those in a district on a violent day, there was an immediate and statistically significant increase in the likelihood of leaving the district. When comparing Islamic State (IS) violence with Taliban violence, the impact was larger for IS-related violence; this could be because IS have been known to target civilians, for example in filmed executions. There was also a larger impact with recently experienced violence and a smaller impact in provincial capitals.

People who were displaced were not just seeking economic opportunity. Half of those moving from a capital moved to other capitals or major cities. Of those moving from non-capitals, more than half went to capitals or major cities, with 30% moving to a provincial capital in the same province. The main driver was seeking safety rather than economic opportunities, and this is consistent with the narrative. In non-capitals, violence resulted in people seeking safety close to home.

Xiao Hui talked specifically about some of the limitations and mitigating biases:

  • There could be bias in the data sources
  • Check and check again if the results contain bias
  • People could be sharing mobile phones
  • Are phones only being used by the wealthy?
  • Are women using phones in a patriarchal society?
  • Is the displacement intra-district rather than inter-district?
  • Are cell phone towers being destroyed, resulting in data that falsely suggests displacement?

The analysis of this data provided insight into the nature of violence-induced displacement in Afghanistan and helped to quantify some of the human costs of violence that would be difficult to measure using traditional methods such as surveys. While there are definite limitations to what can be observed through mobile phone data, conflict-prone regions are often also the places where traditional survey-based data are the least reliable and most difficult to obtain. This approach could complement traditional perspectives on displacement and eventually contribute to the design of effective policies for prevention and mitigation.

Talk Review

CAISS Talk Series Reports: Professor Wendy Moncur

CAISS were privileged to have Professor Wendy Moncur from the University of Strathclyde deliver our second talk in October.

Wendy leads the Cybersecurity Group, and her research focuses on online identity, reputation, trust and cybersecurity, crossing many disciplinary boundaries. Her current research – the 3.6 million AP4L project – develops privacy-enhancing technologies (PETs) to support people going through sensitive life transitions. The research is looking at four transition groups: (i) living with cancer, (ii) leaving the armed forces, (iii) LGBT+ and (iv) relationship breakdowns.

Wendy talked to us about “Navigating bias in online privacy research”. She stressed that it is important to ask the right questions and to ask them of the right people. As researchers, we also need to consider what we ourselves “bring” to the research, since we each use our own “interpretive lens”, and it is important that we communicate our findings clearly so that others can understand the results.

Wendy then went on to discuss our individual online identities. An online identity is co-constructed: it is made up of data about an individual posted by themselves and by other people and organisations. The internet in general is swimming in personal data, and the minute we share anything we have lost control; once it is “out there”, this information persists. Her explanation of how threads of personal data can be used to construct information about an individual was very thought-provoking: for example, if you share your Strava run data, someone can easily ascertain your home address, or where you work if you run in your lunch break!

To mitigate against bias in the research Wendy advocated the following:

  • Allow for self-reflection
  • Draw out information on digital privacy in sensitive contexts
  • Foster participants’ ability for self-expression
  • Facilitate richer, more comprehensive stories and descriptions
  • Enable non-experts to be heard
  • Avoid assumptions and bias.

To further reduce researcher bias and ensure that the vocabulary was robust, the research team worked hard to increase the list of descriptive terms they used, checking further terms with the University Librarian and with the advisory board of people living with the transitions under investigation. This led to a very big list! In the workshops that ensued, participants were asked to map their life transition online, with questions as prompts. Empathy mapping was then used to help further remove bias and deliver a shared understanding of the user across the research team. Next, metaphor cards were used, with the groups asked to consider potential technological solutions as opposed to just challenges, needs and practices. Finally, participants’ ideas were prioritised using the MoSCoW tool (Must have, Should have, Could have, Will not have).

Sociodemographic groups were also discussed: older people (70 plus) tend to read but not comment online, 30 to 60 year olds have a lot to say, and younger people are happy to share information but in general have a more robust awareness of online security.

Results have indicated that the “Transition Continuum” is not a straight line, and this is being explored further. A useful design insight for developing Privacy Enhancing Tools is that people’s experiences are not necessarily linear or instantaneous and can extend over a long period. In future, privacy settings ideally need to be more like a dial than a switch.

Talk Review

CAISS Talk Series Reports: Dr Sharon Glaas

The first CAISS talk was held in September with a fabulous session from Dr Sharon Glaas on “Mitigating Researcher Bias in Linguistic Studies”.

Sharon started by defining bias as “who gets to talk and who is listened to”. For example, do stay-at-home mums have a voice? Sharon studies linguistics – the systematic study of language and communication – from a functional and descriptive rather than prescriptive perspective, e.g. linguistic sources of persuasion. She reminded us how the social world is studied based on how it is constructed. Some highlights:

  • Linguists frequently work in an interdisciplinary, multidisciplinary way – working with other disciplines highlights the issue of bias in ways of thinking.
  • How you talk about something affects how you view it, e.g. pro-life versus anti-abortion.
  • Constructivist versus positivist perspectives, the social world versus the real or natural world – what is the truth and how is meaning perceived?
  • Bias is part of the world that we live in. We cannot remove it but we need to be aware of it and try to mitigate it.
  • Media literacy is most important.
  • One of the biggest red flags is the use of Large Language Models (LLMs) and how they are being framed. AI does not know things; it just repeats them.
  • People “pull down” on large chunks of language and an LLM will just predict what the next word is.
  • Language is an issue as LLMs do not learn.

Sharon also elaborated on her interesting work on a corpus-assisted study of political and media discourses around the EU in the lead-up to Brexit. She found that the pro-EU stance of the Guardian was systematically undermined by three key themes:

  • Discourses of Conflict between UK / EU and EU / Member states
  • Discourses of Disparity of citizens’ experience (EU not working)
  • Discourses of Threat to the UK and an existential risk to the EU.

We all have linguistic biases – ways of conceiving and talking about things that are grounded in our world view. Sharon does not believe it is possible to entirely eliminate bias from our work, but awareness and transparency help mitigate the issue. She stressed in her conclusion that it is important to understand the impact of those biases as the use of LLMs and AI tools becomes more prevalent.

We had excellent feedback from Sharon’s talk; one delegate said it was “the best one hour briefing they had heard in a very long time”.