In 2015, a substantial amount of sensitive data was leaked from the Ashley Madison (hereafter, AM) website. Over a series of posts, I investigate the affair, sometimes checking what other media outlets have claimed, and at other times providing my own results and analysis. The series looks (at the present time) like this, with instalments going out every Monday:
- RTC01: Life is short. Hack an affair
  - Background context.
- RTC02: Time’s up
  - The Impact Team’s first contact, and the contents of the ~80GB of data in the leaks.
- RTC03: Missing pieces
  - Summary of the remaining ~220GB of data that was not leaked.
- RTC04: Putting out fires
  - AM’s responses.
- RTC05: Internet demons
  - AM and Biderman’s business practices.
- RTC06: The internet’s long memory
  - The paid delete service.
- RTC07: Extortion, hate-crimes, and fraud
  - The compromised bcrypt password hashes and spin-off crimes.
- RTC08: Ashley’s Angels
  - The non-human accounts in the tables: Finding and analysing Ashley’s Angels and engager accounts.
- RTC09: Ghosts in the code
  - The non-human accounts in the source-code: What the code tells us about the Angels, iconians, engagers, hosts, and XMPP bots.
- RTC10: Enshrining Angels
  - How the Angels were captured in the site’s T&Cs, and discussed by AM.
- RTC11: Patterns of life
  - Analysis of some of the more committed (anonymised) users.
- RTC12: Zig-a-zig-ah
  - How various large groups compare and contrast in their preferences and self-presentation.
- RTC13: Thirty seven million
  - If we exclude Angels and adjust for users having multiple accounts, how many people were using AM really?
- RTC14: Case closed?
  - Legal cases to date.
- RTC15: Conclusion
  - A summary of where things are.
Depending on how each instalment goes there may be more, and I will come back to update this page with links as I work my way through.
One thing that I think is worth stressing from the start: not everyone caught up in the affair (if you will pardon the pun) was on questionable ground. There are many discussions to be had about both AM’s and the Impact Team’s ethical and moral conduct, and it’s certainly the case that the leak does contain details about many individuals who were actively pursuing affairs.
However, many other users do not fit the site’s adulterous character. Since there was no requirement to confirm one’s email address when creating an account, some found that their email addresses had been used without their knowledge or permission, whether as a prank or just as a matter of coincidence. Some single people, and particularly those looking for same-sex relationships, had joined AM as an ordinary dating and hook-up site. Some had partners who were quite happy with a non-monogamous relationship. Some had joined to check on spouses that they feared were cheating on them. Some were on the site for research and business purposes. (Heck, had I known it existed, I would probably have been on there in pursuit of some good data myself!) And some had joined, but then quickly changed their minds, and never used the site again. In other words, the matter is not quite so straightforwardly one of angels and demons as it might, on the surface, appear.
On a final note for this section, it might not be immediately clear why a linguist, or even a forensic corpus linguist, would be interested in this data. The fact that I specialise in deception might explain it better. I’m interested not only in how the real users seeking to have affairs behaved, but also in how AM’s Angel (non-human, software-animated) accounts masqueraded as humans, how successful they were, and how they were justified or explained by AM themselves. Before I can get to that bit, though, it’s well worth going through all the context that leads up to it.
Tech and software
I’m not sure anyone would care, but for the sake of rigour/replicability, and just in case an innocent reader has envisioned me using some trilithium-powered supercomputer, here are the (probably quite disappointing) specs that all my results have been based on. This was not done quickly, nor prettily, nor on any kind of dream machine. I will not now, nor for the foreseeable future, be winning any breaking-news-data speed-analysis records. This was all done with a 32-bit Win7 PC, an Intel 3.1GHz processor, and 4GB of RAM.
For the tables, I used a perfectly vanilla MySQL command-line client and MySQL Workbench. On this computer it took just over two weeks to load the tables in. (I am not kidding.) For the source-code, I used Git, GitShell, TortoiseGit, Notepad++, less, and grep. For the emails I used Thunderbird. And for a range of other tasks, I used a winning combination of Office, FireAnt, and even, occasionally, Paint.
And that probably tells you everything you ever need to know about me.
I owe a considerable debt of gratitude to Andrew Hardie, who has been extraordinarily patient and helpful over the past three months. It goes without saying – but I’ll say it anyway – that any and all errors accumulated whilst I have worked on this dataset are absolutely mine.