RTC03: Missing pieces

This is one of a multi-part series. For other instalments, see Romancing the code: Ashley’s Angels and internet demons.

Whilst the amount of data in the leak suggests that it was accessed and collected over an extended period of time, it was also immediately clear that the torrents comprised, at best, only around a quarter of all the available data. There have been claims that 300GB was taken in total, whilst the files in the leak total only around 80GB (uncompressed). Even if it is true that someone has a further 220GB, this is unlikely to be the entire amount that was available.

We could take a lot of guesses at the contents of the other 220GB. For instance, there may be more email archives from other members of the AM team. This isn’t guaranteed, however, since it looks as though Biderman ran his email through his Gmail account and others may have used different, less accessible systems. There may be the usual clutch of administrative documents (reports, summaries, press releases, and so forth) that one would expect of any large organisation. And given that AM was founded in 2001, but that much of the data doesn’t stretch back this far, there may be older files that were deemed less interesting.

One thing is clear, however: there are only five tables in the leak, when a database schema file (…ashleymadison\builds.git\sql\schema\main-schema.sql) seems to indicate the existence of at least 600 tables. Though we can speculate about the contents of those tables based on their column names and how they relate to the source code, it’s impossible to be absolutely certain about their contents without actually seeing them. We can, instead, pick out some broad themes, which include:

  • support logs and error tickets
  • chat settings, monitors, and logs
    • (it is not clear whether this includes the messages themselves, though intuitively I would have thought that at least one of these tables would)
  • photos, images, and videos
  • phone numbers, accounts, logs, and sessions
    • (it is not clear whether this includes recordings)
  • revenue brought in by affiliates, and payments made to them
  • Angel accounts, chatbots, mailbots, hosts, and iconians
    • (see upcoming instalments on this)
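For anyone wanting to reproduce this kind of table count, the comparison between a schema file and the tables actually present can be sketched roughly as below. This is only an illustration: the helper names (`list_tables`, `missing_tables`) and the toy schema are my own inventions, `example_support_ticket` is a made-up table name, and the pattern assumes simple `CREATE TABLE` statements with unquoted or backtick-quoted names, which may not match the real schema file.

```python
import re

# Matches the start of a CREATE TABLE statement and captures the table name.
# Assumption: names are unquoted or backtick-quoted, with an optional
# IF NOT EXISTS clause. Other SQL dialects may need a different pattern.
CREATE_TABLE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?",
    re.IGNORECASE | re.MULTILINE,
)


def list_tables(schema_sql):
    """Return the table names declared in a schema dump, in file order."""
    return CREATE_TABLE.findall(schema_sql)


def missing_tables(schema_sql, leaked):
    """Return tables declared in the schema but absent from the leaked set."""
    leaked_set = set(leaked)
    return [name for name in list_tables(schema_sql) if name not in leaked_set]


# Toy schema for illustration only; NOT the contents of the real schema file.
sample_schema = """
CREATE TABLE `am_am_member` (id INT);
CREATE TABLE IF NOT EXISTS `aminno_member` (id INT);
create table example_support_ticket (id INT);
"""

declared = list_tables(sample_schema)
absent = missing_tables(sample_schema, ["am_am_member", "aminno_member"])
print(len(declared), "declared;", len(absent), "not in the leak:", absent)
```

Run against the real schema file, the same approach would give the declared table count to set against the five tables that were actually released.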

Of the five tables included in the leak, the smallest is ~4GB and the largest is ~15GB. In other words, it’s more than possible that some of the outstanding ~595 tables account for the remaining ~220GB. In fact, if the remaining tables are similarly sized, then the Impact Team may only have somewhere between 15 and 55 of those tables (220GB divided by ~15GB and by ~4GB respectively), leaving anywhere up to 580 unaccounted for. This is unsurprising, though, for a couple of reasons. Firstly, some of the tables will be relatively uninteresting to almost anyone. Secondly, AM appears to have been knee-deep in refactoring at the time that the leak occurred. This could have been an effort to scale their systems as more users began to register, but whatever the case, the net result is enormous amounts of duplicated data. Three of the five tables are a perfect example of this (am_am_member, aminno_member, member_details), with multiple replicated or closely replicated columns of data. In short, just because there are 600 possible tables, that doesn’t mean that they are 600 sources of unique or interesting information. As a corollary to this, the Impact Team seems to have had an eye for data that is maximally interesting and, well, impactful, so if they really did scoop up a further 220GB, I’m pretty sure there will be stuff in there to launch another ten thousand blog posts and news articles.
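The back-of-envelope arithmetic behind that table estimate can be laid out as a short sketch. It uses only the round figures quoted above, and assumes (purely for the sake of the estimate) that every unreleased table falls within the same ~4GB to ~15GB size range as the five released ones. The function and variable names are mine.

```python
import math

SCHEMA_TABLES = 600   # "at least 600" according to the schema file
LEAKED_TABLES = 5     # tables actually present in the leak
UNRELEASED_GB = 220   # claimed 300GB taken, minus ~80GB released
SMALLEST_GB = 4       # approximate size of the smallest leaked table
LARGEST_GB = 15       # approximate size of the largest leaked table


def table_count_range(total_gb, smallest_gb, largest_gb):
    """Estimate how many tables could fit in total_gb.

    Fewest tables: every table is as large as the largest one seen.
    Most tables: every table is as small as the smallest one seen.
    """
    fewest = math.ceil(total_gb / largest_gb)
    most = math.floor(total_gb / smallest_gb)
    return fewest, most


outstanding = SCHEMA_TABLES - LEAKED_TABLES
fewest, most = table_count_range(UNRELEASED_GB, SMALLEST_GB, LARGEST_GB)

print(f"Outstanding tables: ~{outstanding}")
print(f"Tables that 220GB might hold: {fewest} to {most}")
print(f"Tables still unaccounted for: {outstanding - most} to {outstanding - fewest}")
```

On these assumptions the range comes out at roughly 15 to 55 tables held, which leaves somewhere between 540 and 580 unaccounted for. The real figure could differ a good deal, since nothing guarantees that the unreleased tables resemble the released ones in size, and not all of the 220GB need be database tables at all.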

Read the next instalment: RTC04: Putting out fires.
