Monthly Archives: January 2022

Conversation – who writes the data?

The main difference between survey and administrative data is the input process.

Design

Survey data is input either by the subject or on their behalf by an interviewer or proxy. Survey questions are carefully designed, with a lot of research behind each one, to maximise response and to accommodate all possible answers, including ‘don’t know’ and ‘prefer not to say’.

Administrative data is input by an administrator whose job is to complete the form in order to provide a service to the subject. The form is designed around what the system needs to fulfil its purpose and, in line with GDPR's data-minimisation principle, collects only what is needed. Because the data must then pass through a process for the system to work, the form is designed with barriers and restrictions that direct data input, often with no flexibility when the real value does not fit within these boundaries. In a computer-based system, a required field left empty should trigger a warning, as should a value that is not in the required format.

But what happens when the real data does not fit within these boundaries, or simply does not make sense? It is down to the administrator to decide. For example, a date is required but the event happened over a couple of days or weeks, so the administrator enters the earliest date. They could equally have entered the latest date, the first day of the week, the last day of the month, or any other date.
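To make this concrete, here is a minimal Python sketch of the two situations described above. It assumes a hypothetical form with a single required field, `event_date`, that must be an ISO `YYYY-MM-DD` date; `validate_form` only raises warnings, the way a form's validation rules would. `single_date_choices` (also hypothetical) lists a few of the equally "valid" single dates an administrator might enter for an event spanning several days. Every one of them passes validation, which is exactly why the choice is invisible to a researcher later on.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta


def validate_form(record: dict) -> list[str]:
    """Return warnings for a hypothetical form with a required ISO date field."""
    warnings = []
    value = record.get("event_date", "")  # hypothetical field name
    if not value:
        warnings.append("event_date is required")
    else:
        try:
            date.fromisoformat(value)  # rejects non-ISO strings such as "03/01/2022"
        except ValueError:
            warnings.append("event_date must be in YYYY-MM-DD format")
    return warnings


def single_date_choices(start: date, end: date) -> dict[str, date]:
    """Plausible single dates an administrator might enter for an event spanning start..end."""
    last_day = calendar.monthrange(start.year, start.month)[1]
    return {
        "earliest": start,
        "latest": end,
        "start_of_week": start - timedelta(days=start.weekday()),  # Monday of the start week
        "end_of_month": date(start.year, start.month, last_day),   # month in which the event started
    }


if __name__ == "__main__":
    print(validate_form({"event_date": "03/01/2022"}))
    # ['event_date must be in YYYY-MM-DD format']
    for label, d in single_date_choices(date(2022, 1, 12), date(2022, 1, 14)).items():
        print(label, d)
    # earliest 2022-01-12
    # latest 2022-01-14
    # start_of_week 2022-01-10
    # end_of_month 2022-01-31
```

All four candidate dates would be accepted by the form, so nothing in the stored value records which convention the administrator used.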

Source(s)

Surveys are a one-to-one event between survey and respondent. One respondent completes one survey.

Administrative data can draw on multiple sources. Multiple individuals and organisations are often involved in an administrative process, and sometimes the data is secondary, carried over from a previous administrative process. For example, in the courts, administrators will input data from the applicant, the defendant, the local authority, the police, the CPS and so on. Some sections require direct input from a single source, while others could be completed by multiple sources. To illustrate, take a defendant’s name and address. Ideally this would come directly from the defendant, but it could come from the police, or any other organisation that collected the data for its own processes (this is what makes data linkage possible). It is this complex network of data collection that gives rise to both the pros and the cons of working with administrative data.

Researchers need to keep in mind both who provided the data and who input it. The person entering the data may be dependent on another source for it while also working to a strict deadline.
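One way to keep both questions in view is to carry provenance alongside each value. The sketch below is purely illustrative: the `FieldEntry` schema, its `provided_by` and `entered_by` fields, and the example sources are assumptions for this post, not the structure of any real courts system. It simply counts where a given field's values came from.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One value on an administrative form, with its provenance (hypothetical schema)."""
    field: str
    value: str
    provided_by: str  # who supplied the information, e.g. "defendant", "police"
    entered_by: str   # who typed it into the system, e.g. an administrator ID


def provenance_summary(entries: list[FieldEntry], field: str) -> Counter:
    """Count how often a given field was supplied by each source."""
    return Counter(e.provided_by for e in entries if e.field == field)


if __name__ == "__main__":
    entries = [
        FieldEntry("address", "1 High St", provided_by="police", entered_by="admin_07"),
        FieldEntry("address", "1 High St", provided_by="defendant", entered_by="admin_03"),
        FieldEntry("address", "2 Low Rd", provided_by="police", entered_by="admin_07"),
        FieldEntry("name", "J Smith", provided_by="defendant", entered_by="admin_03"),
    ]
    print(provenance_summary(entries, "address"))
    # Counter({'police': 2, 'defendant': 1})
```

If the source data does record who supplied a value, a summary like this shows how much of a field came second-hand before any analysis relies on it.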

Signatory

Respondents to a survey will often provide a signature to declare that it was indeed them that completed the survey.

Signatures in an administrative process are often a declaration that the data is correct and accurate, with the signatory accepting responsibility should it be proven otherwise.

Researchers using administrative data need to be mindful of this declaration. In data processing it is often tempting to correct apparent errors, such as “correcting” a date from 3018 to 2018, or adjusting an age when the age calculated from the date of birth does not match the declared age. Do not! Changing the value breaks the link with what was declared to be correct. I would seriously recommend that researchers restrict themselves to filtering data into valid and invalid records and applying inclusion/exclusion criteria.
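A minimal sketch of "flag, don't fix" in Python. The `Record` fields, the flag names, the valid date window, and the assumption that the declared age refers to the hearing date are all hypothetical. The point is that the stored values are never modified: each record is tagged with the reasons it fails, and the inclusion/exclusion split is made on those flags.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: records cannot be "corrected" in place
class Record:
    case_id: str
    hearing_date: date
    date_of_birth: date
    declared_age: int  # assumed to be the age declared at the hearing date


def age_on(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def validity_flags(r: Record, earliest: date, latest: date) -> list[str]:
    """List the reasons a record fails validation; an empty list means valid."""
    flags = []
    if not (earliest <= r.hearing_date <= latest):
        flags.append("hearing_date_out_of_range")
    if age_on(r.date_of_birth, r.hearing_date) != r.declared_age:
        flags.append("age_mismatch")
    return flags


def split_by_criteria(records: list[Record], earliest: date, latest: date):
    """Apply inclusion/exclusion criteria without altering any value."""
    included, excluded = [], []
    for r in records:
        flags = validity_flags(r, earliest, latest)
        (excluded if flags else included).append((r, flags))
    return included, excluded


if __name__ == "__main__":
    records = [
        Record("A1", date(2018, 5, 1), date(1990, 1, 15), 28),
        Record("A2", date(3018, 5, 1), date(1990, 1, 15), 28),  # left as 3018, not "fixed"
        Record("A3", date(2018, 5, 1), date(1990, 6, 1), 28),
    ]
    included, excluded = split_by_criteria(records, date(2000, 1, 1), date(2022, 1, 31))
    for r, flags in excluded:
        print(r.case_id, r.hearing_date, flags)
    # A2 3018-05-01 ['hearing_date_out_of_range', 'age_mismatch']
    # A3 2018-05-01 ['age_mismatch']
```

Keeping the flags alongside the untouched records also makes the exclusions reportable, so readers can see how many cases were dropped and why.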