Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think. This is crucial for developing and improving our services and vital for delivering the service our researchers want.
We are one of the organisations taking part in the JISC RDM Shared Services pilot and you can read their take on the work being done here. With JISC’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.
Researchers were asked about the types of data which were generated from their research. The results were quite interesting to us. Perhaps unsurprisingly, far and away the most popular “type” of data was “document or report”, followed with a bit of a gap by spreadsheets. Structured text files (e.g. XML, JSON) came a lot lower down the list, as did databases.
What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs. Obviously comparisons are problematic, not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation. We also know that we only get a small proportion of the research data which are being created within the university, and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we already hold, we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.
Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community? I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community. This is something we need to address as part of improving our advocacy and engagement strategies.
Another question was about sharing data, and this got answers which did surprise us. The majority said that they did already share data and very few said they were not willing to share. For the ones who did not share data it was mostly because it was sensitive or confidential data or they did not have permission to share it. Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”. An encouraging third of those questioned said they had re-used someone else’s data.
Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process. We also know that people are likely to give the answers they want us to hear! But if people are serious about being willing and able to share we really want to support them in this.
So we’ve decided to try and get talking to our researchers, and for them to talk to each other, by setting up a series of Data Conversations: events where researchers can discuss the creation and dissemination of data, to try and encourage a climate of sharing and valuing the data. It means we can hope for data that are well curated from the start of their life and that will be selected for deposit appropriately and with good metadata.
Better communication and advocacy will help us in the long run to preserve high quality, relevant data which can be shared and reused. Managing (research) data and long term preservation of digital data are collaborative activities, and the more we understand and share, the better we will be at achieving these goals.
Rachel MacGregor, Digital Archivist