From Planning to Deployment: Digital Preservation and Organizational Change June 2017

We were very excited to be visiting the lovely city of York for the Digital Preservation’s event “From Planning to Deployment: Digital Preservation and Organizational Change”.  The day promised a mixture of case studies from organisations who have or are in the process of implementing a digital preservation programme and also a chance for Jisc to showcase some of the work they have been sponsoring as part of the Research Data Shared Services project (which we are a pilot institution for).  It was a varied programme and the audience was very mixed – one of the big benefits of attending events like these is the opportunity to speak to colleagues from other institutions in related but different roles.  I spoke to some Records Managers and was interested in their perspective as active managers of current data.  I’m a big believer in promoting digital preservation through involvement at all stages of the data lifecycle (or records continuum if you prefer) so it is important that as many people as possible – whatever their role in the creation or management of data – are encouraged into good data management practices.  This might be by encouraging scientists to adopt the FAIR principles or by Records Managers advising on file formats, file naming and structures and so on.

William Kilbride, Digital Preservation Coalition introduces the event (CC-BY Rachel MacGregor)

The first half of the day was a series of case studies presented by various institutions, large and small, who had a whole range of experiences to share. It was introduced by a presentation from the Polonsky Digital Preservation Project based at Oxford and Cambridge Universities.  Lee Pretlove and Sarah Mason jointly led the conversation talking us through the challenges of developing and delivering a digital preservation project which has to continue beyond the life of the project.  Both Universities represented in this project are very large organisations but this can make the issues faced by the team extremely complex and challenging.  They have been recording their experiences of trying to embed practices from the project so that digital preservation can become part of a sustainable programme.

The first case study came from Jen Mitcham from York University talking about the digital preservation work they have undertaken their.  Jen has documented her activities very helpfully and consistently on her blog and she talked specifically about the amount of planning which needs to go into work and then the very real difficulties in implementation.  She has most recently been looking at digital preservation for research data – something we are working on here at Lancaster University.

Next up was Louisa Matthews from the Archaeological Data Service who have been spearheading approaches to Digital Preservation for a very long time.  The act of excavating a site is by its nature destructive so it is vital to be able to capture a data about it accurately and be able to return to and reuse the data for the foreseeable future.  This captures digital preservation in a nutshell!  Louisa described how engaging with their contributors ensures high quality re-usable data – something we are all aiming for.

The final case study for the morning was Rebecca Short from the University of Westminster talking about digital preservation and records management.  The university have already had success implementing a digital preservation workflow and are now seeking to embed it further in the whole records creation and management process.  Rebecca described the very complex information environment at her university – relatively small in comparison to the earlier presentations but no less challenging for all that

The afternoon was a useful opportunity to hear from Jisc about their Research Data Shared Services project which we are a pilot for.  We heard presentations from Arkivum, Preservica and Artefactual Systems who are all vendors taking part in the project and gave interesting and useful perspectives on their approaches to digital preservation issues.  The overwhelming message however has to be – you can’t buy a product which will do digital preservation.  Different products and services can help you with it, but as William Kilbride, Executive Director of the Digital Preservation Coalition has so neatly put it “digital preservation is a human project” and we should be focussing on getting people to engage with the issues and for all of us to be doing digital preservation.

Rachel MacGregor

Jisc Research Data Shared Services March 2017

Here at Lancaster University we are very excited to be part of a group of pilot institutions taking part in Jisc’s Research data shared services project.  This aims to provide a flexible range of services which suit the varied needs of institutions in the HE sector help achieve policy compliance for deposit, publication, discovery, storage and long term preservation of research data. It’s an ambitious project but one that there is an undoubted need for and we are trying to work with Jisc to help them achieve this goal.

Last week we were invited down to Jisc London HQ to learn about the progress of the project and – just as importantly – share our own thoughts and experiences on the process.

Waterloo Sunset (author’s own, CC-BY)

Daniela Duca has written a comprehensive overview of the meeting and the way forward for Jisc from the meeting.

Our table represented a microcosm of the project: Cambridge University (large institution), ourselves at Lancaster (medium) and the Royal College of Music (small).  We all have extremely different needs and resources and how one institution tackles a problem will not work at another.  However we have a common purpose in supporting our academics and students in their research, ensuring compliance with funders and enabling our institutions to support first class research outputs to share with the wider world.

We had been asked to do some preparatory work around costing models for the meeting – I think it would be fair to say we all found this challenging – probably because it is!  My previous knowledge of costings comes from having looked at the excellent Curation Costs Exchange which is an excellent staring point for anyone considering approaching the very difficult task of costing curation services.

My main interest in the day lay in the preservation aspects of the project especially in exploring wider use cases.  It’s clear that many institutions have a number of digital preservation scenarios for which the Shared Service solution might also be applicable.  What is also clear is that there are so many possible use cases that it would be very easy to accidentally create a whole new project without even trying!  I think it’s fair to say that all of us in the room – whether we are actively involved in digital preservation or not – are very interested in this part of the project.  There is no sense in Jisc replicating work which has already been done elsewhere or is being developed by other parties so it presents an ideal opportunity for collaborative working and building on the strengths of the existing digital preservation community.

Overall there was much food for thought and I look forward to the next development in the shared services project.

Researchers: what do they really think?

Image: Flickr https://flic.kr/p/8WpM2U – Rul Fernandes CC BY 2.0

Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think.  This is crucial for developing and improving our services and vital for

We are one of the organisations taking part in the JISC RDM Shared Services pilot and you can read their take on the work being done here.  With JISC’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.

Researchers were asked about the types of data which were generated from their research.  The results were quite interesting to us.  Unsurprisingly perhaps far and away the most popular “type” of data were “document or report” followed with a bit of a gap by spreadsheets.  Structured text files (eg xml, json etc) came a lot lower down the list as did databases.

Lancaster Researchers’ responses to JISC DAF Survey

What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs.  Obviously comparisons are problematic not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation.  We also know that we only get a small proportion of the research data which are being created within the university and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we have already we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.

Analysis of file formats undertaken at Lancaster University

Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community?  I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community.  This is something we need to address as part of improving our advocacy and engagement strategies.

Another question which was asked was was about sharing data and this got answers which did surprise us.  The majority said that they did already share data and very few said they were not willing to share.  For the ones who did not share data it was mostly because it was sensitive or confidential data or they did not have permission to share it.  Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”.  An encouraging third of those questioned said they had re-used someone else’s data.

Results of JISC DAF survey for Lancaster University

Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process.  We also know that people are likely to give the answers they want us to hear!  But if people are serious about being willing and able to share we really want to support them in this.

So we’ve decided to try and get talking to our researchers – and for them to talk to each other – by setting up a series of Data Conversations – events where researchers can discuss creation and dissemination of data to try and encourage a climate of sharing and valuing the data.  It means we can hope for data that is well curated from the start of its life and that will be selected for deposit appropriately and with good metadata.

Better communication and advocacy will help us in the long run to preserve and share high quality relevant data which can be shared and reused.  Managing (research) data and long term preservation of digital data are collaborative activities and the more we understand and share the better we will be at achieving these goals.

Rachel MacGregor, Digital Archivist

RDMF16 – Creating a Research Data Community

 

 Creating a Research Data Community

Are research institutions engaging their researchers with Research Data Management (RDM)? And if so, how are they doing it? In this post Hardy Schwamm (@hardyschwamm),  Research Data Manager, Lancaster University, and Rosie Higman (@RosieHLib), Research Data Advisor, University of Cambridge, and explore the work they are doing in their respective institutions.

Whilst funder policies were the initial catalyst for many RDM services at UK universities there are many reasons to engage with RDM, from increased impact to moving towards Open Research as the new normal. And a growing number of researchers are keen to get involved! These reasons also highlight the need for a democratic, researcher-led approach if the behavioural change necessary for RDM is to be achieved. Following initial discussions online and at the Research Data Network event in Cambridge on 6 September, we wanted to find out whether and how others are engaging researchers beyond iterating funder policies.

At both Cambridge and Lancaster we are starting initiatives focused on this, respectively Data Champions and Data Conversations. The Data Champions at Cambridge will act as local experts in RDM, advocating at a departmental level and helping the RDM team to communicate across a fragmented institution. We also hope they will form a community of practice, sharing their expertise in areas such as big data and software preservation. The Lancaster University Data Conversations will provide a forum to researchers from all disciplines to share their data experiences and knowledge. The first event will be on 30 January 2017.

Having presented our respective plans to the RDM Forum (RDMF16) in Edinburgh on 22nd November we ran breakout sessions where small groups discussed the approaches our and other universities were taking, the results summarised below highlighting different forms that engagement with researchers will take.

 

Discussing RDM Community
RDMF16 Working Group discussing RDM Communities

Targeting our training

RDM workshops seem to be the most common way research data teams are engaging with researchers, typically targeting postgraduate research students and postdoctoral researchers. A recurrent theme was the need to target workshops for specific disciplinary groups, including several workshops run jointly between institutions where this meant it was possible to get sufficient participants for smaller disciplines. Alongside targeting disciplines some have found inviting academics who have experience of sharing their data to speak at workshops greatly increases engagement.

As well as focusing workshops so they are directly applicable to particular disciplines, several institutions have had success in linking their workshop to a particular tangible output, recognising that researchers are busy and are not interested in a general introduction. Examples of this include workshops around Data Management Plans, and embedding RDM into teaching students how to use databases.

An issue many institutions are having is getting the timing right for their workshops: too early and research students won’t have any data to manage or even be thinking about it; too late and students may have got into bad data management habits. Finding the goldilocks time which is ‘just right’ can be tricky. Two solutions to this problem were proposed: having short online training available before a more in-depth training later on, and having a 1 hour session as part of an induction followed by a 2 hour session 9-18 months into the PhD.

Tailored support

Alongside workshops, the most popular way to get researchers interested in RDM was through individual appointments, so that the conversation can be tailored to their needs, although this obviously presents a problem of scalability when most institutions only have one individual staff member dedicated to RDM.

IMG_20161122_121401There are two solutions to this problem which were mentioned during the breakout session. Firstly, some people are using a ‘train the trainer’ approach to involve other research support staff who are based in departments and already have regular contact with researchers. These people can act as intermediaries and are likely to have a good awareness of the discipline-specific issues which the researchers they support will be interested in.

The other option discussed was holding drop-in sessions within departments, where researchers know the RDM team will be on a regular basis. These have had mixed success at many institutions but seem to work better when paired with a more established service such as the Open Access or Impact team.

What RDM services should we offer?

We started the discussion at the RDM Forum thinking about extending our services beyond sheer compliance in order to create an “RDM community” where data management is part of good research practice and contributes to the Open Research agenda. This is the thinking behind the new initiatives at Cambridge and Lancaster.

However, there were also some critical or sceptical voices at our RDMF16 discussions. How can we promote an RDM community when we struggle to persuade researchers being compliant with institutional and funder policies? All RDM support teams are small and have many other tasks aside from advocacy and training. Some expressed concern that they lack the skills to market our services beyond the traditional methods used by libraries. We need to address and consider these concerns about capacity and skill sets as we attempt to engage researchers beyond compliance.

RDMF16 at work
RDMF16 at work

Summary

It is clear from our discussions that there is a wide variety of RDM-related activities at UK universities which stretch beyond enforcing compliance, but engaging large numbers of researchers is an ongoing concern. We also realised that many RDM professionals are not very good at practising what we preach and sharing our materials, so it’s worth highlighting that training materials can be shared on the RDM training community on Zenodo as long as they have an open license.

Many thanks to the participants at our breakout session at the RDMForum 16, and Angus Whyte for taking notes which allowed us to write this piece. You can follow previous discussions on this topic on Gitter.

Published on 30 November
Written by Rosie Higman and Hardy Schwamm
Creative Commons License