Jisc Research Data Shared Services March 2017

Here at Lancaster University we are very excited to be one of the pilot institutions taking part in Jisc’s Research Data Shared Services project. This aims to provide a flexible range of services which suit the varied needs of institutions in the HE sector and help them achieve policy compliance for deposit, publication, discovery, storage and long-term preservation of research data. It’s an ambitious project, but one for which there is an undoubted need, and we are working with Jisc to help them achieve this goal.

Last week we were invited down to Jisc London HQ to learn about the progress of the project and – just as importantly – share our own thoughts and experiences on the process.

Waterloo Sunset (author’s own, CC-BY)

Daniela Duca has written a comprehensive overview of the meeting and of the way forward for Jisc.

Our table represented a microcosm of the project: Cambridge University (large institution), ourselves at Lancaster (medium) and the Royal College of Music (small). We all have extremely different needs and resources, and the way one institution tackles a problem will not work at another. However, we have a common purpose in supporting our academics and students in their research, ensuring compliance with funders and enabling our institutions to support first-class research outputs to share with the wider world.

We had been asked to do some preparatory work around costing models for the meeting – I think it would be fair to say we all found this challenging – probably because it is! My previous knowledge of costings comes from having looked at the Curation Costs Exchange, which is an excellent starting point for anyone approaching the very difficult task of costing curation services.

My main interest in the day lay in the preservation aspects of the project especially in exploring wider use cases.  It’s clear that many institutions have a number of digital preservation scenarios for which the Shared Service solution might also be applicable.  What is also clear is that there are so many possible use cases that it would be very easy to accidentally create a whole new project without even trying!  I think it’s fair to say that all of us in the room – whether we are actively involved in digital preservation or not – are very interested in this part of the project.  There is no sense in Jisc replicating work which has already been done elsewhere or is being developed by other parties so it presents an ideal opportunity for collaborative working and building on the strengths of the existing digital preservation community.

Overall there was much food for thought and I look forward to the next development in the shared services project.

Researchers: what do they really think?

Image: Flickr https://flic.kr/p/8WpM2U – Rul Fernandes CC BY 2.0

Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think.  This is crucial for developing and improving our services and vital for

We are one of the organisations taking part in the Jisc RDM Shared Services pilot and you can read their take on the work being done here. With Jisc’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.

Researchers were asked about the types of data which were generated from their research. The results were quite interesting to us. Perhaps unsurprisingly, far and away the most popular “type” of data was “document or report”, followed after a bit of a gap by spreadsheets. Structured text files (e.g. XML, JSON) came a lot lower down the list, as did databases.

Lancaster Researchers’ responses to JISC DAF Survey

What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs. Obviously comparisons are problematic, not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation. We also know that we only get a small proportion of the research data which are being created within the university, and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we already hold, we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.

Analysis of file formats undertaken at Lancaster University
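For anyone wanting to try a similar comparison on their own deposits, a tally of this kind can be roughly approximated in a few lines of Python. This is a minimal sketch only and not the method used at Lancaster: it groups files by extension, which is a much cruder signal than proper file format identification, and the category names and the extension mapping are assumptions made up for the example.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical mapping from file extension to a broad category.
# A real analysis would need a much fuller list (or a format
# identification tool rather than extensions).
CATEGORIES = {
    ".txt": "text file",
    ".csv": "text file",
    ".xml": "text file",
    ".json": "text file",
    ".xls": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".doc": "word document",
    ".docx": "word document",
}


def tally_formats(filenames):
    """Count files per broad category, judged by extension only."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        counts[CATEGORIES.get(ext, "other")] += 1
    return counts


def as_percentages(counts):
    """Convert raw counts into percentages of the total, to one decimal place."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


if __name__ == "__main__":
    # Made-up file names, purely for illustration.
    deposited = ["readme.txt", "results.CSV", "survey.xlsx", "notes.docx", "scan.bin"]
    print(as_percentages(tally_formats(deposited)))
```

Extensions can be missing or misleading, so a count like this is only a first look; it is the comparison with what researchers say they create that makes it interesting.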

Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community?  I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community.  This is something we need to address as part of improving our advocacy and engagement strategies.

Another question which was asked was about sharing data, and this got answers which did surprise us. The majority said that they did already share data and very few said they were not willing to share. For the ones who did not share data it was mostly because the data were sensitive or confidential or they did not have permission to share them. Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”. An encouraging third of those questioned said they had re-used someone else’s data.

Results of JISC DAF survey for Lancaster University

Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process.  We also know that people are likely to give the answers they want us to hear!  But if people are serious about being willing and able to share we really want to support them in this.

So we’ve decided to try and get talking to our researchers – and for them to talk to each other – by setting up a series of Data Conversations – events where researchers can discuss creation and dissemination of data to try and encourage a climate of sharing and valuing the data.  It means we can hope for data that is well curated from the start of its life and that will be selected for deposit appropriately and with good metadata.

Better communication and advocacy will help us in the long run to preserve high-quality, relevant data which can be shared and reused. Managing (research) data and long-term preservation of digital data are collaborative activities, and the more we understand and share the better we will be at achieving these goals.

Rachel MacGregor, Digital Archivist

iPres 2016 – International Digital Preservation Conference Bern, Switzerland

I was extremely lucky to attend iPres 2016, the International Digital Preservation conference, held this year in the beautiful Swiss capital city, Bern.
Bern and a view of the Eiger, the Monch and the Jungfrau

The conference attracts some of the leading practitioners in the field so it’s a real privilege to be able to hear from and speak to people who are leading in research and development – creating tools, developing workflows and undertaking research into all aspects of digital management and preservation.

It will take a while to digest everything – there was so much to learn! – but I thought I would gather together some “highlights” of the session while still fresh in my mind.

The conference opened with a keynote from Bob Kahn, who reflected on the need for interoperability and unique identifiers for digital objects. The world we live in is a networked one, and as we conceive of information and objects as linked to one another over networks we must find unique and unambiguous ways of describing them, all the more so when objects can exist anywhere and in several places at once.

To complement this I attended a workshop on persistent identifiers which gave an extremely helpful introduction to the world of URNs, URLs, PURLs, Handles, DOIs and the rest. Sometimes it can seem a little like acronym spaghetti but the presenters Jonathan Clark, Maurizio Lunghi, Remco van Veenendaal, Marcel Ras and Juha Hakala did their best to untangle it for us. Remco van Veenendaal introduced a great online tool from the National Archives of the Netherlands which aims to guide practitioners towards an informed choice about which identifier scheme to use. You can have a go at it here and the Netherlands Coalition for Digital Preservation are keen for feedback.

What is particularly useful about it is that it explains in some detail, at each stage, which PID system might be particularly good in specific circumstances, allowing for a nuanced approach to collections management.

Current persistent identifier systems do not cope well with complex digital objects, and likely future developments will be around tackling these shortcomings. Sadly, the systems in wide use today have already developed along separate lines to the extent that they cannot be fully aligned, which is not the interoperable future we are all hoping for.

The second keynote came from Sabine Himmelsbach of the House of Electronic Art in Basel and was a lively and engaging account of a range of digital artworks and how digital preservation and curation have to work closely with artists to (re)create artworks. It threw up many philosophical questions about authenticity and integrity, not to mention the technical challenges of emulation and preservation of legacy formats. This was a theme returned to again and again in various sessions throughout the conference, as was the constant refrain that the main challenges are not necessarily technological.

iPres2016 Conference in full swing

The conference had so many highlights it’s very hard to choose from amongst them. There were a number of papers looking specifically at the issues around the long-term preservation of research data, which is of particular interest to the work we are undertaking at Lancaster University. There was a fascinating paper given by Austrian researchers from SBA Research and TU Wien (the Vienna University of Technology) looking specifically at the management of the so-called “long tail” of research data – that is, the wide variety of file formats, each represented by only a relatively small number of files, which characterises research data in particular but is also relevant to the management of legacy digital collections and digital art collections. This discussion was returned to by Jen Mitcham (University of York) and Steve Mackey (Arkivum) talking about preserving research data, and also in my final workshop on file format identification. Jay Gattuso – nobly joining in at 4 am local time from New Zealand – talked about similar issues at the National Library of New Zealand involving legacy digital formats where there were only one or two examples.

One of the posters also captured this point perfectly – “Should We Keep Everything Forever?: Determining Long-Term Value of Research Data” from the team at the University of Illinois at Urbana-Champaign which looked at trying to create a methodology for assessing and appraising research data.

Detail of poster from University of Illinois Urbana-Champaign

Plenty of food for thought there about how much effort we should put into preserving, how we prioritise and how we appraise our collections.

The final keynote was from Dr David Bosshart of the Gottlieb Duttweiler Institute – a provocative take on the move from an industrial to a digital age. He had a very particular view of the future which caused a bit of a mini Twitter storm from those who felt that his view was very narrow; after all, more than half the world is not online. Whilst his paper was no doubt deliberately designed to create debate, it highlighted the issues about where we direct our future developments and what our ultimate goals are. This is common to all archives and preservation strategies: whose stories are we preserving, and how are we capturing complex narratives? This issue was revisited later in a workshop on personal digital archiving. Preservation can only happen where information is captured in the first place. It can be about educating and empowering people to capture and present their own narratives.

There is still a lot for me to think about from such a varied and interesting conference. There was very little time for leisure but there were wonderful evening events which the conference organisers arranged – a drinks reception at the National Library of Switzerland and a conference dinner at the impressive fifteenth-century Rathaus. There are lots of conference photos online which give a flavour of the event.

And speaking of flavours I couldn’t visit Switzerland and not try a fondue…. Delicious!

Eating fondue

Rachel MacGregor

(all photos author’s own).