So Long and Thanks For All the Pizza

Today is my last day working as Digital Archivist at Lancaster University, so I thought I would take a little time to reflect on my three years here: the highlights, and what I have learnt along the way.

Pizza

Pizza has featured quite a bit in my time here at Lancaster. And that’s not just team lunches! Pizza is a core component of our Data Conversations – the networking event designed to bring together researchers to share experiences of creating, using and sharing data. Having a peer-led discussion forum has been a fantastic success story for us – it’s even gone global!  The pizza is a key part of this as it helps create an informal, friendly environment where sharing is central. I’ve learnt a huge amount from being part of the Data Conversations and I will definitely be taking forward what I’ve learnt about successful engagement activities.

Culture change

Data Conversations' attendees enjoying refreshments and conversation

The focus for the Research Services team from early on (i.e. when there were only two of us!) was on how to push forward culture change. We were fortunate in having a management team who supported and promoted team-led agenda setting. We identified our priorities for development, which focused on bringing about culture change and promoting an Open Research agenda. Looking at our goals and keeping them at the centre of what we do was important, and meant we could tailor and prioritise activities around encouraging and promoting good data management practices. From my perspective as someone engaged in digital preservation, the best chance we have of preserving data is to ensure that the data is created in the right way in the first place. Sending the message about good data practices “upstream”, so that well-formed data is captured early and with the right metadata, means it has the best chance of being accessible into the future.

Opportunities

Eating fondue

I’ve learnt a huge amount in the time I’ve been at Lancaster; when I started I had a lot of enthusiasm but not much practical experience. I hope I’ve retained the enthusiasm but added experience and practical application to it. Things change at a very fast pace in the digital preservation world so it’s brilliant to be able to go to training events and conferences and hear from the leaders in the field. I was lucky enough to attend iPres 2016 in the beautiful city of Bern. I learnt a lot there and have very much built on that knowledge and experience, especially around personal digital archiving and community engagement activities.

Communities

Did I mention cake and biscuits?

The last three years have also brought me more fully into the digital preservation community, a community where sharing best practice and collaborating are greatly encouraged for the benefit of all. I have had help and support from countless people but I would single out Jen Mitcham at the University of York and Sean Rippington at the University of St Andrews as being particularly supportive and inspirational. The Archivematica UK user group has also been a fantastic group and I am looking forward to continuing these relationships into the future.

The Future

So now I’m off to take up new challenges at the University of Warwick in their Modern Records Centre. I am looking forward to future collaborations with the team of colleagues and friends at Lancaster, and to seeing us all face the challenges that digital data presents, together.

Rachel MacGregor, Digital Archivist

Connecting the Bits

Glasgow: location of the unconference (CC0: https://pixabay.com/en/glasgow-scotland-city-tourism-2997987/)

We are members of the Digital Preservation Coalition, a membership organisation which exists to secure our digital legacy. Members include businesses, HE institutions, funding bodies, and national heritage and cultural organisations, and are drawn from every continent.

Last week all members were invited to the annual un-conference where we come together not only to share experiences and network but also to help set the Digital Preservation Coalition’s training and development agenda for the year ahead. The idea is that members have the opportunity to raise the issues which really matter to them and then discuss how the DPC can take action to move forward on these issues.

The agenda is set on the day and full members are invited to give a three-minute presentation on their successes, challenges and future opportunities. Listening to the reports it was clear that there were themes common to all, whatever stage of maturity they were at.

So what were the common themes which came out of the day?

Challenges

(CC0 https://pixabay.com/en/magic-cube-patience-tricky-hobby-1976725/)

Many people shared their efforts to meet the challenge of preserving specific “types” of data:

  • Software and software environments
  • Email
  • Audio visual materials
  • Sensitive data

The preservation of Research Data, which usually means a huge range of data types, also came up. Here at Lancaster the preservation of Research Data has been our focus so we are well aware of the challenges we face, but it’s great to be able to share them and know that there is a community out there working on this together. We have also been engaging with software preservation and looking into ways in which we can support our researchers who create software. There is really encouraging work being done by the Software Sustainability Institute, and here at Lancaster we have been running various initiatives, including inviting Neil Chue Hong of the SSI to speak at our Data Conversations and presenting at our own Research Software Forum.

There were quite a few organisational challenges discussed such as:

  • Huge rise in quantity of data and difficulties in predicting the growth rate.
  • Resources either staying the same or being cut in the face of the growth in data
  • Sustaining work beyond a project level – moving it on to business as usual
  • Dealing with organisational restructures

Finding the right tools for the job

(CC0 https://pixabay.com/en/tools-awl-pliers-antique-equipment-1083796/)

Tackling these challenges requires robust strategies and planning. Again, the approaches we need can be developed as a community. Here at Lancaster we are developing a tool called DMAonline as part of the Jisc Research Data Shared Service. DMAonline has reporting functionality for a variety of research data and scholarly communications outputs, but one of the things we are hoping is that it will be able to provide intelligence (rather than just analytics) – it will use machine learning to make suggestions on growth and development and predictions on future use.

We don’t just want to create pretty graphs – we want to answer questions: for example, predicting growth in storage needs or predicting the growth of the “long tail” of unidentified file formats. It’s an ambitious aim but we are keen to take on the challenges presented by the long-term preservation of digital assets.
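To make that concrete, here is a toy sketch of the simplest kind of prediction we have in mind – fitting a straight line to monthly storage totals and projecting it a year ahead. The figures are invented and this is not how DMAonline itself works; its models will be richer than a linear trend.

```python
# Toy illustration only (not DMAonline): fit a straight line to monthly
# storage totals and project a year ahead. The figures below are invented.
import numpy as np

monthly_gb = np.array([120.0, 135, 160, 190, 230, 280, 330, 400])  # GB used per month
months = np.arange(len(monthly_gb))

slope, intercept = np.polyfit(months, monthly_gb, 1)  # simple linear trend
month_next_year = len(monthly_gb) + 12
projection = slope * month_next_year + intercept

print(f"Estimated growth: {slope:.1f} GB per month")
print(f"Projected storage needed in 12 months: {projection:.0f} GB")
```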

Finding the right tools for the job was also mentioned.  I think we would all agree that the tools we currently have are not necessarily the right fit for the job. Often we just need to get on with the job and have to use the tools which are available but sometimes it’s good to take a step back and say – what are we trying to achieve? What is the best way to get there and what should the tools we need look like? I don’t have the technical knowledge to build them but I can work with others – like my team here at Lancaster – to work towards this.

The human problem

One thing that came up was the challenge of getting hold of the data/records/archives as quickly as possible, i.e. before they are lost, altered, deleted, degraded or end up on a corrupted CD. Some of this challenge is technical, i.e. having simple, easy-to-use systems which people will engage with and which will encourage good data practices. However, more of the challenge is about getting people to engage with the process in the first place so that vital data, metadata and contextual information is not lost over the passage of time.

Successes

(CC0 https://pixabay.com/en/raise-challenge-landscape-mountain-3338589/)

It was great to hear about many successes, with many institutions implementing a fully functional preservation system. Other institutions had success getting digital preservation on the agenda with senior management. One institution mentioned that they argued that by not investing in digital preservation and training they would fall behind competitors. Another mentioned getting digital preservation recognised on a risk register. These are all significant achievements and show that individual institutions are moving forward and making progress.

It was also really good to hear about some specific projects, such as the work done by the National Library of Scotland on converting TIFF files to JP2, or the British Library’s work to keep up with the challenge of preserving digital formats which form part of the collections of a legal deposit library. This work will also benefit other institutions tackling similar problems.

Moving on

I really hope this day leads to relevant and targeted planning and support for all DPC members and I also hope it helps connect us as a community to tackle the common challenges which we all face.  The Digital Preservation Coalition also provide lots of resources for the wider non-member community so it’s a great way of coordinating development work and sharing expertise to help foster a real community of practice.

[This blog post was first published by the Digital Preservation Coalition]

Rachel MacGregor (Digital Archivist)

International Archives Day

Today is International Archives Day, when everyone involved in preserving archives, records, data – whatever your take – celebrates the work that is happening worldwide to ensure the preservation of our memory and heritage and the protection of our rights, by documenting decisions and building the foundations for good governance.

Lancaster Castle: very visible heritage (image author’s own CC-BY)

It’s easy to get people interested in memory and heritage – our history surrounds us in very visible ways and our memories are what bind us together, as we share and celebrate the past to inform our culture and identity. But it’s much harder to get excited about “governance”, even though it’s all about maintaining rights and responsibilities and ensuring justice and equality across the board.

So I want to take a moment to hear it for governance and shout about how the work we are doing here at Lancaster University is contributing towards supporting the creation of strong and accountable governance structures.  Accountable governance ensures fairness and equality for all. The work in my team is all about promoting the Open Research agenda which creates an environment where research is sustainable, reliable, accountable and for the greater good.

“Good governance in the public sector encourages better informed and longer-term decision making as well as the efficient use of resources. It strengthens accountability for the stewardship of those resources… People’s lives are thereby improved.”

(International framework: good governance in the public sector IFAC/CIPFA 2014)

And improving people’s lives is what we are all really putting our effort into.

So how are we hoping to support these objectives? The long-term preservation of good quality, reliable data means that we can support the decision-making processes which affect all of us. Poor data leads to poor decisions, so we are looking to see if we can establish ways of preserving data that guarantee its authenticity and integrity and ensure that it will be available for the long term. This work is not done in isolation and we are looking at best practice and initiatives such as the Jisc Research Data Shared Service, which we hope will deliver huge advances in helping us preserve important data.

Let’s celebrate everyone who is working hard on preserving documents, manuscripts, archives, data – all kinds of information – which enrich our lives and help us build a better world.

Rachel MacGregor (Digital Archivist)

Two days in the City

Beautiful sunshine in the City: Westminster Bridge (photo: Rachel MacGregor CC-BY)

I was lucky enough to have two days in London last week to attend two separate but linked events: the first was a Jisc-sponsored workshop on Digital Appraisal and the second an Archivematica UK User Group meet-up. It was a nice balance of activities: Day One was around the theory of how we decide what to keep and what to throw away, and Day Two was about sharing experiences of using Archivematica – a digital preservation tool which can potentially help us with aspects of this.

Wednesday was a day at the University of Westminster – founded in 1838 – in their beautiful buildings at 309 Regent Street.

Foyer at University of Westminster (Credit: Big Rock Cat / Sabotage1 https://en.wikipedia.org/w/index.php?title=File:University_of_Westminster_Foyer.jpg CC-BY 3.0)

This event – kindly sponsored by Jisc – was designed to bring together digital preservation practitioners to discuss and explore approaches to the theory and practice of managing digital archives. The Chatham House Rule applied, so there was freedom to discuss practice in an open and honest way. The morning session comprised two presentations. The first focussed on the theory of appraisal, that is how we make decisions about what to keep and what to get rid of. The second explored practical experiences of the same, reflecting on the change that those who are responsible for managing and looking after records have experienced in the move to the digital age.

For the afternoon session we reflected on what we had heard in the morning and were divided into smaller groups and invited to discuss the approaches we took to appraising both digital and physical collections.  It was a good chance to share experiences of tools which we found useful and difficulties we encountered.

For me it was a great opportunity to meet people out there actually “doing preservation” using a wide variety of tools. Sometimes when people use one software package or another it can have the effect of dividing them into camps. It’s really important to be able to meet up with and share experiences with others who are in a similar position – as witnessed at the Archivematica meeting the next day – but it is also good to hear a diversity of experience. There was a strong feeling that any tools, workflows and ways of working are likely to change and develop rapidly, paralleling rapid technological changes, so that anything we opt for now is necessarily only a “temporary” solution. We have to learn to work in a state of flux and be dynamic in our approaches to preservation.

Day two was the Archivematica UK User Group, this time hosted by Westminster School. I’ve blogged before about this group, from when we hosted it here at Lancaster University. Yet another fantastic setting for our meeting, and another brilliant opportunity to discuss our work with colleagues from a wide range of institutions.

Deans Yard, Westminster (Photo by Rachel MacGregor CC-BY)

The morning session involved the sharing of workflows and in a nice parallel to the previous day’s session, talking about appraisal!

Lunch was back-to-school in the canteen but I’m pleased to report that school dinners have certainly moved on since I remember them!

In the afternoon there was a selection of presentations – including one that I gave to update people on our work at Lancaster as part of the Jisc RDSS to create a reporting tool – DMAonline – which will work with Archivematica to give added reporting functionality. One of the attractive things about Archivematica as a digital preservation tool is that it is Open Source, which allows development work to happen in parallel with the product and to suit all sorts of circumstances.

We also heard from Hrafn Malmquist at the University of Edinburgh talking about his recent work with Archivematica to help with preserving the records of the University Court. Sean Rippington from the University of St Andrews talked to us about experimenting with exporting SharePoint files and Laura Giles from the University of Hull talked about documenting Hull’s year as City of Culture.

We were also lucky enough to get a tour of Westminster School’s archive which gave the archivist Elizabeth the chance to show off her favourite items, including the wonderful Town Boy ledger which you can discover for yourself here.

All in all it was a very useful couple of days in London which gave me a lot to think about and incorporate into my practice. Having time to reflect on theoretical approaches is invaluable and rarely achieved when the “day job” is so busy, and I am grateful to have had the time to attend.

Rachel MacGregor

Data Interview with Andrew Moore

Andrew Moore (@apmoore94) is a 2nd year PhD student at Lancaster University within the School of Computing and Communications. He is studying how sentiment analysis can be improved through world knowledge using finance as his specialised domain. His research interests are across Natural Language Processing, Machine Learning, and Reproducibility.

We talked to Andrew after he presented at the 3rd Data Conversations.

Q: When does software become research data in your understanding?

Andrew: As soon as you start writing software towards a research paper, I would count that as research data.

Q: Is that when you need the code to verify results or re-run calculations?

Andrew: You also need the code to clean your data, which is just as important as your results, because how you clean your data informs what your results are going to be.

Q: And the software is needed to clean the data?

Andrew: Yes. The software will be needed for cleaning the data. So as soon as you start writing your software towards a paper that is when the code becomes research data. It doesn’t have to be in the public domain but it really should be.

Q: What is the current practice when you publish a paper? Do you get asked where your software is?

Andrew: Recently we have actually, for some of our conferences in the computational linguistics or Natural Languages Processing field. But it is not a requirement to get published. It is a friendly question rather than an obligation.

Q: Who is asking, the publisher?

Andrew: No, that’s the conference chairs who are asking but it is not a requirement. Personally I think it should be. I can understand in certain cases when for instance there are security concerns. But normally the sensitivity is on the data side rather than the software.

Q: At the moment if you read a paper the software that is linked to the paper is not available?

Andrew: Normally, if there is software with the paper the paper would have a link, normally on the first or the last page. But a large proportion of the papers don’t have a link. Normally there would be a link to GitHub, maybe 50 per cent of the time. Other than that you can dig around if you’re really looking for it, perhaps Google the name but that’s not really how it should be.

Q: So sometimes the software is available but not referenced in the paper?

Andrew: That’s correct.

Q: But why would you not reference the software in the paper when it is available?

Andrew: I am really puzzled by this [laughs]. I can think of a few reasons. One of them could be that the GitHub instance is just used as backup. The problem I have with that is that if it is not referenced in the paper, how much do you trust the code to be the version that is associated with the paper?

Also, the other problem with GitHub is that even if you reference it in a paper, you can keep changing the code, and unless you “tag” it on GitHub, like a version number, and reference that tag in your paper, you don’t know which is the correct version.

Q: What about pushing a version of the code from GitHub to [the data archiving tool] Zenodo and get a DOI?

Andrew: I didn’t know about that until recently!

Andrew presenting at Data Conversations

Q: So this mechanism is not widely known?

Andrew: I know what DOIs are but not really how you can get them.

Q: So are the reasons why software isn’t shared about a lack of time, or are they more technical – as we have just discussed – to do with versions and ways of publishing?

Andrew: I think time and technical issues go hand in hand. To be technically better takes time and to do research takes time. It is always a tradeoff between “I want my next paper out” and spending extra time on your code. If your paper is already accepted that is “my merit” so why spend more time?

But there are incentives! When I submitted a paper at an evaluation workshop I said that everybody should release their software, because it was about evaluating models, so it makes sense to have all the code online. So it was decided that we shouldn’t enforce the release, but it was encouraged, and the argument was that you are likely to get more citations. Because if your code is available people are more likely to use it and then to credit you by citing your paper. So getting more citations is a good incentive, but I am not sure if there are some studies proving that releasing software correlates with more citations?

Q: There are a number of studies proving there is a positive correlation when you deposit your research data[1]. I am not aware there is one for software[2]. So maybe we need more evidence to persuade researchers to release code?

Andrew: Personally I think you should do it anyway! You spend so many hours on writing software so even if it takes you a couple of hours extra to put it online it might save somebody else a lot of time doing the same thing. But some technical training could help significantly. From my understanding, the better I got at doing software development the quicker I’ve been getting at releasing code.

Q: Is that something that Lancaster University could help with? Would that be training or do we need specialists that offer support?

Andrew: I am not too sure. I have a personal interest in training myself but I am not sure how that would fit into research management.

Q: I remember that at the last Data Conversations Research Software Engineers were being discussed as a support method.

Andrew: I think that would be a great idea. They could help direct researchers. Even if they don’t do any development work for them they could have a look at the code and point them into directions and suggest “I think you should do this or that”, like re-factoring. I think that kind of supervision would be really beneficial, like a mentor even if they are not directly on that project. Just for example ten per cent of their time on a project would help.

Q: Are you aware that this is happening elsewhere?

Andrew: Yes, I did a summer internship with the Turing Institute and they have a team of Research Software Engineers.

Q: And who do the Research Software Engineers support?

Andrew: The Alan Turing Institute is made up of five universities. It is the national institute for data science for the UK. They do have their own researchers but also associated researchers from the five partner universities. The Research Software Engineers are embedded on the research side, integrated with the researchers.

When I was an intern at the Turing Institute one of the Research Software Engineers had a time slot for us available once a week.

Q: Like a drop in help session?

Andrew: Yes, like that. They helped me by directing me to different libraries and software to unit test my code and create documentation, as well as stating the benefits of doing this. I know that other teams benefited from their guidance and support on using Microsoft Azure cloud computing to facilitate their work. I imagine that a lot of time was saved by the help that they gave.

Q: Thanks Andrew. And to get to the final question. You deposited data here at Lancaster University using Pure. Does that work for you as a method to deposit your research data and get a DOI? Does that address your needs?

Andrew: I think better support for software might be needed on Pure. It would be great if it could work with GitHub.

Q: Yes, at the moment you can’t link Pure with GitHub in the same way you can link GitHub with Zenodo.

Andrew: When you link GitHub and Zenodo does Zenodo keep a copy of the code?

Q: I am not an expert but I believe it provides the DOI for a specific release of the software.

Andrew: One thing I think is really good is that we keep data in Lancaster’s repository. In twenty years’ time GitHub might not exist anymore and then I would really appreciate a copy stored in the Lancaster archives. The assumption that “It’s in GitHub, it’s fine” might not be true.

Q: Yes, if we assume that GitHub is a platform for long-term preservation of code we need to trust it, and I am not sure that this is the case. If you deposit here at Lancaster the University has a commitment to preservation and I believe that the University’s data archive is “trustworthy”.

Andrew: So depositing a zipped copy of your code is a good solution for now. But in the long term the University’s archives could be better for software. An institutional GitLab might be good and useful. I know there is one in Medicine but an institution-wide one would help. It would be nice if Pure could talk to these systems but I can imagine it is difficult.

The area of Neuroscience seems to be doing quite well with releasing research software. You have an opt-in system for the review of code. I think one of the Fellows of the Software Sustainability Institute was behind this idea.

Q: Did that happen locally here at Lancaster University?

Andrew: No, the Fellow was from Cambridge. They seem to be ahead of the curve but it only happened this year. But they seem to be really pushing for that.

Q: Thanks a lot for the Data Interview Andrew!

The interview was conducted by Hardy Schwamm.

[1] For example: Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1, e175. http://doi.org/10.7717/peerj.175

[2] Actually there is a relevant study: Vandewalle, Patrick. Code Sharing Is Associated with Research Impact in Image Processing . Computing in Science & Engineering, 2012, http://ieeexplore.ieee.org/document/6200247/.
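As a postscript to Andrew’s point about depositing a zipped copy of a tagged release: a minimal sketch of fetching the release archive from GitHub and recording a checksum to go in the deposit record might look like the snippet below. The repository owner, name and tag are placeholders, and this is an illustration rather than a recommended workflow.

```python
# Sketch only: download the zip of a tagged GitHub release and record its
# SHA-256 so the deposited copy can be verified later. The owner, repository
# and tag below are placeholders, not a real project.
import hashlib
import urllib.request

owner, repo, tag = "example-owner", "example-repo", "v1.0.0"
url = f"https://github.com/{owner}/{repo}/archive/refs/tags/{tag}.zip"
local_file = f"{repo}-{tag}.zip"

urllib.request.urlretrieve(url, local_file)

with open(local_file, "rb") as f:
    checksum = hashlib.sha256(f.read()).hexdigest()

print(f"Deposit {local_file} (sha256: {checksum})")
```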

International Digital Preservation Day

International Digital Preservation Day 30th November 2017 #IDPD2017

What’s that about then?

Digital Archivists are a much misunderstood lot.

A lot of people think our work on digital preservation must be something to do with digitising old documents, but this is absolutely not the case. Of course digitising old documents is fantastic, and wonderful resources are now increasingly available on the internet – there are so many examples, but Charles Booth’s London and the Cambridge Digital Library are just some of my favourites. There are thousands and thousands, useful for scholars, historians, students, teachers, genealogists, journalists – well, just about anyone really who is interested in getting access to sources that would otherwise be near impossible to access. Digitising archive and library content has revolutionised the way we access and interact with archives, manuscripts and special collections.

Image: Flickr https://flic.kr/p/dVHkbG Kjetil Korslien CC BY-NC 2.0

However – this is not what the digital archivist does (although there are overlaps).  The digital archivist is concerned mainly (although not exclusively) with archives, data, stuff – whatever you want to call it – which was created in a digital format and has never had a physical existence.  If someone accidentally deletes the digitised version of Charles Booth’s poverty maps, the original is still there and can be digitised again.  Of course that would be an enormous waste of time and effort which is why we often treat digitised content as if it were the original content and guard against accidental deletion or loss.

But although digitisation does help preserve a document, because it reduces the wear and tear on the original, it often swaps one stable format (paper, parchment, etc.) for a less stable one. So you could argue that digitising – rather than helping with preservation issues – is just creating new ones. Of course there are many very unstable analogue formats, such as many photographic processes, magnetic tape and so forth, which need to be digitised if they are to survive at all.

Digitisation is not preservation.

With digitised content you would like to think (!) that you might have some measure of control over what that content is, specifically the format it comes in. It is possible to choose to save the image files in a format that is widely used and well documented, so that the risk that they will be hard to access in 5 or 10 years’ time is lessened. There are formats which are recommended for long-term preservation because they are widely adopted and well supported, and by choosing these we help the process of digital preservation by giving those files a “head start”.
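As a tiny illustration of what choosing a well supported format can look like in practice, the sketch below uses the Pillow imaging library to keep a preservation copy of an image as TIFF alongside a smaller access copy. The file names are hypothetical and real digitisation workflows use dedicated tools and agreed standards, so treat this purely as a sketch.

```python
# Sketch, assuming the Pillow library is installed: keep a copy of an image
# in a widely used, well documented format (TIFF) alongside a smaller access
# copy. File names are hypothetical; real digitisation projects use dedicated
# tools and agreed standards rather than ad hoc scripts.
from PIL import Image

with Image.open("scan.jpg") as img:
    img.save("scan_preservation.tiff", format="TIFF")  # preservation copy
    img.save("scan_access.jpg", quality=85)            # everyday access copy
```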

However files which are created by others – perhaps completely outside of the organisation – can come in *literally* any format.  A good example of this is when I analysed a sample of the data deposited by academics undertaking research at our institution and found a grand total of  59 different file types.  OK so that doesn’t sound *too* bad but 55% of the files I couldn’t identify at all.  Which is not so good.
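Digital preservation tools such as DROID or Siegfried identify formats from their internal signatures; the standard-library sketch below (the folder name is hypothetical) only guesses from file extensions, but it shows the kind of rough profiling I mean – and why you quickly end up with a pile of “unidentified” files.

```python
# Rough profile of the file formats in a deposit. This guesses from file
# extensions only (stdlib mimetypes); signature-based tools such as DROID
# or Siegfried do the job properly. The folder name is hypothetical.
import mimetypes
from collections import Counter
from pathlib import Path

deposit = Path("deposit_folder")

counts = Counter(
    mimetypes.guess_type(p.name)[0] or "unidentified"
    for p in deposit.rglob("*") if p.is_file()
)

total = sum(counts.values())
for mime, n in counts.most_common():
    print(f"{mime:30} {n:5}  ({n / total:.0%})")
```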

So we could try (as some archives do) saying we will only accept files in a certain format, to give our files the best chance of a long and happy life. But clearly there are lots of circumstances where this is either impractical or impossible. For example, with the papers of a now-deceased person we cannot ask them to convert or resubmit them. And in the case of our researchers, they will need to be using specific software to perform specific specialised tasks, and they themselves may have very little say in their choice of software.

Another major – and perhaps often overlooked – issue with digital preservation is actually making sure that the files are captured in the first place. This is not a digital-specific problem – any kind of data, whether it is research outputs, personal papers or the financial records of a business, is at risk of disappearing if it is not looked after properly. It needs a safe storage environment where the risk of accidental or malicious damage is kept to a minimum and where it can be found, its content understood, and shared effectively. For digital files this means a particularly rigorous ongoing check that the content and format are stable and that they can still be made accessible.
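That “rigorous ongoing check” has a well established technical core: fixity checking. The sketch below (the folder and manifest names are hypothetical) records a SHA-256 checksum for each file and re-verifies them on later runs; preservation systems such as Archivematica do this, and a great deal more, for you.

```python
# Minimal fixity-checking sketch: record a SHA-256 checksum for each file on
# the first run, then re-check on later runs to spot silent change or loss.
# The folder and manifest names are hypothetical; preservation systems do
# this (and much more) automatically.
import hashlib
import json
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

folder = Path("archive_folder")
manifest_file = Path("fixity_manifest.json")

if not manifest_file.exists():
    # First run: record a checksum for every file in the folder
    manifest = {str(p): sha256(p) for p in folder.rglob("*") if p.is_file()}
    manifest_file.write_text(json.dumps(manifest, indent=2))
else:
    # Later runs: re-compute and compare, reporting anything that has changed
    manifest = json.loads(manifest_file.read_text())
    for path, recorded in manifest.items():
        current = sha256(Path(path)) if Path(path).exists() else "MISSING"
        if current != recorded:
            print(f"Fixity failure: {path}")
```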

So what is digital preservation?

It’s not just backing stuff up

It’s the active management of digital assets to ensure they will still be accessible in the future.

Making sure we can still open files in the future.
Making sure we can still understand files in the future.

It’s a tough job – but someone’s got to do it!

3rd Data Conversation – Software as data: summary and slides

We had our third Data Conversation here at Lancaster University again with the aim of bringing together researchers to share their data stories and discuss issues and exchange ideas in a  friendly and informal setting.

Data Conversations Agenda

We had a bit of a change this time, however, as we had a special guest speaker, Neil Chue Hong of the Software Sustainability Institute, talking about software as “a different kind of research object”.

We all had plenty of time to eat pizza and crisps before Neil invited us all to consider reproducibility and sustainability in relation to software. Neil has a very clear and engaging style which really helped us, the audience, navigate around the complex issues of managing software. He asked us all to imagine returning to our work in three months’ time – would it make sense? Would it still work? He also addressed some of the complex issues around versioning, authorship and sharing software.

Neil Chue Hong, Software Sustainability Institute

The second half of the afternoon followed the more traditional Data Conversations route of short lightning talks given by Lancaster University researchers.

Data Conversations

First up was Barry Rowlingson (Lancaster Medical School) talking about the benefits of using GitLab for developing, sharing and keeping software safe.

Barry Rowlingson (Lancaster University)

Barry Rowlingson weighs up the benefits of GitLab over GitHub…

Next was Kristoffer Geyer (Psychology) talking about the innovative and challenging uses of smartphone data for investigating behaviour and in particular the issues of capturing the data from external and ever changing software. Kris mentioned how the recent update of Android (to Oreo) makes retrieving relevant data more difficult – a flexible approach is definitely what is needed.

Kristoffer Geyer (Lancaster University)

Then we heard from Andrew Moore (School of Computing and Communications) who returned to the theme of sharing software, looking at some of the barriers and opportunities which present themselves. Andrew argued passionately that we need more resources for software sharing (such as specialist Research Software Engineers) but also that researchers need to change their attitudes towards sharing their code.

Andrew Moore (Lancaster University)

Our final speaker was the Library’s own Stephen Robinson (Library Developer) talking about using containers as a method of software preservation.  This provoked quite some debate – which is exactly what we want to encourage at these events!

We think these kinds of conversations are a great way of getting people to share good ideas and good practice around data management and we look forward to the next Data Conversations in January 2018!

This blog post was co-authored by Rachel MacGregor and Hardy Schwamm.

Reflections on PASIG 2017

Christ Church, Oxford: my home for the conference

It was fantastic to see PASIG 2017 (Preservation and Archives Special Interest Group) come to Oxford this year which meant I had the privilege of attending this prestigious international conference in the beautiful surroundings of Oxford’s Natural History Museum.  All slides and presentations are available here.

The first day was advertised as Bootcamp Day so that everyone could get up to speed with the basics. And I thought: “do I know everything about digital preservation?” and the answer was “no”, so I decided to come along to see what I could learn. The answer was: quite a lot. There was some excellent advice on offer from Sharon McMeekin of the Digital Preservation Coalition and Stephanie Taylor of CoSector, who both have a huge amount of experience in delivering and supporting digital preservation training. Adrian Brown (UK Parliament) gave us a lightning tour of relevant standards – what they are and why they are important. It was so whistle-stop that I think we were all glad that the slides of all the presentations are available – this was definitely one to go back to.

The afternoon kicked off with “What I wish I knew before I started” and again responses to these have been summarised in some fantastic notes made collaboratively but especially by Erwin Verbruggen (Netherlands Institute for Sound and Vision) and David Underdown (UK National Archives).  One of the pieces of advice I liked the most came from Tim Gollins (National Records of Scotland) who suggested that inspiration for solutions does not always come from experts or even from within the field – it’s an invitation to think broadly and get ideas, inspiration and solutions from far and wide.  Otherwise we will never innovate or move on from current practices or ways of thinking.

There was much food for thought from the British Library team, who are dealing with all sorts of complex format features. The line between book and game, or between book and artwork, is often blurred. They used the example of Nosy Crow’s Goldilocks and Little Bear – is it a book, an app, a game or all three? And then there is Tea Uglow’s A Universe Explodes, a blockchain book, designed to be ephemeral and changing. In this it has many things in common with time-based artworks which institutions such as the Tate, MoMA and many others are grappling with preserving.

The conference dinner was held at the beautiful Wadham College and it was great again to have the opportunity to meet new people in fantastic surroundings.  I really liked what Wadham College had done with their Changing Faces commission – four brilliant portraits of Wadham women.

Conference dinner at Wadham College
Wadham pudding

The conference proper began on Day Two and over the course of the two days there were lots of interesting presentations which it would be impossible to summarise here. John Sheridan gave an engaging and thought-provoking talk on disrupting the archive, mapping the transition from the paper archive to digital not just in a literal sense but also in the sense of our ways of thinking. Paper-based archival practices rely on hierarchies and order – this does not work so well with digital content. We probably also need to be thinking more like this:

and less like this:

for our digital archives.

Eduardo del Valle of the University of the Balearic Islands gave his Digital Fail story – a really important example of how sharing failures can be as important as sharing successes – in his case they learnt key lessons, can move on from this, and can hopefully prevent others from making the same mistakes. Catherine Taylor of Waddesdon Manor also bravely shared the shared drive – there was a nervous giggle from an audience made up of people who all work with similarly idiosyncratically arranged shared drives… In both cases acquiring tools and applying technical solutions was only half of the work (or possibly not even half); it’s the implementation of the entire system (made up of a range of different parts) which is the difficult part to get right.

Me networking at the Natural History Museum

As a counterpoint to John Sheridan’s theory we had the extremely practical and important presentation from Angeline Takawira of the United Nations Mechanism for Criminal Tribunals, who explained that preserving and managing archives is a core part of the function of the organisation. Access for an extremely broad range of stakeholders is key. Some of the stakeholders live in parts of Rwanda where internet access is usually wifi on mobile devices – this is an important consideration in how to make material available.

Alongside Angeline Takawira’s presentation, Pat Sleeman of the UN Refugee Agency packed a powerful punch with her description of archives and records management in the field, coping with the biggest humanitarian crisis in the history of the organisation. How do you put together a business case for spending on digital preservation when the organisation needs to spend money on feeding starving babies? And even Twitter, which had been lively during the course of the conference at the hashtag #PASIG17, fell silent at the testimony of Emi Mahmoud, which exemplifies the importance of preserving the voices and stories of refugees and displaced persons.

I came away with a lot to think about and also a lot to do. What can we do (if anything) to help with some of the tasks faced by the digital preservation community as a whole? The answer is that we can share the work we are doing – success or failure – and all learn that it is through a combination of tools, processes and skills from right across the board – IT, archives, libraries, data science and beyond – that we can help preserve what needs to be preserved.

Rachel MacGregor (Digital Archivist)

[all images author’s own]

From Planning to Deployment: Digital Preservation and Organizational Change June 2017

We were very excited to be visiting the lovely city of York for the Digital Preservation Coalition’s event “From Planning to Deployment: Digital Preservation and Organizational Change”. The day promised a mixture of case studies from organisations who have implemented, or are in the process of implementing, a digital preservation programme, and also a chance for Jisc to showcase some of the work they have been sponsoring as part of the Research Data Shared Services project (which we are a pilot institution for). It was a varied programme and the audience was very mixed – one of the big benefits of attending events like these is the opportunity to speak to colleagues from other institutions in related but different roles. I spoke to some Records Managers and was interested in their perspective as active managers of current data. I’m a big believer in promoting digital preservation through involvement at all stages of the data lifecycle (or records continuum if you prefer) so it is important that as many people as possible – whatever their role in the creation or management of data – are encouraged into good data management practices. This might be by encouraging scientists to adopt the FAIR principles or by Records Managers advising on file formats, file naming and structures and so on.

William Kilbride, Digital Preservation Coalition introduces the event (CC-BY Rachel MacGregor)

The first half of the day was a series of case studies presented by various institutions, large and small, who had a whole range of experiences to share. It was introduced by a presentation from the Polonsky Digital Preservation Project based at Oxford and Cambridge Universities. Lee Pretlove and Sarah Mason jointly led the session, talking us through the challenges of developing and delivering a digital preservation project which has to continue beyond the life of the project. Both universities represented in this project are very large organisations, which can make the issues faced by the team extremely complex and challenging. They have been recording their experiences of trying to embed practices from the project so that digital preservation can become part of a sustainable programme.

The first case study came from Jen Mitcham from the University of York, talking about the digital preservation work they have undertaken there. Jen has documented her activities very helpfully and consistently on her blog, and she talked specifically about the amount of planning which needs to go into the work and then the very real difficulties in implementation. She has most recently been looking at digital preservation for research data – something we are working on here at Lancaster University.

Next up was Louisa Matthews from the Archaeology Data Service, who have been spearheading approaches to digital preservation for a very long time. The act of excavating a site is by its nature destructive, so it is vital to be able to capture data about it accurately and be able to return to and reuse the data for the foreseeable future. This captures digital preservation in a nutshell! Louisa described how engaging with their contributors ensures high quality, re-usable data – something we are all aiming for.

The final case study of the morning was Rebecca Short from the University of Westminster, talking about digital preservation and records management. The university has already had success implementing a digital preservation workflow and is now seeking to embed it further in the whole records creation and management process. Rebecca described the very complex information environment at her university – relatively small in comparison to the earlier presentations but no less challenging for all that.

The afternoon was a useful opportunity to hear from Jisc about their Research Data Shared Services project which we are a pilot for.  We heard presentations from Arkivum, Preservica and Artefactual Systems who are all vendors taking part in the project and gave interesting and useful perspectives on their approaches to digital preservation issues.  The overwhelming message however has to be – you can’t buy a product which will do digital preservation.  Different products and services can help you with it, but as William Kilbride, Executive Director of the Digital Preservation Coalition has so neatly put it “digital preservation is a human project” and we should be focussing on getting people to engage with the issues and for all of us to be doing digital preservation.

Rachel MacGregor

Sharing Qualitative Data Workshop

On 5 April we invited Libby Bishop to give a workshop on how to share qualitative data. Libby is well known in the Research Data Management (RDM) world as the Manager for Producer Relations at the UK Data Archive (University of Essex) although she introduced herself as a “maverick social science researcher”.

Libby explaining the workshop plan

Why have a workshop on sharing qualitative data?

The short answer is: because it is difficult! If we look at the datasets deposited in our Lancaster Research Directory (currently about 150) you will find very few qualitative datasets. The reason for that is that there are many challenges in sharing this type of data. Which is why we invited expert advice from Libby.

Workshop Highlights

Firstly, you can have a look at Libby’s slides below, but further on I would like to highlight a few things that were of special interest to me.

Qualitative data does get reused! Not just for research.

One of the surprises for me personally was that qualitative data is reused mainly for learning purposes (see figure below). According to Libby’s research, 64% of downloads of qualitative data are for learning and 15% for research.

Re-use purposes of qualitative data downloaded from UK Data Service, 2002-2016. From Bishop, Libby and Kuula-Lummi, Arja (2017) ‘Revisiting Qualitative Data Reuse.’ SAGE Open, 7 (1). https://doi.org/10.1177/2158244016685136

In our workshop Libby used a dataset created by Lancaster University researchers to illustrate the benefits of archiving data: It will get re-used! The example is the dataset “Health and Social Consequences of the Foot and Mouth Disease Epidemic in North Cumbria, 2001-2003” which is available from the UK Data Service (http://doi.org/10.5255/UKDA-SN-5407-1). It is a rich qualitative study including interviews with people affected by the Foot & Mouth crisis and diaries documenting experiences in Cumbria 2001-2003.

Libby explained how the researchers themselves thought the data could not be archived but with support (and some extra funding) created an important resource that is being reused in different contexts.

Libby Bishop

Get the consent right!

A major hurdle on the way to sharing qualitative data is getting the right consent from research participants. Workshop participants worked on some real-life examples provided by Libby and realised that critiquing consent forms is much easier than writing one yourself.

For example, any pledge to “totally anonymise” an interview is a promise you are unlikely to keep. Also, vague statements or legalistic terminology were criticised.

Workshop participants discussing consent forms

Libby highlighted that consent statements actually have become more difficult to write as dissemination tools (including data archives) have diversified.

Some conclusions

Here are a few points that stuck in my mind after the Sharing Qualitative Data workshop:

  • Sharing qualitative data offers many benefits. We heard of examples where research participants were more keen on sharing their (anonymised) data than overly careful researchers.
  • The prime responsibility of the researcher is to protect participants, but she/he also has a responsibility to science and funders. Both together, according to Libby, “is not an easy package”.
  • The three tools for sharing qualitative data are:
  1. A well written and explained informed consent form
  2. Protection of identities (through careful anonymisation)
  3. Regulated access (not all data should be open without restrictions)
“Sharing”, Image by Ryan Roberts, Flickr, CC-BY-NC

Full citation of the paper mentioned above: Bishop, Libby and Kuula-Lummi, Arja (2017) ‘Revisiting Qualitative Data Reuse.’ SAGE Open, 7 (1). https://doi.org/10.1177/2158244016685136

Hardy Schwamm, Research Data & Repository Manager, Lancaster University