International Digital Preservation Day

What’s that about then?

International Digital Preservation Day 30th November 2017 #IDPD2017

Digital Archivists are a much misunderstood lot.

A lot of people think our work on digital preservation must be something to do with digitising old documents, but this is absolutely not the case.  Of course digitising old documents is fantastic, and so are the wonderful resources which are now increasingly available on the internet, like Charles Booth’s London or the Cambridge Digital Library (there are so many examples; these are just some of my favourites).  There are thousands and thousands of them, useful for scholars, historians, students, teachers, genealogists, journalists – well, just about anyone really who is interested in getting access to sources that would otherwise be near impossible to access.  Digitising archive and library content has revolutionised the way we access and interact with archives, manuscripts and special collections.

Image: Flickr Kjetil Korslien CC BY-NC 2.0

However – this is not what the digital archivist does (although there are overlaps).  The digital archivist is concerned mainly (although not exclusively) with archives, data, stuff – whatever you want to call it – which was created in a digital format and has never had a physical existence.  If someone accidentally deletes the digitised version of Charles Booth’s poverty maps, the original is still there and can be digitised again.  Of course that would be an enormous waste of time and effort which is why we often treat digitised content as if it were the original content and guard against accidental deletion or loss.

But although digitisation does help preserve a document because it reduces the wear and tear on the original it is often swapping one stable format (paper, parchment etc) for a less stable one.  So you could argue that digitising – rather than helping with preservation issues – is just creating new ones.  Of course there are many very unstable analogue formats such as many photographic processes, magnetic tape and so forth which need to be digitised if they are to survive at all.

Digitisation is not preservation.

With digitised content you would like to think (!) that you might have some measure of control over what that content is, specifically the format it comes in.  It is possible to choose to save the image files in a format that is widely used and well documented, so that the risk that they will be hard to access in 5 or 10 years’ time is lessened.  There are formats which are recommended for long term preservation because they are widely adopted and well supported, and by choosing these we help the process of digital preservation by giving those files a “head start”.

However files which are created by others – perhaps completely outside of the organisation – can come in *literally* any format.  A good example of this is when I analysed a sample of the data deposited by academics undertaking research at our institution and found a grand total of  59 different file types.  OK so that doesn’t sound *too* bad but 55% of the files I couldn’t identify at all.  Which is not so good.
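As a rough illustration of this kind of survey, here is a minimal sketch that tallies file names by extension and reports what share have no extension at all.  It is not the method used for the analysis described above: an extension is only a hint at the real format, proper identification relies on signature-based tools, and the sample file names and function names below are hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def tally_extensions(filenames):
    """Count files per lower-cased extension; '' means no extension."""
    return Counter(PurePath(name).suffix.lower() for name in filenames)


def share_without_extension(filenames):
    """Return the fraction of files that have no extension at all."""
    names = list(filenames)
    if not names:
        return 0.0
    return tally_extensions(names)[""] / len(names)


# Hypothetical sample of deposited file names, for illustration only.
sample = ["results.CSV", "notes.txt", "run01", "model.dat", "run02", "paper.docx"]

for extension, count in tally_extensions(sample).most_common():
    print(extension or "(none)", count)
print(f"No extension: {share_without_extension(sample):.0%}")
```

Even a crude tally like this gives a first idea of how varied a deposit is before any heavier identification work begins.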

So we could try (as some archives do) saying we will only accept files in a certain format, to give our files the best chance of a long and happy life.  But clearly there are lots of circumstances where this is either impractical or impossible.  For example, with the papers of a now-deceased person, we cannot ask them to convert or resubmit their files.  And in the case of our researchers, they will need to be using specific software to perform specific specialised tasks and they themselves may have very little say in their choice of software.

Another major, and perhaps often overlooked, issue with digital preservation is actually making sure that the files are captured in the first place.  This is not a digital-specific problem: any kind of data, whether it is research outputs, personal papers or the financial records of a business, is at risk of disappearing if it is not looked after properly.  Files will need a safe storage environment where the risk of accidental or malicious damage is kept to a minimum and where they can be found, the content understood and shared effectively.  For digital files this means a particularly rigorous ongoing check that the content and format are stable and that they can still be made accessible.
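One common way to check that content is stable is to record a checksum for each file and recompute it at intervals.  The sketch below is only an illustration under that assumption, not a description of any particular system mentioned here; the function names and the in-memory manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file at the time of deposit."""
    return {str(path): sha256_of(path) for path in paths}


def check_fixity(manifest):
    """Return (path, problem) pairs for files that are missing or changed."""
    problems = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.is_file():
            problems.append((name, "missing"))
        elif sha256_of(path) != expected:
            problems.append((name, "changed"))
    return problems
```

A matching checksum only shows that the bytes are unchanged; whether the format can still be opened and understood is a separate question.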

So what is digital preservation?

It’s not just backing stuff up

It’s the active management of digital assets to ensure they will still be accessible in the future.

Making sure we can still open files in the future.
Making sure we can still understand files in the future.

It’s a tough job – but someone’s got to do it!


Reflections on PASIG 2017

Christ Church, Oxford: my home for the conference

It was fantastic to see PASIG 2017 (Preservation and Archives Special Interest Group) come to Oxford this year which meant I had the privilege of attending this prestigious international conference in the beautiful surroundings of Oxford’s Natural History Museum.  All slides and presentations are available here.

The first day was advertised as Bootcamp Day so that everyone could be up-to-speed with the basics.  And I thought: “do I know everything about Digital Preservation?”  and the answer was “No” so I decided to come along to see what I could learn.  The answer was:  quite a lot.  There was some excellent advice on offer from Sharon McMeekin of the Digital Preservation Coalition and Stephanie Taylor of CoSector who both have a huge amount of experience in delivering and supporting digital preservation training.  Adrian Brown (UK Parliament) gave us a lightning tour of relevant standards – what they are and why they are important.  It was so whistle stop that I think we were all glad that the slides of all the presentations are available – this was definitely one to go back to.

The afternoon kicked off with “What I wish I knew before I started” and again responses to these have been summarised in some fantastic notes made collaboratively but especially by Erwin Verbruggen (Netherlands Institute for Sound and Vision) and David Underdown (UK National Archives).  One of the pieces of advice I liked the most came from Tim Gollins (National Records of Scotland) who suggested that inspiration for solutions does not always come from experts or even from within the field – it’s an invitation to think broadly and get ideas, inspiration and solutions from far and wide.  Otherwise we will never innovate or move on from current practices or ways of thinking.

There was much food for thought from the British Library team who are dealing with all sorts of complex format features.  The line between book and game and book and artwork is often blurred.  They used the example of Nosy Crow’s Goldilocks and Little Bear – is it a book, an app, a game or all three?  And then there is Tea Uglow’s A Universe Explodes, a blockchain book, designed to be ephemeral and changing.  In this it has many things in common with time-based artworks which institutions such as the Tate, MoMA and many others are grappling with preserving.

The conference dinner was held at the beautiful Wadham College and it was great again to have the opportunity to meet new people in fantastic surroundings.  I really liked what Wadham College had done with their Changing Faces commission – four brilliant portraits of Wadham women.

Conference dinner at Wadham College
Wadham pudding

The conference proper began on Day Two and over the course of the two days there were lots of interesting presentations which it would be impossible to summarise here.  John Sheridan gave an engaging and thought-provoking talk on disrupting the archive, mapping the transition from paper archive to digital not just in a literal sense but also in the sense of our ways of thinking.  Paper-based archival practices rely on hierarchies and order – this does not work so well with digital content.  We probably also need to be thinking more like this:

and less like this:

for our digital archives.

Eduardo del Valle of the University of the Balearic Islands gave his Digital Fail story – a really important example of how sharing failures can be as important as sharing successes.  In his case they learnt key lessons, can move on from this and can hopefully prevent others from making the same mistakes.  Catherine Taylor of Waddesdon Manor also bravely shared the shared drive – there was a nervous giggle from an audience made up of people who all work with similarly idiosyncratically arranged shared drives…  In both cases acquiring tools and applying technical solutions was only half of the work (or possibly not even half); it’s the implementation of the entire system (made up of a range of different parts) which is the difficult part to get right.

Me networking at the Natural History Museum

As a counterpoint to John Sheridan’s theory we had the extremely practical and important presentation from Angeline Takawira of the United Nations Mechanism for Criminal Tribunals, who explained that preserving and managing archives is a core part of the function of the organisation.  Access for an extremely broad range of stakeholders is key.  Some of the stakeholders live in parts of Rwanda where internet access is usually wifi onto mobile devices – this is an important consideration in deciding how to make material available.

Alongside Angeline Takawira’s presentation, Pat Sleeman of the UN Refugee Agency packed a powerful punch with her description of archives and records management in the field when coping with the biggest humanitarian crisis in the history of the organisation.  How do you put together a business case for spending on digital preservation when the organisation needs to spend money on feeding starving babies?  And even Twitter, which had been lively during the course of the conference at the hashtag #PASIG17, fell silent at the testimony of Emi Mahmoud, which exemplifies the importance of preserving the voices and stories of refugees and displaced persons.

I came away with a lot to think about and also a lot to do.  What can we do (if anything) to help with some of the tasks faced by the digital preservation community as a whole?  The answer is that we can share the work we are doing – success or failure – and all learn that it is through a combination of tools, processes and skills, which come from right across the board of IT, archives, librarians, data scientists and beyond, that we can help preserve what needs to be preserved.

Rachel MacGregor (Digital Archivist)

[all images author’s own]

Researchers: what do they really think?

Image: Flickr – Rul Fernandes CC BY 2.0

Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think.  This is crucial for developing and improving our services and vital for delivering the service our researchers want.

We are one of the organisations taking part in the JISC RDM Shared Services pilot and you can read their take on the work being done here.  With JISC’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.

Researchers were asked about the types of data which were generated from their research.  The results were quite interesting to us.  Unsurprisingly perhaps, far and away the most popular “type” of data was “document or report”, followed with a bit of a gap by spreadsheets.  Structured text files (e.g. XML, JSON etc.) came a lot lower down the list, as did databases.

Lancaster Researchers’ responses to JISC DAF Survey

What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs.  Obviously comparisons are problematic not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation.  We also know that we only get a small proportion of the research data which are being created within the university and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we have already we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.

Analysis of file formats undertaken at Lancaster University

Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community?  I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community.  This is something we need to address as part of improving our advocacy and engagement strategies.

Another question which was asked was about sharing data and this got answers which did surprise us.  The majority said that they did already share data and very few said they were not willing to share.  For the ones who did not share data it was mostly because it was sensitive or confidential data or they did not have permission to share it.  Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”.  An encouraging third of those questioned said they had re-used someone else’s data.

Results of JISC DAF survey for Lancaster University

Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process.  We also know that people are likely to give the answers they want us to hear!  But if people are serious about being willing and able to share we really want to support them in this.

So we’ve decided to try and get talking to our researchers – and for them to talk to each other – by setting up a series of Data Conversations – events where researchers can discuss creation and dissemination of data to try and encourage a climate of sharing and valuing the data.  It means we can hope for data that is well curated from the start of its life and that will be selected for deposit appropriately and with good metadata.

Better communication and advocacy will help us in the long run to preserve high quality, relevant data which can be shared and reused.  Managing (research) data and long term preservation of digital data are collaborative activities and the more we understand and share the better we will be at achieving these goals.

Rachel MacGregor, Digital Archivist

Acting On Change: Pericles/DPC Conference and DPA Awards London 2016

DPA Awards 2016 nominees and judges (Image @SueCorrigall licence OGL)

Last week I had the pleasure of attending the Pericles/DPC Conference: Acting on Change at the Wellcome Institute in London.  The theme of the conference was moving forward with digital preservation; in other words taking steps beyond just the technical tools and looking outward instead of inward.  There were excellent keynotes and panel sessions and useful and thought-provoking workshops.  PERICLES (Promoting and Enhancing Reuse of Information through the Content Lifecycle) is an EU-funded four-year project which seeks to address the issues of managing digital preservation in an ever changing world.

Kara Van Malssen of AVPreserve set the tone brilliantly with her inspiring keynote “Seeing the forest for the trees – looking outside the OAIS model” which focused mainly on moving away from what she called the “boutique approach” to digital preservation and towards developing a broader ecosystem of integrated automated services.  She touched on some of the difficulties in getting funding for what she calls “maintenance” (which after all is what digital preservation often is) as opposed to “cool new stuff” and recommended some listening and reading on the subject, such as the podcast “In Praise of Maintenance” and the article “Hail The Maintainers”.  She concluded that what was required was a culture of change in the way people and organisations work so that digital preservation “just happens” and no one notices.  This was echoed many times during the course of the conference and is definitely something we have been thinking about in terms of the way we are developing our research services here at Lancaster University.

Barbara Sierman’s OAIS illustrations

The panel session following again took “Beyond OAIS” as its theme.  Many of us welcomed the all-female panel and were impressed by the range and depth of experience represented with Angela Dappert, Pip Laurenson, Barbara Reed and Barbara Sierman – a truly international panel.  Barbara Sierman spoke firmly in favour of the much-maligned OAIS model, saying it was a guide and a conversation piece (she also had wonderful slides which illustrated her point that OAIS is not a cage to limit us but a guide to let us fly freely, and tweet too presumably!).  Barbara Reed brought a welcome archival slant to the discussion with a more critical view of OAIS, which she felt had certain assumptions which did not fit well with archival theory.  Pip Laurenson likened the journey towards OAIS to William Blake’s illustration of Dante’s Purgatory and Angela Dappert explained that the only way of never being wrong is by doing nothing.  The call overall was to be involved, and a good way to start with OAIS is to look at and contribute to the conversations taking place on the Digital Preservation wiki.

Image from Tate Galleries

In the afternoon we continued with the theme of working on the maintenance and advocacy side of digital preservation.  As Dan Gillean of Artefactual pointed out in his presentation, two thirds of ISO 16363 is organisational rather than technical.  Jen Mitcham explained to us that the key to working successfully was to work collaboratively and Angela Dappert wanted more encouragement for smaller organisations to take the first steps in preservation.  Matthew Addis suggested that capability models could be a good way to start.  Anna Henry finished off by describing some of the challenges of communication.  The panel session gave us all both a lot to look at and a lot to think about.  Wrapping up for the day we were asked – what is stopping us from making progress in digital preservation?  The answer is money and confidence.  The latter we can do something about…

DPA Awards (image by permission of the Digital Preservation Coalition)

And speaking of boosting confidence I was delighted to be invited to the Digital Preservation Awards Ceremony where the hard work and fantastic achievements of many individuals and projects were deservedly rewarded.  There’s a full list of the winners here although those who didn’t come away with an award were winners too, having achieved what we were all talking about – advocacy, innovation and sustainability.  Hopefully it will also give a boost of confidence to all those nominated as well as raise the profile of these projects with funders, more senior managers and those in a position to put additional resources into the ongoing maintenance of digital preservation programmes.

DPA Award winners (image by permission of the Digital Preservation Coalition)

The keynote on the second day was from Matthew Addis who continued with the theme of looking outwards and urged everyone to look for and actively seek opportunities to make a difference.  “Everyone benefits from the power of the many”.  A question from the audience was “How do we avoid getting shoved out of the way by the IT community?” and the answer is:  We work together!  We should collaborate and not compete and help move digital preservation upstream to where it’s an invisible part of everyday practice.

The second day’s panel was no less impressive than the first with a wide range of experiences and backgrounds represented and again a truly international gathering.  The panel were asked a series of questions by Natalie Harrower and, when asked for a “wish list” for digital preservation in 20 years’ time, came up with a surprisingly varied set of responses.  Neil Beagrie wanted better metrics and better evidence for the impact of data losses.  However he also made a very popular suggestion, which was a call for more one page summary documents which can be used as part of the advocacy process.  Nancy McGovern wanted the dashboard of dashboards for her work (as did we all) and Jean-Yves Vion-Dury wanted a more sophisticated system of knowledge exchange.  Natasa Milic-Frayling was brave about admitting the mistakes of past programmes and called for built in continuity in the design of tools.  George Papastefanatos echoed Matthew Addis with a call for the end of digital preservation as a separate “thing”, so that it is instead integrated into the way everyone creates and uses data.

The session was an extremely lively one and I will definitely be returning to the recordings of it to capture some of the nuances of the debates.  In the afternoon I chose Pro-Active Data Management with Simon Waddington, George Papastefanatos and Tomasz Miksa.  Simon Waddington covered some approaches to data appraisal and compared technical and human appraisal decisions.  He highlighted the potential benefits of using modelling to help predict change and inform these decisions but had to admit that this was very expensive…  George Papastefanatos looked at preventative maintenance – again back to the theme of the morning – and how we should be working towards robust and adaptable systems.  Tomasz Miksa took up the very hot topic of research reproducibility and drew parallels with digital preservation techniques in how the environment in which the data is created becomes vital for understanding, preserving or reproducing it.  There followed a lively debate centering around the gap between theory and research and everyday practice and the need to be realistic when assessing what is achievable and possible.  Miksa made the very important point that people will choose the best tools available for them – these may well not be the most sustainable.

Me thinking about writing a one page report at the wonderful Wellcome Collection

Wrapping up, William Kilbride invited us all to be the agents of change and I for one have come away with some homework: developing and improving our advocacy work, producing short (!) reports to set out what we are trying to achieve and how, and finally continuing to work collaboratively with others to avoid reinventing the wheel and to enable everyone to move forward in an ever changing world.

Rachel MacGregor, Digital Archivist

All images author’s (CC-BY) unless otherwise credited.