International Digital Preservation Day 30th November 2017 #IDPD2017

What’s that about then?

Digital Archivists are a much misunderstood lot.

A lot of people think our work on digital preservation must be something to do with digitising old documents, but this is absolutely not the case. Of course digitising old documents is fantastic, and so are the wonderful resources which are now increasingly available on the internet, like Charles Booth’s London or the Cambridge Digital Library (there are so many examples; these are just some of my favourites). There are thousands and thousands of them, useful for scholars, historians, students, teachers, genealogists, journalists – well, just about anyone really who is interested in getting access to sources that would otherwise be near impossible to reach. Digitising archive and library content has revolutionised the way we access and interact with archives, manuscripts and special collections.

Image: Flickr Kjetil Korslien CC BY-NC 2.0

However – this is not what the digital archivist does (although there are overlaps).  The digital archivist is concerned mainly (although not exclusively) with archives, data, stuff – whatever you want to call it – which was created in a digital format and has never had a physical existence.  If someone accidentally deletes the digitised version of Charles Booth’s poverty maps, the original is still there and can be digitised again.  Of course that would be an enormous waste of time and effort which is why we often treat digitised content as if it were the original content and guard against accidental deletion or loss.

But although digitisation does help preserve a document, because it reduces the wear and tear on the original, it often means swapping one stable format (paper, parchment etc) for a less stable one. So you could argue that digitising, rather than helping with preservation issues, is just creating new ones. Of course there are many very unstable analogue formats, such as many photographic processes, magnetic tape and so forth, which need to be digitised if they are to survive at all.

Digitisation is not preservation.

With digitised content you would like to think (!) that you might have some measure of control over what that content is, specifically the format it comes in. It is possible to choose to save the image files in a format that is widely used and well documented, so that the risk that they will be hard to access in 5 or 10 years’ time is lessened. There are formats which are recommended for long-term preservation because they are widely adopted and well supported, and by choosing these we help the process of digital preservation by giving those files a “head start”.

However, files which are created by others – perhaps completely outside of the organisation – can come in *literally* any format. A good example of this is when I analysed a sample of the data deposited by academics undertaking research at our institution and found a grand total of 59 different file types. OK, so that doesn’t sound *too* bad, but 55% of the files I couldn’t identify at all, which is not so good.
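To give a flavour of what identifying a file actually involves, here is a minimal sketch in Python. It is not the tool used for the analysis above, and the names in it (`SIGNATURES`, `identify`, `survey`) are made up for illustration. It assumes that a format can be recognised from the first few bytes of a file, which is true for many common formats but by no means all, and real identification tools rely on far larger signature registries than this four-entry table.

```python
from collections import Counter
from pathlib import Path

# A tiny, illustrative signature table (an assumption for this sketch;
# real identification tools draw on much larger registries).
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP container (also DOCX, XLSX, ODT)",
}


def identify(path):
    """Return a format label based on the file's first bytes, or 'unidentified'."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unidentified"


def survey(folder):
    """Count formats for every file under folder and report the unidentified share."""
    counts = Counter(identify(p) for p in Path(folder).rglob("*") if p.is_file())
    total = sum(counts.values())
    unidentified_pct = 100 * counts["unidentified"] / total if total else 0.0
    return counts, unidentified_pct
```

Calling `survey("some_deposit_folder")` (a hypothetical folder name) returns the tally of formats and the percentage of files that matched nothing in the table, which is the kind of figure quoted above.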

So we could try (as some archives do) saying we will only accept files in a certain format, to give our files the best chance of a long and happy life. But clearly there are lots of circumstances where this is either impractical or impossible. For example, with the papers of a now-deceased person, we cannot ask them to convert or resubmit their files. And in the case of our researchers, they will need to use specific software to perform specific specialised tasks, and they themselves may have very little say in their choice of software.

Another major – and perhaps often overlooked – issue with digital preservation is actually making sure that the files are captured in the first place. This is not a digital-specific problem: any kind of data, whether it is research outputs, personal papers or the financial records of a business, is at risk of disappearing if it is not looked after properly. The files will need a safe storage environment where the risk of accidental or malicious damage is kept to a minimum and where they can be found, understood and shared effectively. For digital files this means a particularly rigorous ongoing check that the content and format are stable and that they can still be made accessible.
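One common way to do the "content is stable" part of that ongoing check is to record a checksum for every file when it arrives and then recompute the checksums later to see whether anything has changed or gone missing. The sketch below shows the idea in Python using SHA-256. The function names and the in-memory manifest are assumptions for illustration, not a description of any particular archive's system, and it only covers the content side of the check, not whether the format can still be opened.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(folder, manifest):
    """Compare the folder against an earlier manifest.

    Returns two sorted lists: files that have disappeared, and files
    whose content no longer matches the recorded checksum.
    """
    current = build_manifest(folder)
    missing = sorted(name for name in manifest if name not in current)
    changed = sorted(
        name
        for name in manifest
        if name in current and current[name] != manifest[name]
    )
    return missing, changed
```

In practice the manifest would be saved somewhere safe (and separate from the files themselves) at the point of deposit, and `check_fixity` would be run on a schedule. Two empty lists mean nothing has been lost or altered since the manifest was made.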

So what is digital preservation?

It’s not just backing stuff up.

It’s the active management of digital assets to ensure they will still be accessible in the future.

Making sure we can still open files in the future.
Making sure we can still understand files in the future.

It’s a tough job – but someone’s got to do it!


Published by

Rachel MacGregor

Digital Archivist at Lancaster University. Interested in data discovery, archives, open data and lots of other things.