International Digital Preservation Day 30th November 2017 #IDPD2017
What’s that about then?
Digital Archivists are a much misunderstood lot.
A lot of people think our work on digital preservation must be something to do with digitising old documents but this is absolutely not the case. Of course digitising old documents is fantastic, and wonderful resources are now increasingly available on the internet, like Charles Booth’s London or the Cambridge Digital Library (there are so many examples; these are just some of my favourites). There are thousands and thousands of them, useful for scholars, historians, students, teachers, genealogists, journalists – well just about anyone really who is interested in getting access to sources that would otherwise be near impossible to access. Digitising archive and library content has revolutionised the way we access and interact with archives, manuscripts and special collections.
However – this is not what the digital archivist does (although there are overlaps). The digital archivist is concerned mainly (although not exclusively) with archives, data, stuff – whatever you want to call it – which was created in a digital format and has never had a physical existence. If someone accidentally deletes the digitised version of Charles Booth’s poverty maps, the original is still there and can be digitised again. Of course that would be an enormous waste of time and effort which is why we often treat digitised content as if it were the original content and guard against accidental deletion or loss.
But although digitisation does help preserve a document because it reduces the wear and tear on the original it is often swapping one stable format (paper, parchment etc) for a less stable one. So you could argue that digitising – rather than helping with preservation issues – is just creating new ones. Of course there are many very unstable analogue formats such as many photographic processes, magnetic tape and so forth which need to be digitised if they are to survive at all.
Digitisation is not preservation.
With digitised content you would like to think (!) that you might have some measure of control over what that content is, specifically the format it comes in. It is possible to choose to save the image files in a format that is widely used and well documented, so that the risk that they will be hard to access in 5 or 10 years’ time is lessened. There are formats which are recommended for long term preservation because they are widely adopted and well supported and by choosing these we help the process of digital preservation by giving those files a “head start”.
However files which are created by others – perhaps completely outside of the organisation – can come in *literally* any format. A good example of this is when I analysed a sample of the data deposited by academics undertaking research at our institution and found a grand total of 59 different file types. OK so that doesn’t sound *too* bad but 55% of the files I couldn’t identify at all. Which is not so good.
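To make that kind of analysis a little more concrete, here is a minimal Python sketch of how a sample of deposited files might be tallied. It is not the method used for the analysis above: the function names and the tiny signature table are assumptions made purely for illustration, and real identification tools rely on far larger signature registries.

```python
from collections import Counter
from pathlib import PurePath

# A tiny, illustrative table of leading "magic" bytes for a few common formats.
# Hypothetical and deliberately incomplete: it only shows the principle.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF8": "GIF",
    b"PK\x03\x04": "ZIP container",
}


def identify(header: bytes) -> str:
    """Return a format name if the first bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unidentified"


def tally_extensions(names):
    """Count files by lower-cased extension; files with none are grouped."""
    return Counter(PurePath(name).suffix.lower() or "(none)" for name in names)


def unidentified_share(headers):
    """Fraction of files whose leading bytes match no known signature."""
    results = [identify(header) for header in headers]
    return results.count("unidentified") / len(results) if results else 0.0
```

Counting extensions shows how many different file types are present, while the signature check shows how many files cannot be matched to a known format at all. The two numbers answer different questions, which is why a modest count of file types can sit alongside a large share of unidentified files.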
So we could try (as some archives do) saying we will only accept files in a certain format, to give our files the best chance of a long and happy life. But clearly there are lots of circumstances where this is either impractical or impossible. For example with the papers of a now-deceased person – we cannot ask them to convert them or resubmit them. And in the case of our researchers they will need to be using specific software to perform specific specialised tasks and they themselves may have very little say in their choice of software.
Another major, and perhaps often overlooked, issue with digital preservation is actually making sure that the files are captured in the first place. This is not a digital specific problem – any kind of data, whether it is research outputs, personal papers or the financial records of a business, is at risk of disappearing if it is not looked after properly. The files will need a safe storage environment where the risk of accidental or malicious damage is kept to a minimum and where they can be found, understood and shared effectively. For digital files this means a particularly rigorous ongoing check that the content and format are stable and that they can still be made accessible.
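One common way of carrying out the "content is stable" part of that ongoing check is to record a checksum for every file and compare against it later. The following is a minimal Python sketch, assuming files sit in an ordinary directory tree; the function names are hypothetical and this is not a description of any particular preservation system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "new": sorted(set(current) - set(manifest)),
    }
```

A check like this only shows that the bytes have not changed. It says nothing about whether the format can still be opened or understood, which is why it is one part of active management and not a substitute for it.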
So what is digital preservation?
It’s not just backing stuff up
It’s the active management of digital assets to ensure they will still be accessible in the future.
Making sure we can still open files in the future.
Making sure we can still understand files in the future.
We had our third Data Conversation here at Lancaster University again with the aim of bringing together researchers to share their data stories and discuss issues and exchange ideas in a friendly and informal setting.
We all had plenty of time to eat pizza and crisps before Neil invited us all to consider reproducibility and sustainability in relation to software. Neil has a very clear and engaging style which really helped us, the audience, navigate around the complex issues of managing software. He asked us all to imagine returning to our work in three months’ time – would it make sense? Would it still work? He also addressed some of the complex issues around versioning, authorship and sharing software.
The second half of the afternoon followed the more traditional Data Conversations route of short lightning talks given by Lancaster University researchers.
First up was Barry Rowlingson (Lancaster Medical School) talking about the benefits of using GitLab for developing, sharing and keeping software safe.
Barry Rowlingson weighs up the benefits of GitLab over GitHub…
Next was Kristoffer Geyer (Psychology) talking about the innovative and challenging uses of smartphone data for investigating behaviour and in particular the issues of capturing the data from external and ever changing software. Kris mentioned how the recent update of Android (to Oreo) makes retrieving relevant data more difficult – a flexible approach is definitely what is needed.
Then we heard from Andrew Moore (School of Computing and Communications) who returned to the theme of sharing software, looking at some of the barriers and opportunities which present themselves. Andrew argued passionately that we need more resources for software sharing (such as specialist Research Software Engineers) but also that researchers need to share their attitudes towards sharing their code.
Our final speaker was the Library’s own Stephen Robinson (Library Developer) talking about using containers as a method of software preservation. This provoked quite some debate – which is exactly what we want to encourage at these events!
We think these kinds of conversations are a great way of getting people to share good ideas and good practice around data management and we look forward to the next Data Conversations in January 2018!
This blog post was co-authored by Rachel MacGregor and Hardy Schwamm.
It was fantastic to see PASIG 2017 (Preservation and Archives Special Interest Group) come to Oxford this year which meant I had the privilege of attending this prestigious international conference in the beautiful surroundings of Oxford’s Natural History Museum. All slides and presentations are available here.
The first day was advertised as Bootcamp Day so that everyone could be up-to-speed with the basics. And I thought: “do I know everything about Digital Preservation?” and the answer was “No” so I decided to come along to see what I could learn. The answer was: quite a lot. There was some excellent advice on offer from Sharon McMeekin of the Digital Preservation Coalition and Stephanie Taylor of CoSector who both have a huge amount of experience in delivering and supporting digital preservation training. Adrian Brown (UK Parliament) gave us a lightning tour of relevant standards – what they are and why they are important. It was so whistle stop that I think we were all glad that the slides of all the presentations are available – this was definitely one to go back to.
The afternoon kicked off with “What I wish I knew before I started” and again responses to these have been summarised in some fantastic notes made collaboratively but especially by Erwin Verbruggen (Netherlands Institute for Sound and Vision) and David Underdown (UK National Archives). One of the pieces of advice I liked the most came from Tim Gollins (National Records of Scotland) who suggested that inspiration for solutions does not always come from experts or even from within the field – it’s an invitation to think broadly and get ideas, inspiration and solutions from far and wide. Otherwise we will never innovate or move on from current practices or ways of thinking.
There was much food for thought from the British Library team who are dealing with all sorts of complex format features. The line between book and game and book and artwork is often blurred. They used the example of Nosy Crow’s Goldilocks and Little Bear – is it a book, an app, a game or all three? And then there is Tea Uglow’s A Universe Explodes , a blockchain book, designed to be ephemeral and changing. In this it has many things in common with time-based artworks which institutions such as the Tate, MOMA and many others are grappling with preserving.
The conference dinner was held at the beautiful Wadham College and it was great again to have the opportunity to meet new people in fantastic surroundings. I really liked what Wadham College had done with their Changing Faces commission – four brilliant portraits of Wadham women.
The conference proper began on Day Two and over the course of the two days there were lots of interesting presentations which it would be impossible to summarise here. John Sheridan gave an engaging and thought-provoking talk on disrupting the archive, mapping the transition from paper archive to digital not just in a literal sense but also in the sense of our ways of thinking. Paper-based archival practices rely on hierarchies and order – this does not work so well with digital content. We probably also need to be thinking more like this: [image] and less like this: [image] for our digital archives.
Eduardo del Valle of the University of the Balearic Islands gave his Digital Fail story – a really important example of how sharing failures can be as important as sharing successes – in his case they learnt key lessons and can move on from this and hopefully prevent others from making the same mistakes. Catherine Taylor of Waddesdon Manor also bravely shared the shared drive – there was a nervous giggle from an audience made up of people who all work with similarly idiosyncratically arranged shared drives… In both cases acquiring tools and applying technical solutions was only half of the work (or possibly not even half); it’s the implementation of the entire system (made up of a range of different parts) which is the difficult part to get right.
As a counterpoint to John Sheridan’s theory we had the extremely practical and important presentation from Angeline Takawira of the United Nations Mechanism for International Criminal Tribunals who explained that preserving and managing archives is a core part of the function of the organisation. Access for an extremely broad range of stakeholders is key. Some of the stakeholders live in parts of Rwanda where internet access is usually wifi onto mobile devices – this is an important part of considerations of how to make material available.
Alongside Angeline Takawira’s presentation Pat Sleeman of the UN Refugee Agency packed a powerful punch with her description of archives and records management in the field when coping with the biggest humanitarian crisis in the history of the organisation. How do you put together a business case for spending on digital preservation when the organisation needs to spend money on feeding starving babies? And even Twitter, which had been lively during the course of the conference at the hashtag #PASIG17, fell silent at the testimony of Emi Mahmoud, which exemplifies the importance of preserving the voices and stories of refugees and displaced persons.
I came away with a lot to think about and also a lot to do. What can we do (if anything) to help with some of the tasks faced by the digital preservation community as a whole? The answer is that we can share the work we are doing – success or failure – and all learn that it is through a combination of tools, processes and skills, drawn from right across IT, archives, libraries, data science and beyond, that we can help preserve what needs to be preserved.
We were very excited to be visiting the lovely city of York for the Digital Preservation Coalition’s event “From Planning to Deployment: Digital Preservation and Organizational Change”. The day promised a mixture of case studies from organisations who have implemented, or are in the process of implementing, a digital preservation programme and also a chance for Jisc to showcase some of the work they have been sponsoring as part of the Research Data Shared Services project (which we are a pilot institution for). It was a varied programme and the audience was very mixed – one of the big benefits of attending events like these is the opportunity to speak to colleagues from other institutions in related but different roles. I spoke to some Records Managers and was interested in their perspective as active managers of current data. I’m a big believer in promoting digital preservation through involvement at all stages of the data lifecycle (or records continuum if you prefer) so it is important that as many people as possible – whatever their role in the creation or management of data – are encouraged into good data management practices. This might be by encouraging scientists to adopt the FAIR principles or by Records Managers advising on file formats, file naming and structures and so on.
The first half of the day was a series of case studies presented by various institutions, large and small, who had a whole range of experiences to share. It was introduced by a presentation from the Polonsky Digital Preservation Project based at Oxford and Cambridge Universities. Lee Pretlove and Sarah Mason jointly led the conversation talking us through the challenges of developing and delivering a digital preservation project which has to continue beyond the life of the project. Both Universities represented in this project are very large organisations but this can make the issues faced by the team extremely complex and challenging. They have been recording their experiences of trying to embed practices from the project so that digital preservation can become part of a sustainable programme.
The first case study came from Jen Mitcham from York University talking about the digital preservation work they have undertaken there. Jen has documented her activities very helpfully and consistently on her blog and she talked specifically about the amount of planning which needs to go into work and then the very real difficulties in implementation. She has most recently been looking at digital preservation for research data – something we are working on here at Lancaster University.
Next up was Louisa Matthews from the Archaeology Data Service who have been spearheading approaches to Digital Preservation for a very long time. The act of excavating a site is by its nature destructive so it is vital to be able to capture data about it accurately and be able to return to and reuse the data for the foreseeable future. This captures digital preservation in a nutshell! Louisa described how engaging with their contributors ensures high quality re-usable data – something we are all aiming for.
The final case study for the morning was Rebecca Short from the University of Westminster talking about digital preservation and records management. The university have already had success implementing a digital preservation workflow and are now seeking to embed it further in the whole records creation and management process. Rebecca described the very complex information environment at her university – relatively small in comparison to the earlier presentations but no less challenging for all that.
The afternoon was a useful opportunity to hear from Jisc about their Research Data Shared Services project which we are a pilot for. We heard presentations from Arkivum, Preservica and Artefactual Systems who are all vendors taking part in the project and gave interesting and useful perspectives on their approaches to digital preservation issues. The overwhelming message however has to be – you can’t buy a product which will do digital preservation. Different products and services can help you with it, but as William Kilbride, Executive Director of the Digital Preservation Coalition has so neatly put it “digital preservation is a human project” and we should be focussing on getting people to engage with the issues and for all of us to be doing digital preservation.
Here at Lancaster University we are very excited to be part of a group of pilot institutions taking part in Jisc’s Research data shared services project. This aims to provide a flexible range of services which suit the varied needs of institutions in the HE sector and help them achieve policy compliance for deposit, publication, discovery, storage and long term preservation of research data. It’s an ambitious project but one that there is an undoubted need for and we are trying to work with Jisc to help them achieve this goal.
Last week we were invited down to Jisc London HQ to learn about the progress of the project and – just as importantly – share our own thoughts and experiences on the process.
Daniela Duca has written a comprehensive overview of the meeting and of the way forward for Jisc.
Our table represented a microcosm of the project: Cambridge University (large institution), ourselves at Lancaster (medium) and the Royal College of Music (small). We all have extremely different needs and resources and how one institution tackles a problem will not work at another. However we have a common purpose in supporting our academics and students in their research, ensuring compliance with funders and enabling our institutions to support first class research outputs to share with the wider world.
We had been asked to do some preparatory work around costing models for the meeting – I think it would be fair to say we all found this challenging – probably because it is! My previous knowledge of costings comes from having looked at the Curation Costs Exchange which is an excellent starting point for anyone considering approaching the very difficult task of costing curation services.
My main interest in the day lay in the preservation aspects of the project especially in exploring wider use cases. It’s clear that many institutions have a number of digital preservation scenarios for which the Shared Service solution might also be applicable. What is also clear is that there are so many possible use cases that it would be very easy to accidentally create a whole new project without even trying! I think it’s fair to say that all of us in the room – whether we are actively involved in digital preservation or not – are very interested in this part of the project. There is no sense in Jisc replicating work which has already been done elsewhere or is being developed by other parties so it presents an ideal opportunity for collaborative working and building on the strengths of the existing digital preservation community.
Overall there was much food for thought and I look forward to the next development in the shared services project.
Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think. This is crucial for developing and improving our services and vital for
We are one of the organisations taking part in the JISC RDM Shared Services pilot and you can read their take on the work being done here. With JISC’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.
Researchers were asked about the types of data which were generated from their research. The results were quite interesting to us. Unsurprisingly perhaps far and away the most popular “type” of data were “document or report” followed with a bit of a gap by spreadsheets. Structured text files (eg xml, json etc) came a lot lower down the list as did databases.
What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs. Obviously comparisons are problematic not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation. We also know that we only get a small proportion of the research data which are being created within the university and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we have already we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.
Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community? I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community. This is something we need to address as part of improving our advocacy and engagement strategies.
Another question which was asked was about sharing data and this got answers which did surprise us. The majority said that they did already share data and very few said they were not willing to share. For the ones who did not share data it was mostly because it was sensitive or confidential data or they did not have permission to share it. Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”. An encouraging third of those questioned said they had re-used someone else’s data.
Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process. We also know that people are likely to give the answers they want us to hear! But if people are serious about being willing and able to share we really want to support them in this.
So we’ve decided to try and get talking to our researchers – and for them to talk to each other – by setting up a series of Data Conversations – events where researchers can discuss creation and dissemination of data to try and encourage a climate of sharing and valuing the data. It means we can hope for data that is well curated from the start of its life and that will be selected for deposit appropriately and with good metadata.
Better communication and advocacy will help us in the long run to preserve and share high quality relevant data which can be shared and reused. Managing (research) data and long term preservation of digital data are collaborative activities and the more we understand and share the better we will be at achieving these goals.
Last week I had the pleasure of attending the Pericles/DPC Conference: Acting on Change at the Wellcome Institute in London. The theme of the conference was moving forward with digital preservation; in other words taking steps beyond just the technical tools and looking outward instead of inward. There were excellent keynotes and panel sessions and useful and thought-provoking workshops. PERICLES (Promoting and Enhancing Reuse of Information through the Content Lifecycle) is an EU-funded four-year project which seeks to address the issues of managing digital preservation in an ever changing world.
Kara Van Malssen of AVPreserve set the tone brilliantly with her inspiring keynote “Seeing the forest for the trees – looking outside the OAIS model” which focused mainly on moving away from what she called the “boutique approach” to digital preservation and towards developing a broader ecosystem of integrated automated services. She touched on some of the difficulties in getting funding for what she calls “maintenance” (which after all is what digital preservation often is) as opposed to “cool new stuff” and recommended some listening and reading on the subject, such as the podcast “In Praise of Maintenance” and the article “Hail The Maintainers”. She concluded that what was required was a culture of change in the way people and organisations work so that digital preservation “just happens” and no one notices. This was echoed many times during the course of the conference and is definitely something we have been thinking about in terms of the way we are developing our research services here at Lancaster University.
Barbara Sierman’s OAIS illustrations
The panel session that followed took as its theme “Beyond OAIS”. Many of us welcomed the all female panel and were impressed by the range and depth of experience represented with Angela Dappert, Pip Laurenson, Barbara Reed and Barbara Sierman – a truly international panel. Barbara Sierman spoke firmly in favour of the much-maligned OAIS model, saying it was a guide and a conversation piece (she also had wonderful slides which illustrated her point that OAIS is not a cage to limit us but a guide to let us fly freely, and tweet too presumably!). Barbara Reed brought a welcome archival slant to the discussion with a more critical view of OAIS which she felt had certain assumptions which did not fit well with archival theory. Pip Laurenson likened the journey towards OAIS to William Blake’s illustration of Dante’s Purgatory and Angela Dappert explained that the only way of never being wrong is by doing nothing. The call overall was to be involved and a good way to start with OAIS is to look at and contribute to the conversations taking place on the Digital Preservation wiki.
In the afternoon we continued with the theme of working on the maintenance and advocacy side of digital preservation. As Dan Gillean of Artefactual pointed out in his presentation, two thirds of ISO 16363 is organisational rather than technical. Jen Mitcham explained to us that the key to working successfully was to work collaboratively and Angela Dappert wanted more encouragement for smaller organisations to take the first steps in preservation. Matthew Addis suggested that capability models could be a good way to start. Anna Henry finished off by describing some of the challenges of communication. The panel session gave us all both a lot to look at and a lot to think about. Wrapping up for the day we were asked – what is stopping us from making progress in digital preservation? The answer is money and confidence. The latter we can do something about…
And speaking of boosting confidence I was delighted to be invited to the Digital Preservation Awards Ceremony where the hard work and fantastic achievements of many individuals and projects were deservedly rewarded. There’s a full list of the winners here although those who didn’t come away with an award were winners too, having achieved what we were all talking about – advocacy, innovation and sustainability. Hopefully it will also give a boost of confidence to all those nominated as well as raise the profile of these projects with funders, more senior managers and those in a position to put additional resources into the ongoing maintenance of digital preservation programmes.
The keynote on the second day was from Matthew Addis who continued with the theme of looking outwards and urged everyone to look for and actively seek opportunities to make a difference. “Everyone benefits from the power of the many”. A question from the audience was “How do we avoid getting shoved out of the way by the IT community?” and the answer is: We work together! We should collaborate and not compete and help move digital preservation upstream to where it’s an invisible part of everyday practice.
The second day’s panel was no less impressive than the first with a wide range of experiences and backgrounds represented and again a truly international gathering. The panel were posed a series of questions by Natalie Harrower and, on being asked for a “wish list” for 20 years’ time in digital preservation, came up with a surprisingly varied set of responses. Neil Beagrie wanted better metrics and better evidence for the impact of data losses. However he also made a very popular suggestion which was a call for more one page summary documents which can be used as part of the advocacy process. Nancy McGovern wanted the dashboard of dashboards for her work (as did we all) and Jean-Yves Vion-Dury wanted a more sophisticated system of knowledge exchange. Natasa Milic-Frayling was brave about admitting the mistakes of past programmes and called for built in continuity in the design of tools. George Papastefanatos echoed Matthew Addis with a call for the end of digital preservation as a separate “thing”, so that it is instead integrated into the way everyone creates and uses data.
The session was an extremely lively one and I will definitely be returning to the recordings of it to capture some of the nuances of the debates. In the afternoon I chose Pro-Active Data Management with Simon Waddington, George Papastefanatos and Tomasz Miksa. Simon Waddington covered some approaches to data appraisal and compared technical and human appraisal decisions. He highlighted the potential benefits of using modelling to help predict change and inform these decisions but had to admit that this was very expensive… George Papastefanatos looked at preventative maintenance – again back to the theme of the morning – and how we should be working towards robust and adaptable systems. Tomasz Miksa took up the very hot topic of research reproducibility and drew parallels with digital preservation techniques in how the environment in which the data is created becomes vital for understanding, preserving or reproducing it. There followed a lively debate centering around the gap between theory and research and everyday practice and the need to be realistic when assessing what is achievable and possible. Miksa made the very important point that people will choose the best tools available for them – these may well not be the most sustainable.
Wrapping up, William Kilbride invited us all to be the agents of change and I for one have come away with some homework: developing and improving our advocacy work, producing short (!) reports to set out what we are trying to achieve and how, and finally continuing to work collaboratively with others to avoid reinventing the wheel and to enable everyone to move forward in an ever changing world.
Rachel MacGregor, Digital Archivist
All images are the author’s (CC-BY) unless otherwise credited.
I attended the first Research Data Alliance workshop held in sunny Birmingham which was designed to bring together practitioners from across the UK to find out more about the work of the RDA. It was also a chance to see how we might be able to contribute and benefit from what the organisation has to offer. Despite already being a member of the RDA Interest Groups for Archives and Records Professionals, I confess to having been more of a casual observer than an active participant. So it was a brilliant opportunity to find out more about exactly what the Research Data Alliance is, how it works and what it hopes to achieve. Rachel Bruce from JISC introduced the event by outlining some of the ways in which JISC are working with the RDA across broad areas of Research Data Management and then handed over to Mark Parsons, the charismatic Secretary General of the RDA. Parsons is passionate about data, about connecting people and about creativity. He gave examples of technology “leapfrogging” and how local networks can come together to solve global issues. He used an illustration from the New York Magazine on how Willie Nelson is using local networks to take on corporate agricultural firms in the battle for the rising (legalised) marijuana market.
He also introduced ideas around how networks and connections lead to creativity and again referenced Anna Lowenhaupt Tsing’s Friction (this is the link if you’re lucky enough to be a Lancaster University person!) as well as Steve Johnson: “Chance favors the connected mind”:
That is how innovation happens…
The RDA, he explained, was absolutely not about a top-down framework but instead promoted a model of organic development: creating spaces for things to happen in. It was not, as Parsons explained, about thinking locally and acting globally but about doing local and global at the same time. The RDA has 75 Working and Interest Groups covering a very wide range of topics from the general right through to the extremely specific. There is no question that it is a complex network so we were invited to hear from a few of the Interest Groups: I chose Certification and Metadata, mostly because of their particular relevance to Digital Preservation.
The first session of the afternoon was on certification and first up was Lesley Rickards from the British Oceanographic Data Centre introducing the work of the Certification of Digital Repositories Interest Group. They are trying to map out Core Requirements for certifying repositories across the two main certification schemes for “trusted repositories”: World Data System (WDS) and the Data Seal of Approval (DSA). The two are different schemes using different concepts and methodologies which the RDA were keen to bring together. This they have successfully achieved with a Common Requirements document painstakingly mapping one onto the other and allowing for greater interoperability.
Next was Ingrid Dillo from Data Archiving and Networked Services (DANS) in the Netherlands who spoke about their experiences with obtaining certification – they went the whole hog and obtained Data Seal of Approval, World Data System certification and NestorSeal. DSA certification was A Lot of Work (approximately 250 staff hours) but nothing like as onerous as NestorSeal which took an eye-popping 1500 person hours (if I recall correctly), which is something few repositories I imagine would be willing to contemplate. Interestingly DANS did not attempt ISO 16363. Certification is extremely important and Dillo pointed out the benefits of increased stakeholder trust and raising the profile of digital preservation in her organisation. She also felt the extra effort of attaining NestorSeal was worth it because it addressed some of the issues she felt were outstanding in the way they managed data. As for ISO 16363, it has a notoriously low take-up and I wonder if too onerous a system coupled with limited resources means this situation will not change much in the near future.
The second session of the afternoon was on metadata, with Alex Ball of the Digital Curation Centre talking about the work of the RDA Metadata Standards Catalog Working Group whose initial aim was to make metadata standards easier to find and to advocate for their adoption. They hope that creating a more easily searchable catalogue of metadata standards will help with this. Sarah Jones (DCC) also introduced an enhancement to DMPOnline (a really useful tool we find!) which will make the addition of metadata easier and move towards Data Management Plans which are capable of being analysed by machines. This session also included a presentation from Dom Fripp of JISC on some of the ways in which they are trying to bring people together and be effective at using shared resources – don’t develop in isolation! He talked about JISC’s Research Data Discovery Service – a massive project which looks very exciting – and also some of the work of the RDA Interoperability Working Group.
My quote of the day was “You’ve got to grab [metadata] when it’s produced” (Dom Fripp). This is so true and needs to be factored in when developing workflows and planning advocacy strategies.
My take-aways from the day were: it’s good to collaborate. Connections and conversations lead to new ideas.
It would seem it never rains but it pours with conferences and hot on the heels of iPres 2016 in Bern which I blogged about earlier came DCDC16: Discovering Collections: Discovering Communities which is organised jointly by the UK National Archives and Research Libraries UK. The theme this year was “From potential to impact” and certainly through the conference we heard quite a lot about academic impact especially in the context of the Research Excellence Framework.
There were four keynotes, all of which were excellent and certainly delivered impact. All DCDC presentations should soon be available via their website. If I were to choose one it would have to be the presentation from Phil Lyons and Sarah Coward from the National Holocaust Memorial Centre whose powerful keynote focussed on the importance of preserving memory and testimony. Their centre, amongst other things, seeks to keep the myriad testimonies of holocaust survivors alive. Visitors to the centre can meet, hear and speak to those who survived the holocaust. However they are faced with the problem that the average age of their witnesses is 87. The centre is well aware that a huge part of the impact the speakers have comes from the interaction and personal engagement so the experience is not a passive one. Visitors can question and interact with survivors: this will no longer be possible once they have died. The NHMC have developed some fantastic software which aims to record the voice, moving image and memories of the survivors in such a way that they can be captured as a life-size projected image which can be questioned and will respond. This was developed by recording hours of footage of survivors responding to hundreds of questions. These are then transcribed, indexed and searched to provide a responsive and powerful experience.
The first session I attended was “Progressive Partnerships: creative collaboration between academic and cultural organisations” with a varied but engaging panel of Sarah Price from Durham University, Alice Purkiss from the University of Oxford/National Trust and Sue Gillett from La Trobe University, Australia. The collaborations discussed were very different but all were inspirational for using opportunities and expertise from different sources to promote and further engage different audiences. Sarah Price’s work at Durham University has helped tackle the growing funding gap in local authority cultural offers. With the closure of the local Durham Light Infantry Museum, the university offered both space and expertise to help maintain access to these popular collections. The university worked very hard to identify stakeholders to plan for the future. They worked with users and non-users as well as developing relationships with the local authority. The key message was about working with the strong local sense of identity and listening to people about what they wanted.
Alice Purkiss spoke about a unique collaboration between the National Trust and Oxford University funded by Knowledge Transfer Partnerships. It is very unusual to have these awarded for a “Humanities” subject but the aim here was developing and creating online articles relating to aspects of National Trust properties (historical, architectural etc) using expertise drawn from the academic community at Oxford University. It enables the wider dissemination of research benefiting students, visitors and academics (by raising their profile). The example she gave was the article about Coade Stone which gives a concise and authoritative introduction whilst at the same time encouraging people to visit properties to find out more. The model was definitely about encouraging visitors and improving the visitor experience.
The final session in this panel came from Sue Gillett who talked about developing an educational programme at La Trobe University in Australia. They collaborated with two regional galleries – Bendigo Art Gallery and the Modern Art Museum of Albury – to create a changing programme of undergraduate study that was interdisciplinary and offered students an opportunity to engage with and respond to collections and exhibitions hosted by the partner organisations. Her example was a course based on the exhibition “Imagining Ned”, bringing together artefacts and art associated with and inspired by the iconic Australian folk hero Ned Kelly, including Kelly’s original armour and works by Sir Sidney Nolan. It brought up opportunities for public engagement which had not been anticipated. I particularly enjoyed seeing the student responses to the art – I wish I could have done a course like this when I was an undergraduate!
There was a whole panel session devoted to metadata and after all – what’s not to love about metadata? It’s what binds together all our collections in the form of catalogues and descriptions and there were some excellent presentations looking at ways in which we can make “metadata” (ie catalogues) work harder for us. Neil Grindley from JISC introduced their KnowledgeBase+ project which aims to link libraries to accurate resources with the specific aim of efficient sharing of e-resources. The task of gathering and linking catalogue entries to provide a UK-wide resource sounds simple enough but often the data underlying it is not consistent so the task is nowhere near as straightforward as it sounds. The provenance of the current cataloguing metadata which is out there in the wild is, we are told, “murky”. Neil Grindley described it as the “library shaped black hole in the internet” which gave us all pause for thought. It might be a question of licensing and then sharing, but in a world where (meta)data has significant commercial value, this is not easy. Grindley also looked forward to a point where we could use linked open data to connect to archive, museum and other resources.
We also heard from Dr Ben Outhwaite who is working on the Cambridge University Cairo Genizah project. The project is slowly conserving, digitising and cataloguing nearly 200,000 Hebrew manuscript fragments. A genizah is a sacred store for holy texts which can no longer be used but should be treated with reverence, either by burying them or storing them. Therefore any texts stored in a genizah are likely to be in a fragile or nearly illegible state, fragmentary and hard to interpret. The Cairo Genizah, whose fascinating story can be read here, also contained many other secular texts including invoices, shopping lists, legal documents etc which shed light on the thousand years or so of history of the Jewish community of Fustat.
The collection is vast and using traditional methods it will take decades to transcribe and catalogue. Outhwaite discussed how the project used text mining techniques to interrogate the pre-existing literature written about the collection which allows some metadata to be created for the vast collection of fragments. Readers are also invited to add comments – a form of crowd sourcing – although Outhwaite admitted the audience for this was small, if quite active! Having the documents digitised does allow scholars from across the world to contribute and engage with the project. Outhwaite also suggested talking about metadata and text mining is a better way of getting funding than talking about a library cataloguing project…
The conference was packed full of fascinating presentations and I was only sorry I couldn’t clone myself and go to many more of them. By the end of the conference we were all preparing to wind down a bit only to be jolted out of our complacency by a rip-roaring final seminar from Ronan Deazley of Queen’s University Belfast, Andrea Wallace of Glasgow University and the National Library of Scotland and Simon Tanner of King’s College London. The session presented “Display At Your Own Risk”, a “research-led exhibition experiment” which looked at the public re-use of digital surrogates.
The project explored the way in which cultural institutions expose and reproduce their collections leading on to an interesting debate about copyright and how this interplays with reproducing images. There was a further discussion exploring the risks/barriers both perceived and real to institutions using and reusing their collections online.
The Rijksmuseum in Amsterdam allow virtually unlimited re-use of their collections as well as exposing the metadata for creative re-use.
We were all impressed by the creative re-use Andrea Wallace had put the metadata to in making her very own metadata skirt:
There was a general plea for a sharing and documenting of experiences (good and bad) of copying and sharing cultural assets. People want “an easy answer” but there is no easy answer and there needs to be a “community of norms of practice” developed. There was certainly a lot to think about in this presentation!
As ever DCDC was a wide ranging and thought provoking conference which gave me lots to take away, not least exploring the creative possibilities of metadata!
I was extremely lucky to attend iPres 2016, the International Digital Preservation conference, this year held in the beautiful Swiss capital city Bern.
The conference attracts some of the leading practitioners in the field so it’s a real privilege to be able to hear from and speak to people who are leading in research and development – creating tools, developing workflows and undertaking research into all aspects of digital management and preservation.
It will take a while to digest everything – there was so much to learn! – but I thought I would gather together some “highlights” of the session while still fresh in my mind.
The conference opened with a keynote from Bob Kahn who reflected on the need for interoperability and unique identifiers with digital objects. The world we live in is a networked one and as we conceive of information and objects as linked to one another over networks, and as able to exist anywhere and in several places at once, so we must find unique and unambiguous ways of describing them.
To complement this I attended a workshop on persistent identifiers which gave an extremely helpful introduction to the world of URNs, URLs, PURLs, Handles, DOIs and the rest. Sometimes it can seem a little like acronym spaghetti but the presenters Jonathan Clark, Maurizio Lunghi, Remco van Veenendaal, Marcel Ras and Juha Hakala did their best to untangle it for us. Remco van Veenendaal introduced a great online tool from the National Archives of the Netherlands which aims to guide practitioners towards an informed choice about which identifier scheme to use. You can have a go at it here and the Netherlands Coalition for Digital Preservation are keen for feedback.
What is particularly useful about it is that it explains in some detail at each stage which PID system might be particularly good in specific circumstances, allowing for a nuanced approach to collections management.
Current persistent identifier systems do not cope well with complex digital objects and likely future developments will be around tackling these shortcomings. Sadly the current widely used systems have already developed along separate lines to the extent that they cannot be fully aligned – not the interoperable future we are all hoping for.
The second keynote came from Sabine Himmelsbach of the House of Electronic Art in Basel and was a lively and engaging account of a range of digital artworks and how digital preservation and curation has to work closely with artists to (re)create artworks. It threw up many philosophical questions about authenticity and integrity, not to mention the technical challenges of emulation and preservation of legacy formats. This was a theme returned to again and again in various sessions throughout the conference, as was the constant refrain of how the main challenges are not necessarily technological.
The conference had so many highlights it’s very hard to choose from amongst them. There were a number of papers looking specifically at the issues around the long term preservation of research data, which is of particular interest to the work we are undertaking at Lancaster University. There was a fascinating paper given by Austrian researchers from SBA Research and TU Wien (the Vienna University of Technology) looking specifically at the management of the so-called “long tail” of research data – that is the wide variety of file formats spread over a relatively small number of files which characterises the management of research data in particular, but is also relevant to the management of legacy digital collections and digital art collections. This discussion was returned to by Jen Mitcham (University of York) and Steve Mackey (Arkivum) talking about preserving Research Data and also in my final workshop on file format identification. Jay Gattuso – nobly joining in at 4 am local time from New Zealand – talked about similar issues at the National Library of New Zealand involving legacy digital formats where there were only one or two examples.
One of the posters also captured this point perfectly – “Should We Keep Everything Forever?: Determining Long-Term Value of Research Data” from the team at the University of Illinois at Urbana-Champaign which looked at trying to create a methodology for assessing and appraising research data.
Plenty of food for thought there about how much effort we should put into preserving, how we prioritise and how we appraise our collections.
The final keynote was from Dr David Bosshart of the Gottlieb Duttweiler Institute – a provocative take on the move from an industrial to a digital age. He had a very particular view of the future which caused a bit of a mini Twitter storm from those who felt that his view was very narrow; after all more than half the world is not online. Whilst his paper was no doubt deliberately designed to create debate, it highlighted the issues about where we direct our future developments and what our ultimate goals are. This is common to all archives/preservation strategies: whose stories are we preserving, and how are we capturing complex narratives? This issue was revisited later in a workshop on personal digital archiving. Preservation can only happen where information is captured in the first place. It can be about educating and empowering people to capture and present their own narratives.
There is still a lot for me to think about from such a varied and interesting conference. There was very little time for leisure but there were wonderful evening events which the conference organisers arranged – a drinks reception at the National Library of Switzerland and a conference dinner at the impressive fifteenth century Rathaus. There are lots of conference photos online which give a flavour of the event.
And speaking of flavours I couldn’t visit Switzerland and not try a fondue…. Delicious!