Impressions from Liber Conference 2018 in Lille

I was lucky enough to attend this year's 47th LIBER Annual Conference, held from 4 to 6 July in the French city of Lille. The theme was "Research Libraries as an Open Science Hub: from Strategy to Action", a subject very close to my heart. Below I highlight a few of the most interesting presentations and talks I attended. If you want to find out more, the presentations are available online, including on Zenodo.

Liber is quite a big conference. It stretches over three days (plus workshops and Liber group meetings) and was fully booked with 440 delegates from 35 countries.

Conference venue: Lilliad

Keynote Speech: A National Open Science Plan for France

Prof. Frédérique Vidal is the French Minister of Higher Education, Research and Innovation. She used her keynote speech to announce the French National Plan for Open Science. Among other things, the Minister announced an obligation for open access dissemination of articles and works resulting from publicly funded research projects. You might think that is, by now, a fairly standard ambition in the EU.

However, Professor Vidal went further and included in her speech Open Data, training, a commitment to European research data infrastructure, the introduction of a "Chief Data Officer" in her Ministry and a budget of "€5.4M the first year and €3.4M the following years". What impressed me was the passion with which the announcement was delivered by someone who has real power. I'd be curious to hear from colleagues whether the National Plan really has an impact on Open Science in France in the coming years.

Minister Frédérique Vidal

Open Science and academic libraries: managing the change

One theme that ran through the LIBER conference was "culture change": the acknowledgement that the move to Open Science is a profound one that will take a long time to achieve. Paul Ayris (UCL) and Tiberius Ignat (Scientific Knowledge Services) are experts on this topic and have just published the paper "Defining the role of libraries in the Open Science landscape: a reflection on current European practice". The paper includes a diagram of the research cycle mapped against the component parts of Open Science, which I find really helpful.

Diagram: Ayris and Ignat, 2018

Their talk also summarised the Open Science Roadmap developed by LERU (League of European Research Universities), which provides 41 recommendations (yes, that's quite a lot) in eight areas of Open Science. Paul and Tiberius recognise that this will be "a complex and multi-dimensional process of transition, different for every university".

I asked Paul how we can get senior management on board when they might not see how Open Science relates to their institutional strategic objectives. He gave the example of UCL Press, which is fully Open Access and has generated downloads from 220 countries: an impressive achievement that enhances UCL's reputation and shows the reach of the "UCL brand". Senior management will like statistics like these very much.

Panel including Paul Ayris and Tiberius Ignat in the centre

To optimize or not to optimize one’s h-index – that is the question…

Part of my role as Research & Scholarly Communications Manager is to coordinate our Library's (still developing) citation analysis services. I was therefore curious to hear what Bertil Dorch (University Library of Southern Denmark) had to say about "optimising" researchers' h-indices.

Not everyone is familiar with the h-index. It is an author-level metric that attempts to measure both the productivity and the citation impact of a researcher's publications: a researcher has an h-index of h if h of their papers have each been cited at least h times. It is a simple and influential metric, but not without its problems. Bertil and his co-authors looked at quantitative and qualitative data from a sample of 75 Danish researchers to see what recommendations could be made to help researchers increase their h-index.
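To make the definition concrete, here is a minimal Python sketch of the standard h-index calculation: sort a researcher's citation counts from highest to lowest and find the largest rank h at which the paper in that position still has at least h citations. The function name `h_index` and the citation counts are hypothetical illustrations; this is not the method Bertil and his colleagues used in their study.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The paper at this rank must have at least `rank` citations.
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here, so no higher h is possible.
            break
    return h


# Hypothetical citation counts for one researcher's six papers.
print(h_index([10, 8, 5, 4, 3, 0]))  # 4
```

In this example the fourth most cited paper has 4 citations, but the fifth has only 3, so the h-index is 4. Note that the highly cited first paper contributes no more than the others, which is one of the metric's well-known limitations.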

Their preliminary findings, which introduce the concept of effectiveness, can be found here.

Liber 2018 presentation:

Other impressions

The food! As you might expect from a conference in France, the food was excellent. Perhaps more surprisingly, it was very good for vegetarians as well!

Vegetarian conference lunch

The reception on Thursday was a lot of fun, not least because it was held in the beautiful Palais des Beaux-Arts in central Lille. It's an art museum that holds paintings by Breughel, Bosch, Goya, Rembrandt and many others: quite a setting for a meeting!

Reception in the Palais des Beaux-Arts

The city! This was my first time in Lille and I liked it. Luckily, I was based in the buzzing old town full of character, bars and restaurants and a lot of young people. It helped that the World Cup was in full flow and France did rather well.

Central square in Lille


For me at least, LIBER is a big conference: 430 people over three days is full on. But that is also the beauty of it. It is broad in its remit and ambitious in its theme. It is also friendly and approachable, despite including a lot of "senior" folks (i.e. Library Directors). While there may be a lack of "practical" things you can take home and implement straight away, it provides a lot of food for thought and, most importantly, a great forum to exchange ideas. I am very much looking forward to Dublin, 26-28 June 2019!

Liber Conference plenary
World Cafe at the Conference

More pictures from the Liber Conference 2018 on Flickr!

So Long and Thanks For All the Pizza

Today is my last day working as Digital Archivist at Lancaster University, so I thought I would take a little time to reflect on my three years here: the highlights and what I have learnt.


Pizza has featured quite a bit in my time here at Lancaster. And that’s not just team lunches! Pizza is a core component of our Data Conversations – the networking event designed to bring together researchers to share experiences of creating, using and sharing data. Having a peer-led discussion forum has been a fantastic success story for us – it’s even gone global!  The pizza is a key part of this as it helps create an informal, friendly environment where sharing is central. I’ve learnt a huge amount from being part of the Data Conversations and I will definitely be taking forward what I’ve learnt about successful engagement activities.

Culture change

Data Conversations' attendees enjoying refreshments and conversation

From early on (i.e. when there were only two of us!), the Research Services team focused on how to push forward culture change. We were fortunate to have a management team who supported and promoted team-led agenda setting. We identified development priorities focused on bringing about culture change and promoting an Open Research agenda. Keeping our goals at the centre of what we do was important, and it meant we could tailor and prioritise activities around encouraging and promoting good data management practices. From my digital preservation perspective, the best chance we have of preserving data is to ensure that it is created in the right way in the first place. Sending the message about good data practices "upstream", so that well-formed data is captured early and with the right metadata, gives it the best chance of remaining accessible into the future.


Eating fondue

I've learnt a huge amount in my time at Lancaster; when I started I had a lot of enthusiasm but not much practical experience. I hope I've retained the enthusiasm and added experience and practical application to it. Things change at a very fast pace in the digital preservation world, so it's brilliant to be able to go to training events and conferences and hear from the leaders in the field. I was lucky enough to attend iPres 2016 in the beautiful city of Bern. I learnt a lot there and have built on that knowledge and experience since, especially around personal digital archiving and community engagement activities.


Did I mention cake and biscuits?

The last three years have also brought me more fully into the Digital Preservation Community, a community where sharing best practice and collaborating are greatly encouraged for the benefit of all. I have had help and support from countless people, but I would single out Jen Mitcham at the University of York and Sean Rippington at the University of St Andrews as particularly supportive and inspirational. The Archivematica UK user group has also been fantastic, and I am looking forward to continuing these relationships into the future.

The Future

So now I'm off to take up new challenges at the University of Warwick's Modern Records Centre. I am looking forward to future collaborations with my colleagues and friends at Lancaster, and to seeing us all face the challenges that digital data presents together.

Rachel MacGregor, Digital Archivist

International Archives Day

Today is International Archives Day where everyone involved in preserving archives, records, data – whatever your take – celebrates the work that is happening worldwide to ensure the preservation of our memory and heritage and the protection of our rights by documenting decisions and building the foundations for good governance.

Lancaster Castle: very visible heritage (image author’s own CC-BY)

It's easy to get people interested in memory and heritage: our history surrounds us in very visible ways, and our memories bind us together, as sharing and celebrating the past informs our culture and identity. But it's much harder to get excited about "governance", even though it's all about maintaining rights and responsibilities and ensuring justice and equality across the board.

So I want to take a moment to cheer for governance and shout about how the work we are doing here at Lancaster University contributes to the creation of strong and accountable governance structures. Accountable governance ensures fairness and equality for all. The work in my team is all about promoting the Open Research agenda, which creates an environment where research is sustainable, reliable, accountable and for the greater good.

“Good governance in the public sector encourages better informed and longer-term decision making as well as the efficient use of resources. It strengthens accountability for the stewardship of those resources… People’s lives are thereby improved.”

(International framework: good governance in the public sector IFAC/CIPFA 2014)

And improving people's lives is what we are all really putting our effort into.

So how are we hoping to support these objectives? The long-term preservation of good-quality, reliable data means that we can support the decision-making processes that affect all of us. Poor data leads to poor decisions, so we are looking for ways of preserving data that guarantee its authenticity and integrity and ensure that it will be available for the long term. The work is not done in isolation: we are looking at best practice and at initiatives such as the Jisc Research Data Shared Service, which we hope will deliver huge advances in helping us preserve important data.

Let’s celebrate everyone who is working hard on preserving documents, manuscripts, archives, data – all kinds of information – which enrich our lives and help us build a better world.

Rachel MacGregor (Digital Archivist)

Data Interview on “Messy Data”

Our latest Data Interview features our two Jisc-sponsored Data Champions, Dr Jude Towers and Dr David Ellis. Jude is a Lecturer in Sociology and Quantitative Methods, and David is a Lecturer in Computational Social Science in our Psychology Department.

Jude and David recently presented at a Jisc event on ‘Stories from the Field: Data are Messy and that’s (kind of) ok’.

We talked to Jude and David about what Messy Data are (and many other things):

Q: At the recent Research Data Champions Day the title of your presentation was ‘Data are Messy and that’s (kind of) ok’. What are ‘messy research data’ in your fields?

Jude: My ‘messy data’ are crime data. The ‘messiness’ comes from a lot of different directions. One of the main ones is that administrative data is not collected for research, it is collected for other purposes. It never quite has the thing that you want. You need to work out if it is a good enough proxy or not.

For example, I am interested in violence and gender, but police crime data doesn’t disaggregate by gender. There is no such crime as domestic violence, so it depends on whether somebody has flagged it as such, which is not mandatory, so it is hit and miss. I think the fact the data are not collected for research makes them messy for researchers, and then I guess there are all the other kinds of biases that come with things like administrative data. So if you think about crime, not everybody reports a crime, so you only get a particular sample. Then there are particular initiatives: every time there are international football matches there are big initiatives around domestic violence, so reporting goes up, so everyone says that domestic violence is related to football. But is it, or is it just related to the fact that everyone tells you that you can report, and that there is zero tolerance of domestic violence during football matches? It’s more likely to be recorded.

Then you get feedback loops. The classic one at the moment is knife crime in London: because knife crime has gone up the agenda, more money and resource will go into knife crime, at some point it will probably go down, and something else will go up, because there is a finite amount of resource. These feedback loops feed into the research that you do on administrative data, and people don’t always remember that when they come to interpret research.

Jude and David presenting on Messy Data

David: The majority of data within psychology that tends to measure people is messy because people are messy; particularly with social and psychological phenomena, there is always noise. The challenge is often trying to get past that noise to understand what might be going on. This is also true of administrative data and of data you collect in a lab. Probably the only exception in psychology is where people are doing very, very controlled experiments, maybe in visual perception, where the measurement is very fine-grained, but almost everything else in Psychology is by its nature extremely messy, and data never looks like it appears in a textbook.

Q: So there is always that ‘noise’ in research data, regardless of whether you use external data such as NHS data or collect data yourself, unless, as you say, it is in a very controlled environment?

David: Yes. And I guess that within Psychology there is an argument about whether, if the data is collected in a very controlled environment, that is actually someone’s real behaviour, or whether a less controlled environment is more ecologically valid. You’ve always got that balance to try and address.

Q:  So what are the advantages, why do you work with messy data?     

Jude: Sometimes because there is nothing else. [laughs]

David: Because there is nothing else. I think Psychology generally is going to be messy because, as I said, people aren’t perfect; they are not perfect scientific participants. Participants are not 100% predictable; people aren’t predictable social phenomena. There are very few theories within social psychology that are 100% spot on; in fact, I don’t think there are any.

Compare that to, say, physics, where there are Newton’s laws: governing theories which are singular truths that explain a certain phenomenon. We don’t have much of that in psychology! We have theories that tend to explain social phenomena, but people are too unpredictable. There are good examples of theories that have held for a long time, but it is never a universal explanation.

David presenting

Q: What are the implications for management of that kind of messy data?

David: I think the implications are that you make sure it is clear to people how you got from the raw data, which was noisy or messy, to something that resembles a conclusion. So that could be: how did you get from X number of observations, which you boiled down to an average that you then analysed? What was that process? It’s not just about running a statistical test; it’s about the whole process from “this is what we started with” to “this is what we ended with”.

Jude: I think that’s right. I think it’s about being very clear about what your data can and cannot support, and being very clear that you are not producing facts, you are testing theory, where everything is iterative: a tiny step towards something else, not the end. You never get to the end.

David: I think researchers have a responsibility to do that, and people have to be careful in the language they use to convey how that has happened. A good example at the moment is the press coverage and current debate about the effects social media has on children or teenagers: the way it is measured and the language used to talk about it are, to me, totally disconnected. That behaviour isn’t really measured. It is generated by people providing an estimate of what they do, yet we know that estimate isn’t very accurate. The conclusion that has been drawn is that this is having a big effect on people. I’m not saying it’s not having any effect, but it’s not as exciting to say: ‘well actually the data’s really messy or not perfect, we can’t really conclude very much’. Instead it’s being pushed into saying that [social media] is causing a massive problem for young people, which we don’t know. That is why there is a responsibility to be clear, I don’t think it is clear in that debate, and I think there are big consequences because of it.

Jude and David at Jisc panel discussion

Q: So in your dream world, what would change, so we could work better with this kind of data?

Jude: I think we need better statistical literacy across the board. This is what I did with my Masters students: I told them to go and find a paper or media story centred on statistics, then critique it. So, how do you know what someone is telling you is ‘true’? Why are they telling it to you in that particular way? What data have they used? What have they excluded?

You go to the stats literature and they talk about outliers as though they are just a statistical phenomenon, but those decisions are often political and they massively change what we know, and nobody talks about that; nobody sets out exactly what it means. The only official statistics for crime in England and Wales are currently capped at a maximum of five incidents. If you are beaten up by your partner 40 times a year, only the first five are included in the count, which has a huge biasing effect on what we know about crime. That then feeds into the way resources are distributed between different groups, and into what we know about which crimes are going up and which are falling. I think this lack of people questioning statistics in particular, and data more generally, is a real problem. In our social science degrees we just do not teach undergraduates how to do that. We do it with qualitative data, but we don’t do it with quantitative data. It’s exactly the same process and exactly the same questions, but we just don’t do it; we are really bad at it in Britain!

David: I think more generally, there is a cultural issue within the whole ethos of science, of how it gets published, of what becomes read and what doesn’t become read.  So again, say I go back, do a paper and find no relationship between social media use and anxiety.  That would be harder to publish than if I write a paper and find a tiny correlation, which is probably spurious and not even relevant, between anxiety and social media. So again, this comes down to both criticising what is out there but also what is just becoming more sellable or having more ‘impact’.  I use the word impact with inverted commas; what sounds more interesting, but actually might be totally wrong.  I think what is pushed is what’s more interesting rather than what is truth.  I think it’s worth remembering that science is about getting a result and trying to unpick it, looking at what else could explain this, what might we have missed.  Rather than saying ‘that’s it, it’s done’, it’s similar to what Jude was saying about a critical thinking process.

Q: Following on from what Jude said about the skills gap: You say that undergraduates are not taught the skills they need.  Therefore, when we eventually get PhD students and early career researchers this gap might have even increased?

Jude: Yes, and they don’t use quantitative data, or they use it really uncritically. Lots of postgraduate students who work on domestic violence won’t use quantitative data, but their thesis often starts with ‘one in four women will experience domestic violence in their lifetime’ or ‘two women a week are killed by intimate partners’, but they don’t know where that data comes from, how reliable it is or how it was obtained, yet it is just parroted.

David: I can give a similar example to that where it is sometimes difficult to take those numbers back, once they become a part of the common discourse.  So years ago we found that people check their smartphone 85 times a day on average.  Now that was a sample of about thirty young people. Now we obviously talked about that, but that number is now used repeatedly.  Now there is no way that my grandmother or my parents check their phone 85 times a day.  But that sample did, so there is now this kind of view that everyone checks it 85 times a day.  They probably don’t, but I can’t take that back now, there are things you don’t know at the time, but that is what that data showed.  It’s tricky to balance, and it was picked up as an impactful thing, but it wasn’t what we really meant.

Q: If your findings are picked up by media looking for catchy, easy numbers, is there also a job for you as a researcher to write your paper differently so that it is not picked up so easily? Or is it the fault of the media, because they are just looking for a simplified version of a complex issue?

David:  There is a cultural issue, a kind of toing and froing; because we want our work to be read and we want people to read it and certainly writing a press release is one way of doing that.  I think it’s actually what you put in the press release [that] has to be even more refined, because a lot of people won’t read the paper, but they will see the press release, and that will be spun.  Once the press release is done, it’s out of your control in some ways.  You can get it as right as you want but a journalist might still tweak it a certain way.  It’s a really tough balance because as you say the other extreme is to say I am just going to leave it. But then people might not hear about the work, so it’s a very tricky tightrope to walk.

Jude: When our work started getting picked up by the media, we made the decision as a Centre that we would not talk to the media about anything that had not been through peer review, so it is always peer reviewed first. We work closely with one person from the press office all the way through the process of putting the paper together, deciding on the press release and planning how we are going to release it. What we have now is contacts in several newspapers and media outlets, and we say we will work with them exclusively provided this is the message that goes out. We have been successful enough that we’ve now got two or three people on board who will do that with us. They get exclusives provided we see the copy before it goes public.


David: That is very hard to do, but really good.

Jude: We have been really hardcore, and we’ve had a lot of pressure to put stuff out earlier, to make a bigger splash, to go with more papers. I think it was only because we resisted that, hard as it was, that in the long run it has been much better. In our early work the press wanted our trends, but we wanted them to talk about the data: we wouldn’t release the trends unless they talked about the problems with bias in official statistics. So we kind of married the two. They didn’t want it, but that was the deal.

David: It’s like when you say ‘people do X this number of times’: you can’t then put ‘within the sample’ in brackets, so I understand where journalists come from and I understand the conversations with the press. To me, as I said, it’s like walking a tightrope. It has to be interesting enough that people want to read it, but at the same time it needs to be accurate.

Jude: But that’s the statistical literacy, because you want someone reading a media story to go ‘Really? Well, how did you get that?’ That’s something we would do as academics when reading it. People are always telling me ‘interesting facts’ about violence, and my first reaction is always: ‘Where has that come from?’ These questions should become routine. I think journalist training is terrible! I have spent hours on the phone with journalists who want me to say a really particular thing, and it’s clearly absolute nonsense! But they have got two little bits of data and they have drawn a line between them.

David: I have had a few experiences where journalists have tried to get a comment about someone else’s work and I have said things like ‘I don’t think this is right’, or I’ve been critical, and the journalist has said, ‘well, really what we are looking for is a positive comment’. And I’ve said ‘well, I’m not going to give you one’, and they have said ‘alright, bye then’, and gone and found someone who will. That doesn’t happen very often, but we can see what they are hoping for. Presumably, some of the time I have said things where I have been really critical. The BBC are quite good at that; they get someone who they know is going to be critical without having to explicitly say something negative themselves.

Q: This has been fascinating; we have been through the whole life cycle of data, from creation to management and now to digestion by the media. This tells us that data management issues are fundamental to the outputs of research.

Jude: I think it impacts on the open data agenda though ‘cause if I was going to put my data out, the caveat manual which came with it would be three times the size of the data.  Again, you don’t have any control over how someone presents an analysis of that data. I think it’s really difficult because we are not consistent with good practice in reporting on messiness of data.

David: I think there is a weight of responsibility on scientists to get that right, because it does affect other things. I keep using social media as an example. The government are running an enquiry at the moment into the effects of screen time and social media. If I was being super critical, I would say it’s a bit early for an enquiry, because there isn’t any cause-and-effect evidence. Even some of the studies they report on the enquiry’s home page are totally flawed, and one of them is not peer reviewed. That lack of transparency or statistical literacy, even among Members of Parliament, is clearly leading to things being investigated where we could actually be missing a bigger problem. That is just one example, but it is an area where there is a lot of noise, a lot of ‘this might be a problem’ or ‘is it a problem?’, right through to ‘it definitely is a problem’, without anyone standing back and asking, ‘actually, is this an issue, is the quality of the evidence there?’


Jude: Or can you even do it at the moment?

David: Yes, absolutely! That is a separate area and there is a methodological challenge in that.

Jude: We get asked to measure trafficking in human beings on a regular basis; we’ve even written a report saying you can’t measure it at the moment! There is no mechanism in place that can give you data good enough to produce any kind of measure.

David: But that isn’t going to make it onto the front of the Daily Mail. [laughs]

Q: Maybe just to conclude our interview, what can the university do? You mentioned statistical literacy as one thing. Are there other things we can do to help?

Jude: We are starting to move a little in FASS [Faculty of Arts and Social Sciences] with parts of the Research Training Programme, and I think things like the Data Conversations are hard to measure but are actually having a really good impact. Drawing people in through those kinds of mechanisms and then bringing together the people who are interested in talking about this would be good. I would like to see something around… what you need to tell people about your data when it’s published; you know, the caveats: what it can and can’t support, how far you can push it.

David: I think the University as a whole does a lot, and certainly in psychology it is preaching to the converted, in a way. I would like something in Pure [Lancaster University Data Repository] that, when you upload a paper, asks ‘have you included any code or data?’, just as a sort of ‘by the way, you can do that’. One, it tells people that we do it, and two, it reminds people who aren’t doing it; it would be useful to have a tick box just to see why not. Obviously, there are lots of cases where you can’t do it, but it would be good for that to be recorded: is it “I can’t do it because the data is a total mess”, some other reason, or “I’m not bothered”? There is an issue here about why not, because if a paper has just been published, its data should be in a form that is sensible and clear.

Jude: I wonder if there is some scope in just understanding what counts as data: maybe a Data Conversation specifically about qualitative data, and then about even more obscure forms, like literature reviews as data, ’cause I still keep thinking about when you told me you offered to do data management with FASS and were told they didn’t have any data.

I think that people don’t think about it as data in the same way and it would be really good to kind of challenge that.  I think data science has a massive problem in that area, it has become so dominant, and if you’re not doing what fits inside the data science box you’re not doing data and you’re not doing science and it’s really excluding.  I think for the university to embrace a universal definition of data would be really, really, beneficial.

David: It’s also good for the University, [to] capitalise on that extra resource; it would have a big effect on the institution as a whole.

Jude, David, thank you very much for this interesting interview!

Jude and David presenting

Jude and David have also featured in previous Data Interviews.

The interview was conducted by Hardy Schwamm, Research and Scholarly Communications Manager @hardyschwamm. Editing was done by Aniela Bylinski-Gelder and Rachel MacGregor.

5th Data Conversations – Stories from the Field

We recently held our fifth Data Conversations here at Lancaster University Library. These events bring researchers together and act as a forum for sharing their experiences of using and sharing data. The vibe’s informal and we provide our attendees with complimentary coffee, cake and pizza…

It’s FAIR to say that pizza is a popular part of the event. Who doesn’t love pizza…? The informal lunch at the start brings researchers together. It’s a chance to spark conversations and connections with colleagues from different disciplines and at different career stages.

Data Conversations' attendees enjoying refreshments and conversation

Once again we had a great programme with contributions from three fantastic speakers: 

Up first was Dr David Ellis, Lecturer in Computational Social Science from the Psychology department and one of our Jisc Data Champions. David spoke about his experiences (including challenges and solutions) of working with National Health Service data.

David Ellis beginning his presentation

Next up was Jessica Phoenix, Criminology PhD Candidate. Jess spoke about her Master’s dissertation project, which looked at missing persons and the link between risk assessment and time to resolution. She spoke about the challenges and solutions associated with creating a dataset from pre-existing raw data, issues that were amplified because the data (police records) were highly sensitive and identifiable.

Image showing Jess as she begins her presentation

Last up was Professor Chris Hatton, Centre for Disability Research, Division of Health Research. Chris discussed his experience of collaborating with social workers to achieve uniquely valuable results. He also explored the way in which social media (his Twitter account) has provided a platform to engage with a wide array of voices that he couldn’t have reached through conventional research methods.

Chris enjoying jovial interaction with attendees

It was another fantastic instalment in an ongoing series of Data Conversations. We thoroughly enjoyed it and we’re looking forward to 6th Data Conversations: Keep it, throw it, put it in the vault…? We hope you can join us, sign up today!

Digital flyer promoting 6th Data Conversations, to be held on 18th September, 13:30-16:00, in the Library, C130.

Joshua Sendall, Research Data Manager @JSendall

4th Data Conversation – Open Data Open Doors

We held our fourth Data Conversation here at Lancaster University, bringing together researchers and their experiences of using and sharing data over pizza and cake…

Pizza is a big attraction at an event, but more importantly it brings people together to share experiences and creates a relaxed and informal environment which encourages conversation, exactly what we want. Now that we have reached our fourth event in the series, we have some “regulars” who come for the conversation (and the pizza) but also new faces who bring new perspectives.

We had another interesting programme with a range of researchers from different disciplines:

Our first speaker was Dr John Towse, Senior Lecturer in Psychology, who reflected on his role as editor of the Journal of Numerical Cognition, an open access journal which charges no author fees. The Journal strongly encourages data sharing, and as editor John is in a position to ask his contributors to share their data, although the journal does not require it. John stressed that you can’t expect data sharing to happen organically: you have to ask.

Dr John Towse at the Data Conversation

Our next speaker was Dr Jo Knight, who has featured in our Data Interview series talking about her work. She explained the emergence of the Psychiatric Genomics Consortium out of a need to share genomics data, even where that data can be quite sensitive. The aim is to make the data as open as possible, and this has been achieved by creating a community of trust. She emphasised that they are motivated by the wish to change people’s lives and do not share the data with commercial entities.

Dr Jo Knight discusses the issues of sharing genomic data

Dr Kyungmee Lee from the Department of Educational Research works with distance learners, supporting their doctoral training as part of the preparation for their PhD research. She encourages students to reuse existing datasets to investigate research methods, and it was whilst doing this that she realised how many datasets were out there which were difficult to use because they lacked context.

Dr Dermot Lynott takes us on a Data Journey

Dr Dermot Lynott entertained us with his confessions of a poor data manager, as he, like the rest of us, has been guilty of poor file organisation and even worse file naming. However, he also gave us a success story of publishing data which has been shared and re-used for over 10 years, and he was keen to encourage others to see the benefit of doing the same.

Finally, Professor Maggie Mort wrapped up with a moving and powerful description of the data gathered as part of the Documenting Flood Experience project, and with warnings about the difficulties which might lie ahead with the incoming GDPR, which will affect future projects that gather, use and store data relating to children. This sparked off even more interest and debate.

Professor Maggie Mort discusses working with children and their data

To be honest we could easily have been there all day and we’re very much looking forward to the next Data Conversation on 10th April – Stories from the Field.

Rapt attention at the 4th Data Conversation

Rachel MacGregor, Digital Archivist

Data Interview with Andrew Moore

Andrew Moore (@apmoore94) is a 2nd year PhD student at Lancaster University within the School of Computing and Communications. He is studying how sentiment analysis can be improved through world knowledge using finance as his specialised domain. His research interests are across Natural Language Processing, Machine Learning, and Reproducibility.

We talked to Andrew after he presented at the 3rd Data Conversations.

Q: When does software become research data in your understanding?

Andrew: As soon as you start writing software towards a research paper, I would count that as research data.

Q: Is that when you need the code to verify results or re-run calculations?

Andrew: You also need the code to clean your data, which is just as important as your results, because how you clean your data informs what your results are going to be.

Q: And the software is needed to clean the data?

Andrew: Yes. The software will be needed for cleaning the data. So as soon as you start writing your software towards a paper that is when the code becomes research data. It doesn’t have to be in the public domain but it really should be.

Q: What is the current practice when you publish a paper? Do you get asked where your software is?

Andrew: Recently we have, actually, for some of our conferences in the computational linguistics or Natural Language Processing field. But it is not a requirement to get published. It is a friendly question rather than an obligation.

Q: Who is asking, the publisher?

Andrew: No, that’s the conference chairs who are asking but it is not a requirement. Personally I think it should be. I can understand in certain cases when for instance there are security concerns. But normally the sensitivity is on the data side rather than the software.

Q: At the moment if you read a paper the software that is linked to the paper is not available?

Andrew: Normally, if there is software with the paper the paper would have a link, normally on the first or the last page. But a large proportion of the papers don’t have a link. Normally there would be a link to GitHub, maybe 50 per cent of the time. Other than that you can dig around if you’re really looking for it, perhaps Google the name but that’s not really how it should be.

Q: So sometimes the software is available but not referenced in the paper?

Andrew: That’s correct.

Q: But why would you not reference the software in the paper when it is available?

Andrew: I am really puzzled by this [laughs]. I can think of a few reasons. One of them could be that the GitHub instance is just used as a backup. The problem I have with that is: if it is not referenced in the paper, how much do you trust the code to be the version that is associated with the paper?

Also, the other problem with GitHub is that even if you reference it in a paper, you can keep changing the code, and unless you “tag” it on GitHub with something like a version number and reference that tag in your paper, you don’t know which is the correct version.

Q: What about pushing a version of the code from GitHub to [the data archiving tool] Zenodo and getting a DOI?

Andrew: I didn’t know about that until recently!

Andrew presenting at Data Conversations

Q: So this mechanism is not widely known?

Andrew: I know what DOIs are but not really how you can get them.

Q: So are the issues why software isn’t shared about the lack of time or is it more technical as we have just discussed, to do with versions and ways of publishing?

Andrew: I think time and technical issues go hand in hand. To be technically better takes time and to do research takes time. It is always a tradeoff between “I want my next paper out” and spending extra time on your code. If your paper is already accepted that is “my merit” so why spend more time?

But there are incentives! When I submitted a paper to an evaluation workshop, I said that everybody should release their software, because it was about evaluating models, so it makes sense to have all the code online. It was decided that we shouldn’t enforce the release but it was encouraged, and the argument was that you are likely to get more citations, because if your code is available people are more likely to use it and then credit you by citing your paper. So getting more citations is a good incentive, but I am not sure whether there are any studies showing that releasing software correlates with more citations.

Q: There are a number of studies showing a positive correlation between depositing research data and citations[1]. I am not aware of one for software[2]. So maybe we need more evidence to persuade researchers to release code?

Andrew: Personally I think you should do it anyway! You spend so many hours on writing software so even if it takes you a couple of hours extra to put it online it might save somebody else a lot of time doing the same thing. But some technical training could help significantly. From my understanding, the better I got at doing software development the quicker I’ve been getting at releasing code.

Q: Is that something that Lancaster University could help with? Would that be training or do we need specialists that offer support?

Andrew: I am not too sure. I have a personal interest in training myself but I am not sure how that would fit into research management.

Q: I remember that at the last Data Conversations Research Software Engineers were being discussed as a support method.

Andrew: I think that would be a great idea. They could help direct researchers. Even if they don’t do any development work for them, they could have a look at the code, point them in the right direction and suggest “I think you should do this or that”, like refactoring. I think that kind of supervision would be really beneficial, like a mentor, even if they are not directly on that project. Even ten per cent of their time on a project would help.

Q: Are you aware that this is happening elsewhere?

Andrew: Yes, I did a summer internship with the Turing Institute and they have a team of Research Software Engineers.

Q: And who do the Research Software Engineers support?

Andrew: The Alan Turing Institute was founded by five universities and is the UK’s national institute for data science. It has its own researchers but also associated researchers from the five founding universities. The Research Software Engineers are embedded on the research side, integrated with the researchers.

When I was an intern at the Turing Institute one of the Research Software Engineers had a time slot for us available once a week.

Q: Like a drop in help session?

Andrew: Yes, like that. They helped me by directing me to different libraries and software to unit test my code and create documentation, as well as explaining the benefits of doing this. I know that other teams benefited from their guidance and support on using Microsoft Azure cloud computing to facilitate their work. I imagine that a lot of time was saved by the help that they gave.

Q: Thanks Andrew. And to get to the final question. You deposited data here at Lancaster University using Pure. Does that work for you as a method to deposit your research data and get a DOI? Does that address your needs?

Andrew: I think better support for software might be needed on Pure. It would be great if it could work with GitHub.

Q: Yes, at the moment you can’t link Pure with GitHub in the same way you can link GitHub with Zenodo.

Andrew: When you link GitHub and Zenodo does Zenodo keep a copy of the code?

Q: I am not an expert, but I believe it provides a DOI for a specific release of the software.

Andrew: One thing I think is really good is that we keep data in Lancaster’s repository. In twenty years’ time GitHub might not exist anymore, and then I would really appreciate a copy stored in the Lancaster archives. The assumption that “It’s in GitHub, it’s fine” might not be true.

Q: Yes, if we assume that GitHub is a platform for long-term preservation of code, we need to trust it, and I am not sure that this is the case. If you deposit here at Lancaster, the University has a commitment to preservation and I believe that the University’s data archive is “trustworthy”.

Andrew: So depositing a zipped copy of your code here is a good solution for now. But in the long term the University’s archives could offer better support for software. An institutional GitLab might be good and useful. I know there is one in Medicine, but an institution-wide one would help. It would be nice if Pure could talk to these systems, but I can imagine it is difficult.

The field of neuroscience seems to be doing quite well with releasing research software. They have an opt-in system for code review. I think one of the Fellows of the Software Sustainability Institute was behind this idea.

Q: Did that happen locally here at Lancaster University?

Andrew: No, the Fellow was from Cambridge. They seem to be ahead of the curve but it only happened this year. But they seem to be really pushing for that.

Q: Thanks a lot for the Data Interview Andrew!

The interview was conducted by Hardy Schwamm.

[1] For example: Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ, 1, e175.

[2] Actually, there is a relevant study: Vandewalle, P. (2012). Code sharing is associated with research impact in image processing. Computing in Science & Engineering.

Data Interview with Alison Scott-Baumann and Shuruq Naguib

Our latest Data Interview follows up a presentation at our 2nd Data Conversation. Alison Scott-Baumann (Professor of Society & Belief, SOAS) and Dr Shuruq Naguib (Lecturer in Politics, Philosophy and Religion, Lancaster) are working on the Re/presenting Islam on Campus project. Re/presenting Islam on Campus is a three-year project funded by the Arts and Humanities Research Council (AHRC) and the Economic and Social Research Council (ESRC). It explores how Islam and Muslims are represented and perceived on UK university campuses.

We had the opportunity to discuss research data issues surrounding their project. It turned out to be a highly interesting conversation on topics such as confidentiality, the limits of anonymisation, legal frameworks and the freedom of speech.

Q: Could you describe the aims of your project?

Alison: Thanks for inviting us. It is strange to be on the receiving end because we have been doing a lot of data collection where we put people at ease and now we are at the other end.

About four years ago, I became concerned about the increasing surveillance culture around Muslim communities, particularly on campus, because that has, or could have, an impact on free expression. To me as an experienced researcher this seemed to be a politicisation of a research field: you generally identify Muslims as the “official other” and also tell us that they are dangerous, with the 2015 Counter-Terrorism and Security Act and its attendant Prevent duty. What is currently not acknowledged is that the Prevent duty is actually not compulsory but the university sector has adopted it in order to keep their reputations clean.

So it is quite a difficult topic and the project aims to look at four major questions:

  1. What do university staff and students know about Islam?
  2. Where do they find that information?
  3. Thirdly, with specific reference to three issues, how do they formulate their opinions? The first issue with regard to Islam is gender, because that’s often in the media; the whole hijab discussion, for example. The second is radicalisation: there is no point ignoring it because … [even though] there is no evidence that anybody gets radicalised on campus. And the third is inter-faith relations among students of different faiths; intra-faith relations are also of interest to us, because we live in a very secular culture and yet for many young people their faith identity is important, more important than we realise because of the secular atmosphere that we have created on campus.
  4. The fourth question is: given that there might be some discrepancies, self-identified by our participants, in their responses to the first three questions, what could be done to improve the quality of the discussion on campus about Islam? How could we improve the discussion about anything that is regarded by university authorities as risky?

So right from the start, when I built a team, we were all thinking about issues around Islam but also about their implications for free speech on campus. That turned out to be a big issue, because it is being discussed more and more, even in the press.

Alison Scott-Baumann

Q: How long does the project run?

Alison: It is a three-year project, from 2015 to 2018. We are two-thirds of the way through.

Q: What kind of data do you need to answer your research questions?

Shuruq: We have two sets of data. We have actually completed data collection. We have collected quantitative data through a survey questionnaire. It was designed to be sent to the six universities which are participating in the research. Before we received the grant and throughout the first year we were in conversation with the gatekeepers at those universities, who were usually senior managers. They promised to facilitate the research, including the survey to staff and students.

When we started on-site research, we also wanted to do the questionnaire at the same time, but the gatekeepers withdrew their collaboration. The gatekeepers tried to get approval from the vice-chancellors and senior management. We came across a problem on several sites, and that is what some describe as survey fatigue. They were worried about students and staff receiving too many requests to fill in questionnaires. It seemed that universities were very reluctant to facilitate our surveys.

We had to redesign the questionnaire so that it was no longer specific to the case studies; it was now a nationwide questionnaire targeting students only, and we went to a private company to do that. The private company had access to students and could build up a sample for us. For example, we wanted our sample to include Muslims and non-Muslims and equal representation of gender and other criteria that we had in mind. We decided not to do the staff questionnaire because you can’t do that through the private companies and the universities were refusing to help. We had to make these decisions because of that particular challenge.

The other subset of data which is qualitative is based on interviews, focus groups, ethnography and curricular material. On each of the six campuses we interviewed 10 students and 10 members of staff. We attempted to handpick staff according to an ideal list which represents a mix of administrative and academic staff, senior and junior staff in different departments, Human Resources, deans and postdocs, etc. The student interviewees were recruited through emails sent through the student union or were invited by researchers. It was a random sample. There were four focus groups on each site, one with staff and three with students. We wanted one focus group to be with Muslims, one with non-Muslims and one mixed. We didn’t always achieve all types and we faced a real challenge in recruiting students. Sometimes non-Muslim students weren’t at all interested in religion or Islam. We tried different techniques such as focus groups in cafes or other hang-out spaces for students, but if participants are not interested in your topic, no matter how you promote it, it’s really challenging! You might get a self-selected sample of participants who are interested in that topic.

Then we’ve also done ethnography which included observing the sites where students are, talking to different student societies, talking to a wide range of university staff. We attended public events, observing and describing these events: Who attends them, who the speakers are, especially if they are related to topics of religion, Islam, freedom of speech?

Part of the research is also how Islam is studied in the classroom. For each campus we attempted to collate data about all the courses that included a component on Islam. For a long time we used to call this “Islamic Studies”, but we don’t mean Islamic Studies in a narrow sense; we mean it in a broad sense. We changed the label for that category of data to “Studying Islam” to broaden it out to include, for example, a course on “Religion and Health” in the Faculty of Medicine. We collected material through desktop research on all the courses offered in the year of the fieldwork which have a component on Islam or religion.

Then we tried to zoom in on some modules reflecting a range of disciplines and approaches, collecting course programmes and syllabuses for further analysis. Within that sample we also attended some of the classes to observe the actual teaching and how the students respond. So we have a very complex set of data, and we are just about to start the analysis stage, where there are quite a few challenges too.

Q: You have collected a wide range of data, from publicly available information to sensitive data like views on religion. Does that have an impact on how you manage your data?

Alison: There are challenges in managing that data but also in collecting it. When I submitted the research proposal to AHRC, that was a year before the Counter-Terrorism and Security Act was passed. When I was awarded the grant, that Act had been passed. So a situation on campus that had already been quite sensitive arguably became more so. We were determined as a team to protect the identity of participants, and we have established a sequence of events which we hope maximises that possibility. We do tell our participants that they have to accept that it is actually impossible for us to be completely sure that we can protect them, because if somebody wants to hack and they have money and expertise then they can get access to stuff.

But I’ll run you quickly through how we do things. There are only two documents that have the allocated number given to a participant and their name. One of them is the consent form. That is kept away from the university, locked up. The other document that has their allocated number and their identity is an Excel spreadsheet which is kept in a virtual vault which has all their characteristics except their political views. We are not collecting political views which the 1998 Data Protection Act lists as something that should be protected. So we are acting in accordance with that Act by seeking to protect their identity.
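The linking scheme Alison describes, where only a separately secured key connects each allocated number to a name, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project’s own tooling: the `Participant` class, the `allocate_numbers` and `speaker_label` functions and the `political_views` field name are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical name for a category that must never be recorded
# (the project states it does not collect political views).
EXCLUDED_FIELDS = {"political_views"}


@dataclass
class Participant:
    name: str
    characteristics: dict = field(default_factory=dict)


def allocate_numbers(participants, start=1):
    """Give each participant a sequential number.

    Returns (key, numbers). `key` maps number -> (name, characteristics) and
    is the only record linking a person to their number, so it belongs in a
    separately secured (e.g. encrypted) store. `numbers` is all that should
    appear in transcripts and field notes.
    """
    key = {}
    for offset, person in enumerate(participants):
        number = start + offset
        # Drop any excluded category before it reaches the key.
        kept = {k: v for k, v in person.characteristics.items()
                if k not in EXCLUDED_FIELDS}
        key[number] = (person.name, kept)
    return key, list(key)


def speaker_label(number):
    """Label used in place of a name, e.g. in a focus-group transcript."""
    return f"Number {number}"


key, numbers = allocate_numbers(
    [Participant("A. Student", {"role": "student"}),
     Participant("B. Lecturer", {"role": "staff"})],
    start=32,
)
print([speaker_label(n) for n in numbers])  # ['Number 32', 'Number 33']
```

In this sketch only `key` would go into the encrypted vault; everything else refers to “Number 32” rather than to a name, which mirrors the numbered Post-its used in the focus groups.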

Once we’ve done that we then tell them before they speak that they have the right to withdraw, the right to anonymity and confidentiality and we give them a timeline so they have six months in which they could say “I’m actually not comfortable with this” but nobody has done that. What we cannot be sure of, of course, is who are the people who walked away from the possibility of speaking to us? It could be the silent majority. We will never know that. We have worked through the student unions to secure the interested students but if something pops up on their screens regarding opinions on Islam there are people who might think “I don’t want to enter that arena” for all sorts of different reasons.

Q: Can you expand on your data security and confidentiality measures?   

Alison: We keep our master spreadsheet encrypted with VeraCrypt, which is an independent program, unlike BitLocker, which belongs to Microsoft.

In order to conduct an interview or a focus group we allocate a number to each person, and before we did this we thought participants would find it ridiculous. But actually, in focus groups people found it liberating, which is the ideal. Every time they spoke they said “Number 32 speaking” and they would even say things like “I would like to endorse what Number 42 has just said”. That was perfect!

Q: Instead of a name badge people would wear a number?

Alison: No name badge, but a numbered Post-it on the table in front of them, and we know who they are if we want to track back. That worked much better than we thought it possibly could.

The interviews and focus groups were transcribed by a company called Divas, because it is a lot of material. They have their own confidentiality agreement and we created one from SOAS as well. Divas destroy the original audio files after a couple of weeks. We keep them but will destroy them some time in the second year. They will never be archived.

After the transcripts come back to us we have to clean them up. We have to take out any mention of names.

Shuruq: Let me add to that. Two issues have come up when cleaning the data.

Q: By cleaning do you mean anonymising?

Shuruq: Yes, anonymising and removing any identifiers. Even when we use numbers in the focus groups, participants will refer to sites on their particular campus, which will make locations identifiable. Or they would refer to a lecturer by name or to a course title. These are all ways by which confidentiality on that campus could be undermined. So we weren’t anonymising just the participants but also ensuring the anonymity of the campuses. Although the campuses are all named in our research, we have agreed that when we come to write up the findings we will not identify them, because of sensitive issues such as how each university implements Prevent policies. There could be some negative opinions, some difficult experiences. We don’t want to link those to specific campuses. So we are cleaning the data more extensively than normal, perhaps.

Shuruq Naguib

It is quite challenging because as you are stripping down the data you lose context. If there is a university in Wales the Welsh context actually has certain factors that are important to remember when you are analysing the data. Or a specific college in London, how do we do that? We were negotiating the cleaning of the data with regard to gender, ethnicity, background, names of places. We tried to replace these with things that identify these elements but which maintain the anonymity. If it is a café we would strip down the name but still reflect the fact that it is a café in a student union.
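The kind of replacement Shuruq describes, swapping a named place or person for a descriptor that keeps the context, can be partly automated. The sketch below assumes a hand-built, hypothetical mapping from identifying terms to generic descriptors (the café name and lecturer are invented) and uses Python’s standard `re` module; it is an illustration, not the project’s method. It only catches terms already listed in the mapping, so indirect identifiers such as a participant’s role or way of speaking still need careful manual review.

```python
import re


def replace_identifiers(text, replacements):
    """Replace each listed identifying term with a generic descriptor.

    `replacements` maps terms (matched case-insensitively, as whole words)
    to descriptors. Terms are assumed to begin and end with a letter or
    digit, so that the whole-word anchors apply.
    """
    if not replacements:
        return text  # nothing to replace; also avoids an empty pattern
    # Try longer terms first so "Hub Cafe Annex" wins over "Hub Cafe".
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {term.lower(): descriptor for term, descriptor in replacements.items()}
    # Look up the descriptor using the lower-cased matched text.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


mapping = {
    "Hub Cafe": "[a café in the student union]",
    "Dr Smith": "[a lecturer]",
    "Cardiff": "[a Welsh city]",
}
example = replace_identifiers(
    "We met at hub cafe after Dr Smith's lecture in Cardiff.", mapping
)
print(example)
# We met at [a café in the student union] after [a lecturer]'s lecture in [a Welsh city].
```

Keeping a descriptor such as “[a Welsh city]” rather than deleting the term outright is one way to retain some of the context (a Welsh campus, a student-union café) that the analysis depends on.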

But sometimes, especially with interviews, we’ve had people who have roles: for example, a student who is the Head of a Society or who is active on campus, is well known and speaks in a certain way. Even if we clean the transcript, if we want to quote him he might still be identified by his peers and people who know him.

And then one of the things we are coming up against is transliteration because as we look at how Islam is studied, some of the courses are linked with language training and attract overseas students. It is normal to hear different languages in this context. In an interview different languages could be used. Most of our team members speak several languages so participants have felt at ease using other languages. So how do we transliterate or translate? Sometimes it’s copious work. Some of the terms used in Arabic have specific religious connotations.

This is also sensitive data because often Arabic is perceived suspiciously as a sign of being foreign, as a sign of being a bit radical or of being committed to certain religious concepts. Do you keep the Arabic in the data? Certain words like Hijab and Jihad are loaded with negative connotations in public discourses. On some occasions we made the decision not to send a particular interview to the transcriber because it would endanger the person because they have expressed political views or they used a language that might be misunderstood. To protect the identity of that particular person on one occasion, our postdoc decided to transcribe the interview herself.

Q: Will you be able to share your data?

Alison: It will go into the UK Data Archive. That is a commitment we made to the AHRC and the ESRC who are partly funding us. There are definitely difficulties in assessing the risk of re-identification because it is impossible for us to know how recognisable somebody is to their colleagues or their friends by the way they are expressing themselves.

Q: Can I just confirm that you will share only transcriptions?

Alison: Yes, no audio, no video. But also, we haven’t decided what level of sharing is needed. We have already discussed this with the UK Data Archive and they have three access levels. Our data will not be Open Access. Some of it might be open to all registered users; other data might be accessible to approved researchers only. There might be two tiers. I think our concern all the way through was not that anybody has said anything dangerous, because nobody has, but that it might be construed as overly political by somebody who is looking at that data. If one of our participants has a view on foreign policy that doesn’t concur with the Government’s, that should be possible in a democracy, but it may be problematic in the current climate.

Q: Thanks for the explanation. What kind of research data services can Lancaster University offer to help your project?

Alison: I am personally very interested in the General Data Protection Regulation (GDPR), which will come into force in 2018. It appears to be inviting member states to decide whether they tighten up on consent. This is an issue to do with Big Data and the way in which it is possible for all of us to covertly record or film each other, track each other. Anything is possible now. So the issues about consent may impact upon our ethnography. We did nothing covertly, but inevitably, if we were in a big open meeting, we may have made notes about something somebody said, and even if we don’t identify them we haven’t asked their consent. We would like guidance on whether this is going to tighten the rules around consent or whether it is business as usual, which means that if you go to reasonable lengths to protect somebody’s identity then that is acceptable.

We would also like you to be our critical friend [laughs]. We have a year to go. I think we are well prepared and we worked really hard on this aspect but there may be issues that we haven’t covered.

Project website:

Q: Can I ask about the ethnography, field notes and observations, will you be able to share them?

Alison: I’ll give you a specific example. At campuses where it was possible, we secured the approval of members of staff to allow us to sit in on a lesson. The students were told when we were there, but we didn’t ask each of them to sign a consent form. For example, a student in an international politics class I attended described how her relatives were caught up in border violence in Eastern Europe. I didn’t have her name, but I made a note of this as an example of how a really difficult issue can be taught so well, and the trust between the student and the staff can be so high, that a student can self-disclose.

But it might be necessary under the new General Data Protection Regulation to remove that and simply say that there was evidence that trust was high, rather than giving the specific example. To me it doesn’t seem that I am endangering that person’s identity, absolutely not.

Shuruq: And the other difficulty is of course that we have also done ethnography at public events which could have been organised by the chaplaincy or a student society. Again, if you wanted to identify these events that can be done. These societies often set up event pages.

It could also be a lecture on Islam and the media, which was one of the public lectures I attended. The speaker is well known and the event was well publicised. My observations look at the discussions and the kinds of questions that emerged, and at how the audience was made up (mostly Muslims; very few white students attended that talk). Those who are interested in Islam in the media are those who are affected by media representation, which largely means Muslim students on campus.

How do you keep aspects of the context that shed light on the meaningfulness of this event and which makes the ethnography useful without undermining anonymity?

Q: One final question: In our trainings we often hear the concern that if you include a statement in a consent form that anonymised data will be shared publicly you might get fewer participants. Is that something you have experienced?

Alison: No, participants accept that. The point is that if they come to meet us, if they have taken that step, it means that the information sent out by staff or student bodies has convinced them that this is an ethically planned project where we are not going in with preconceptions. If we then say that anonymised data will be shared, they accept that.

The issue I am raising, which the ICO [Information Commissioner’s Office] hasn’t really clarified, is whether you would have to get consent forms from thirty people in a classroom. At one level that is a reasonable extension of consent requirements, but it challenges our understanding of ethnography.

Shuruq: Of course we don’t collect any information on the students; we don’t know who they are. But the course outlines and lecture names will not be anonymised in class ethnography, so that is something we need to reflect on. The other thing is that the lecturer of one class asked whether we were allowing students to withdraw from the class and whether we were asking for their consent. Our team member asked for verbal consent, and the lecturer gave students the opportunity to stay or withdraw from the class. So this could be an issue for some people.

Alison presenting at Data Conversations

Q: Do you have any final comments on your project with regards to data?

Shuruq: On one campus, at a private university, they had previous experience of research where the anonymity of some of the interviewees was not protected, and the way they were represented in the book that came out of the research was very negative. They were extremely reluctant to let us in without sufficient guarantees that we would protect their identity. But we are facing a serious dilemma, because it is such a unique campus that it is impossible to report anything about it without revealing which one it is. That is a serious challenge.

Alison: Just to follow on from that. We mentioned free speech right at the beginning. These strictures, which are ethically motivated, like the possible new legislation [GDPR] about consent, are at one level eminently sensible, but at another level they may make it almost impossible to do research on people’s ability to express themselves freely. If people can’t express themselves freely because it might compromise them or their institution, then we can’t do the research. So it is a very clever double bind, but it’s not good for democracy, because in the public eye the ability to express oneself freely has possibly come to mean the ability to have a strong opinion about something, rather than what I think it is, which goes right back to Socrates: talking something through in order to understand it better and to understand your own decision-making processes. For young adults at university, the heuristic value of freedom of expression, as long as it is not rude or illegal, is absolutely paramount to having citizens who are able to conduct themselves wisely in this complex world! There are huge issues at stake here!

Alison, Shuruq, thank you very much for this interesting interview!

The interview was conducted by Hardy Schwamm @hardyschwamm


3rd Data Conversation – Software as data: summary and slides

We had our third Data Conversation here at Lancaster University, again with the aim of bringing together researchers to share their data stories, discuss issues and exchange ideas in a friendly and informal setting.

Data Conversations Agenda

We had a bit of a change this time, however, as we had a special guest speaker, Neil Chue-Hong of the Software Sustainability Institute, talking about software as “a different kind of research object”.

We all had plenty of time to eat pizza and crisps before Neil invited us to consider reproducibility and sustainability in relation to software. Neil has a very clear and engaging style, which really helped us, the audience, navigate the complex issues of managing software. He asked us all to imagine returning to our work in three months’ time: would it make sense? Would it still work? He also addressed some of the complex issues around versioning, authorship and sharing software.

Neil Chue-Hong, Software Sustainability Institute

The second half of the afternoon followed the more traditional Data Conversations route of short lightning talks given by Lancaster University researchers.

Data Conversations

First up was Barry Rowlingson (Lancaster Medical School) talking about the benefits of using GitLab for developing, sharing and keeping software safe.

Barry Rowlingson (Lancaster University)

Barry Rowlingson weighs up the benefits of GitLab over GitHub…

Next was Kristoffer Geyer (Psychology), talking about the innovative and challenging uses of smartphone data for investigating behaviour, and in particular the issues of capturing data from external and ever-changing software. Kris mentioned how the recent Android update (to Oreo) makes retrieving relevant data more difficult – a flexible approach is definitely what is needed.

Kristoffer Geyer (Lancaster University)


Then we heard from Andrew Moore (School of Computing and Communications), who returned to the theme of sharing software, looking at some of the barriers and opportunities it presents. Andrew argued passionately that we need more resources for software sharing (such as specialist Research Software Engineers), but also that researchers need to change their attitudes towards sharing their code.

Andrew Moore (Lancaster University)

Our final speaker was the Library’s own Stephen Robinson (Library Developer), talking about using containers as a method of software preservation. This provoked quite a debate – which is exactly what we want to encourage at these events!

We think these kinds of conversations are a great way of getting people to share good ideas and good practice around data management, and we look forward to the next Data Conversations in January 2018!

This blog post was co-authored by Rachel MacGregor and Hardy Schwamm.

From Planning to Deployment: Digital Preservation and Organizational Change, June 2017

We were very excited to be visiting the lovely city of York for the Digital Preservation Coalition’s event “From Planning to Deployment: Digital Preservation and Organizational Change”. The day promised a mixture of case studies from organisations that have implemented, or are in the process of implementing, a digital preservation programme, and also a chance for Jisc to showcase some of the work they have been sponsoring as part of the Research Data Shared Services project (for which we are a pilot institution). It was a varied programme and the audience was very mixed; one of the big benefits of attending events like these is the opportunity to speak to colleagues from other institutions in related but different roles. I spoke to some Records Managers and was interested in their perspective as active managers of current data. I’m a big believer in promoting digital preservation through involvement at all stages of the data lifecycle (or records continuum, if you prefer), so it is important that as many people as possible, whatever their role in the creation or management of data, are encouraged into good data management practices. This might be by encouraging scientists to adopt the FAIR principles, or by Records Managers advising on file formats, file naming, structures and so on.

William Kilbride, Digital Preservation Coalition introduces the event (CC-BY Rachel MacGregor)

The first half of the day was a series of case studies presented by various institutions, large and small, with a whole range of experiences to share. It opened with a presentation from the Polonsky Digital Preservation Project, based at Oxford and Cambridge Universities. Lee Pretlove and Sarah Mason jointly led the conversation, talking us through the challenges of developing and delivering a digital preservation project which has to continue beyond the life of the project. Both universities in this project are very large organisations, which can make the issues faced by the team extremely complex and challenging. The team have been recording their experiences of trying to embed practices from the project so that digital preservation can become part of a sustainable programme.

The first case study came from Jen Mitcham of the University of York, talking about the digital preservation work undertaken there. Jen has documented her activities very helpfully and consistently on her blog, and she talked specifically about the amount of planning that needs to go into the work and then the very real difficulties of implementation. She has most recently been looking at digital preservation for research data, something we are working on here at Lancaster University.

Next up was Louisa Matthews from the Archaeology Data Service, which has been spearheading approaches to digital preservation for a very long time. Excavating a site is by its nature destructive, so it is vital to capture data about it accurately and to be able to return to and reuse the data for the foreseeable future. This captures digital preservation in a nutshell! Louisa described how engaging with their contributors ensures high-quality, reusable data, something we are all aiming for.

The final case study for the morning was Rebecca Short from the University of Westminster, talking about digital preservation and records management. The university has already had success implementing a digital preservation workflow and is now seeking to embed it further in the whole records creation and management process. Rebecca described the very complex information environment at her university, which is relatively small in comparison to those in the earlier presentations but no less challenging for all that.

The afternoon was a useful opportunity to hear from Jisc about their Research Data Shared Services project, for which we are a pilot. We heard presentations from Arkivum, Preservica and Artefactual Systems, all vendors taking part in the project, who gave interesting and useful perspectives on their approaches to digital preservation issues. The overwhelming message, however, has to be that you can’t buy a product which will do digital preservation for you. Different products and services can help, but as William Kilbride, Executive Director of the Digital Preservation Coalition, has so neatly put it, “digital preservation is a human project”, and we should be focussing on getting people to engage with the issues and on all of us doing digital preservation.

Rachel MacGregor