Data Interview on “Messy Data”

Our latest Data Interview features our two Jisc sponsored Data Champions, Dr Jude Towers and Dr David Ellis. Jude is a Lecturer in Sociology and Quantitative Methods and David a Lecturer in Computational Social Science in our Psychology Department.

Jude and David recently presented at a Jisc event on ‘Stories from the Field: Data are Messy and that’s (kind of) ok’.

We talked to Jude and David about what Messy Data are (and many other things):

Q: At the recent Research Data Champions Day the title of your presentation was ‘Data are Messy and that’s (kind of) ok’. I wonder what are ‘messy research data’ in your fields?

Jude: My ‘messy data’ are crime data. The ‘messiness’ comes from a lot of different directions. One of the main ones is that administrative data is not collected for research, it is collected for other purposes. It never quite has the thing that you want. You need to work out if it is a good enough proxy or not.

For example, I am interested in violence and gender, but police crime data doesn’t disaggregate by gender. There is no such crime as domestic violence, so it depends on whether somebody has flagged an incident as such, which is not mandatory, so it is hit and miss. I think the fact that the data are not collected for research makes them messy for researchers, and then there are all the other kinds of biases that come with administrative data. If you think about crime, not everybody reports a crime, so you only get a particular sample. And if you have a particular initiative – for example, every time there are international football matches there are big initiatives around domestic violence – reporting goes up, so everyone says that domestic violence is related to football. But is it, or is it just related to the fact that everyone tells you that you can report, and that there is zero tolerance of domestic violence during football matches? It’s more likely to be recorded.

Then you get feedback loops. The classic one at the moment is knife crime in London: because knife crime has gone up the agenda, more money and resource will go into knife crime; at some point knife crime will probably go down, and something else will go up, because there is a finite amount of resource. Research on administrative data creates these feedback loops, and people don’t always remember that when they come to interpret research.

Jude and David presenting on Messy Data

David: The majority of data within psychology, which tends to measure people, is messy because people are messy. Particularly with social and psychological phenomena, there is always noise within the data. The challenge is often trying to get past that noise to understand what might be going on. This is true both of administrative data and of data you collect in a lab. Probably the only exception in psychology is very, very controlled experiments, maybe in visual perception, where the measurement is very fine-grained; almost everything else in psychology is by its nature extremely messy, and data never look the way they appear in a textbook.

Q: So there is always that ‘noise’ in research data, regardless of whether you use external data such as NHS data or collect data yourself, unless, as you say, it is in a very controlled environment?

David: Yes. And I guess that within psychology there is an argument that if the data is collected in a very controlled environment, is that actually someone’s real behaviour, or is a less controlled environment more ecologically valid? You’ve always got that balance to try and address.

Q:  So what are the advantages, why do you work with messy data?     

Jude: Sometimes because there is nothing else. [laughs]

David: Because there is nothing else. I think psychology generally is going to be messy because, as I said, people aren’t perfect; they are not perfect scientific participants. Participants are not 100% predictable; people are unpredictable social phenomena. There are very few theories within social psychology – in fact, I don’t think there are any – that are 100% spot on.

Compare that to, say, physics, where there is Newton’s law – governing theories which are singular truths that explain a certain phenomenon. We don’t have much of that in psychology! We have theories that tend to explain social phenomena, but people are too unpredictable. There are good examples of theories that have held for a long time, but there is never a universal explanation.

David presenting

Q: What are the implications for management of that kind of messy data?

David: I think the implication is that you make sure it is clear to people how you got from the raw data, which was noisy or messy, to something that resembles a conclusion. So that could be: how did you get from X number of observations to an average that you then analysed? What is the process behind that? It’s not just about running a statistical test; it’s about the whole process, from ‘this is what we started with’ to ‘this is what we ended with’.

Jude:  I think that’s right, I think being very clear about what your data can and cannot support and be very clear that you are not producing facts, you are testing theory, where everything is iterative, a tiny step towards something else, not the end. You never get to the end.

David: I think researchers have a responsibility to do that, and people have to be careful in the language they use to convey how that has happened. A good example at the moment: there is a lot in the press and in current debate about the effects social media has on children and teenagers, and the way it is measured and the language used to talk about it are, to me, totally disconnected. That behaviour isn’t really measured; it is generated by people providing an estimate of what they do, yet we know that that estimate isn’t very accurate. The conclusions which have been drawn are that this is having a big effect on people. I’m not saying it’s not having any effect, but it’s not as exciting to say: ‘well, actually the data are really messy, or not perfect; we can’t really conclude very much’. Instead it’s being pushed into saying that [social media] is causing a massive problem for young people, which we don’t know. That is why there is a responsibility for this to be clear, and I don’t think in that debate it is clear, and I think there are big consequences because of it.

Jude and David at Jisc panel discussion

Q: So in your dream world, what would change, so we could work better with this kind of data?

Jude: I think we need better statistical literacy, across the board. This is what I did with my Masters students: I told them to go and find a paper or media story centred on statistics, then critique it. So: how do you know that what someone is telling you is ‘true’? Why are they telling it to you in that particular way? What data have they used? What have they excluded?

You go to the stats literature and they talk about outliers as though it’s a mere statistical phenomenon, but those decisions are often political and they massively change what we know, and nobody talks about that; nobody sets out exactly what it means. The only official statistics for crime in England and Wales are currently capped at a maximum of five incidents. If you are beaten up by your partner 40 times a year, only the first five are included in the count, which introduces a huge bias into what we know about crime – and then into the way resources are distributed between different groups, and into what we think about which crimes are rising and which are falling. I think this lack of people questioning statistics in particular, and data more generally, is a real problem. In our social science degrees we just do not teach undergraduates how to do that. We do it with qualitative data, but we don’t do it with quantitative data. It’s exactly the same process, exactly the same questions, but we just don’t do it; we are really bad at it in Britain!
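The effect of the five-incident cap Jude describes can be seen in a few lines of Python. The victim counts below are invented for illustration, not official figures:

```python
# Illustrative only: how capping repeat victimisation at five incidents
# biases a total crime count. These numbers are invented for demonstration.
incident_counts = [1, 1, 2, 5, 40, 12, 3]  # incidents per victim in a year

uncapped_total = sum(incident_counts)
capped_total = sum(min(count, 5) for count in incident_counts)

print(uncapped_total)  # 64
print(capped_total)    # 22 – the repeat victimisation largely disappears
```

The victims suffering the most violence contribute no more to the capped count than someone attacked five times, so the published total understates both the scale of crime and its concentration.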

David: I think, more generally, there is a cultural issue within the whole ethos of science: how it gets published, what becomes read and what doesn’t. So again, say I go back, do a paper and find no relationship between social media use and anxiety. That would be harder to publish than if I write a paper and find a tiny correlation – probably spurious and not even relevant – between anxiety and social media. So this comes down both to criticising what is out there and to what is more sellable or has more ‘impact’. I use the word impact in inverted commas: what sounds more interesting, but might actually be totally wrong. I think what is pushed is what’s more interesting rather than what is true. It’s worth remembering that science is about getting a result and trying to unpick it – looking at what else could explain this, what we might have missed – rather than saying ‘that’s it, it’s done’. It’s similar to what Jude was saying about a critical thinking process.

Q: Following on from what Jude said about the skills gap: you say that undergraduates are not taught the skills they need. Therefore, by the time we get PhD students and early career researchers, this gap might even have widened?

Jude: Yes, and they don’t use quantitative data, or they use it really uncritically. Lots of postgraduate students who work on domestic violence won’t use quantitative data, but their theses often start with ‘one in four women will experience domestic violence in their lifetime’ or ‘two women a week are killed by intimate partners’, yet they don’t know where that data comes from, how reliable it is or how it was produced; it is just parroted.

David: I can give a similar example, where it is sometimes difficult to take those numbers back once they become part of the common discourse. Years ago we found that people check their smartphone 85 times a day on average. That was a sample of about thirty young people. We obviously talked about that, but the number is now used repeatedly. There is no way that my grandmother or my parents check their phone 85 times a day. But that sample did, so there is now this kind of view that everyone checks their phone 85 times a day. They probably don’t, but I can’t take that back now. There are things you don’t know at the time, but that is what the data showed. It’s tricky to balance: it was picked up as an impactful thing, but it wasn’t what we really meant.

Q: If your findings are picked up by media looking for catchy, easy numbers, is there also a job for you as a researcher to write your paper differently, so that it is not picked up so easily? Or is it the fault of the media, because they are just looking for a simplified version of a complex issue?

David: There is a cultural issue, a kind of toing and froing, because we want our work to be read, and writing a press release is certainly one way of achieving that. I think what you put in the press release has to be even more refined, because a lot of people won’t read the paper, but they will see the press release, and that will be spun. Once the press release is done, it’s out of your control in some ways. You can get it as right as you want, but a journalist might still tweak it a certain way. It’s a really tough balance because, as you say, the other extreme is to say ‘I am just going to leave it’. But then people might not hear about the work, so it’s a very tricky tightrope to walk.

Jude: When our work started getting picked up by the media, we made the decision as a Centre that we would not talk to the media about anything that had not been through peer review, so it is always peer reviewed first. We work closely with one person from the press office, all the way through the process of putting the paper together and deciding the press release and how we are going to release it. What we have actually got now is contacts in several newspapers and media outlets. And we say: we will work with you exclusively, providing this is the message which goes out. We have been successful enough that we’ve now got two or three people on board who will do that with us. They get exclusives, providing we see the copy before it goes public.

David: That is very hard to do, but really good.

Jude: We have been really hardcore, and we’ve had a lot of pressure to put stuff out earlier, to make a bigger splash, to go with more papers. I think it is only because we resisted that that in the long run it has been much better, although it is hard to resist the pressure. In our early work the press wanted our trends, but we wanted them to talk about the data: we wouldn’t release the trends unless they talked about the problems of bias in official statistics. So we married the two. They didn’t want it, but that was the deal.

David: It’s like when you say: ‘people do X this number of times’ then you can’t put in brackets ‘within the sample’ so I understand where journalists come from and I understand the conversations with the press. To me as I said it’s like walking a tightrope. It has to be interesting enough that people want to read it, but at the same time it needs to be accurate.

Jude: But that’s the statistical literacy, because you want someone reading a media story going ‘Really? Well how did you get that?’ That’s something we would do as academics when you are reading it. People are always telling me ‘interesting facts’ about violence and my first reaction is always: ‘Where has that come from?’ These questions should become routine. I think journalist training is terrible!  I mean I have spent hours on the phone with journalists, who want me to say a really particular thing, and its clearly absolute nonsense! But they have got two little bits of data and they have drawn a line between them.

David: I have had a few experiences where journalists have tried to get a comment about someone else’s work and I have said things like ‘I don’t think this is right’, or I’ve been critical, and the journalist has said, ‘well, really what we are looking for is a positive comment’. And I’ve said, ‘well, I’m not going to give you one’, and they have said, ‘alright, bye then’, and have gone and found someone who will. That doesn’t happen very often, but we can see what they are hoping for. Presumably, some of the time, I have been approached because I have been really critical before. The BBC are quite good at that: they get someone who they know is going to be critical, without having to explicitly ask for something negative.

Q: This has been fascinating; we have been through the whole life cycle of data, from creation to management and now to digestion by the media. This tells us that data management issues are fundamental to the outputs of research.

Jude: I think it impacts on the open data agenda, though, ’cause if I was going to put my data out, the caveat manual which came with it would be three times the size of the data. Again, you don’t have any control over how someone presents an analysis of that data. I think it’s really difficult, because we are not consistent about good practice in reporting on the messiness of data.

David: I think there is a weight of responsibility on scientists to get that right, because it does affect other things! I keep using social media as an example. The government are running an enquiry at the moment into the effects of screen time and social media. If I was being super critical, I would say it’s a bit early for an enquiry, because there isn’t any cause-and-effect evidence. Even some of the studies they report on the enquiry’s home page are totally flawed; one of them is not peer reviewed. That lack of transparency or statistical literacy, even among Members of Parliament, is clearly leading to things being investigated when actually we could be missing a bigger problem. That is just one example, but there is a lot of noise about it, a lot of ‘this might be a problem’, or ‘is it a problem?’, right through to ‘it definitely is a problem’, without anyone standing back and asking: ‘actually, is this an issue? Is the quality of the evidence there?’

Jude: Or can you even do it at the moment?

David: Yes, absolutely! That is a separate area and there is a methodological challenge in that.

Jude: We get asked to measure trafficking in human beings on a regular basis; we’ve even written a report that said you can’t measure it at the moment! There is no mechanism in place that can give you data good enough to produce any kind of measure.

David: But that isn’t going to make it onto the front of the Daily Mail. [laughs]

Q: Maybe just to conclude our interview, what can the university do? You mentioned statistical literacy as one thing. Are there other things we can do to help?

Jude: We are starting to move a little bit in FASS [Faculty of Arts and Social Sciences] with some of the Research Training Programme, and I think things like the Data Conversations, which are hard to measure, are actually having a really good impact. Drawing people in through those kinds of mechanisms, and then setting up people who are interested in talking about this, would be good. I would like to see something around what you need to tell people about your data when it’s published; you know, the caveats: what it can and can’t support, how far you can push it.

David: I think the University as a whole does a lot; certainly psychology is preaching to the converted, in a way. I would like a thing in Pure [Lancaster University Data Repository] that, when you upload a paper, asks ‘have you included any code or data?’, just as a sort of ‘by the way, you can do that’. One, it tells people that we do it; two, if you’re not doing it, it would be useful to have a tick box recording why. Obviously there are lots of cases where you can’t do it, but it would be good for that to be recorded. So is it ‘I can’t do it because the data is a total mess’, some other reason, or ‘I’m not bothered’? There is an issue about why not, because if it has just been published it should be in a form which is sensible and clear.

Jude: I wonder if there is some scope in just understanding the data: maybe a Data Conversation specifically about qualitative data, and then other, even more obscure forms, like literature reviews as data. ’Cause I still keep thinking about when you told me you offered to do data management with FASS and you were told they didn’t have any data.

I think that people don’t think about it as data in the same way and it would be really good to kind of challenge that.  I think data science has a massive problem in that area, it has become so dominant, and if you’re not doing what fits inside the data science box you’re not doing data and you’re not doing science and it’s really excluding.  I think for the university to embrace a universal definition of data would be really, really, beneficial.

David: It’s also good for the University, [to] capitalise on that extra resource; it would have a big effect on the institution as a whole.

Jude, David, thank you very much for this interesting interview!

Jude and David presenting

Jude and David have also featured in previous Data Interviews.

The interview was conducted by Hardy Schwamm, Research and Scholarly Communications Manager @hardyschwamm. Editing was done by Aniela Bylinski-Gelder and Rachel MacGregor.

5th Data Conversations – Stories from the Field

We recently held our fifth Data Conversations here at Lancaster University Library. These events bring researchers together and act as a forum to share their experiences of using and sharing data. The vibe’s informal and we provide our attendees with complimentary coffee, cake and pizza…

It’s FAIR  to say that pizza is a popular part of the event. Who doesn’t love pizza…? The informal lunch at the start brings researchers together. It’s a chance to spark conversations and connections with colleagues from different disciplines and at different career stages.

Data Conversations' attendees enjoying refreshments and conversation

Once again we had a great programme with contributions from three fantastic speakers: 

Up first was Dr David Ellis, Lecturer in Computational Social Science from the Psychology department and one of our Jisc Data Champions. David spoke about his experiences (including challenges and solutions) of working with National Health Service Data.

David Ellis beginning his presentation

Next up was Jessica Phoenix, Criminology PhD Candidate. Jess spoke about her Masters dissertation project, which looked at missing persons and the link between risk assessment and time to resolution. She spoke about the challenges and solutions associated with creating a dataset from pre-existing raw data – issues that were amplified because the data (police records) were highly sensitive and identifiable.

Image showing Jess as she begins her presentation

Last up was Professor Chris Hatton, Centre for Disability Research, Division of Health Research. Chris discussed his experience of collaborating with social workers to achieve uniquely valuable results. He also explored the way in which social media (his Twitter account) has provided a platform to engage with a wide array of voices that he couldn’t have reached through conventional research methods.

Chris enjoying jovial interaction with attendees

It was another fantastic instalment in an ongoing series of Data Conversations. We thoroughly enjoyed it, and we’re looking forward to the 6th Data Conversations: Keep it, throw it, put it in the vault…? We hope you can join us – sign up today!

Digital flyer promoting 6th Data Conversations to be held 18th September, 13:30-16:00, the Library, C130. Link below.

Joshua Sendall, Research Data Manager @JSendall

From Planning to Deployment: Digital Preservation and Organizational Change June 2017

We were very excited to be visiting the lovely city of York for the Digital Preservation Coalition’s event “From Planning to Deployment: Digital Preservation and Organizational Change”. The day promised a mixture of case studies from organisations that have implemented, or are in the process of implementing, a digital preservation programme, and also a chance for Jisc to showcase some of the work they have been sponsoring as part of the Research Data Shared Services project (which we are a pilot institution for).

It was a varied programme and the audience was very mixed – one of the big benefits of attending events like these is the opportunity to speak to colleagues from other institutions in related but different roles. I spoke to some Records Managers and was interested in their perspective as active managers of current data. I’m a big believer in promoting digital preservation through involvement at all stages of the data lifecycle (or records continuum, if you prefer), so it is important that as many people as possible – whatever their role in the creation or management of data – are encouraged into good data management practices. This might be by encouraging scientists to adopt the FAIR principles, or by Records Managers advising on file formats, file naming and structures, and so on.

William Kilbride, Digital Preservation Coalition introduces the event (CC-BY Rachel MacGregor)

The first half of the day was a series of case studies presented by various institutions, large and small, who had a whole range of experiences to share. It was introduced by a presentation from the Polonsky Digital Preservation Project, based at Oxford and Cambridge Universities. Lee Pretlove and Sarah Mason jointly led the conversation, talking us through the challenges of developing and delivering a digital preservation project which has to continue beyond the life of the project. Both universities in this project are very large organisations, and this can make the issues faced by the team extremely complex and challenging. They have been recording their experiences of trying to embed practices from the project so that digital preservation can become part of a sustainable programme.

The first case study came from Jen Mitcham from the University of York, talking about the digital preservation work they have undertaken there. Jen has documented her activities very helpfully and consistently on her blog, and she talked specifically about the amount of planning which needs to go into the work, and then the very real difficulties of implementation. She has most recently been looking at digital preservation for research data – something we are also working on here at Lancaster University.

Next up was Louisa Matthews from the Archaeology Data Service, which has been spearheading approaches to digital preservation for a very long time. The act of excavating a site is by its nature destructive, so it is vital to capture data about it accurately and to be able to return to and reuse the data for the foreseeable future. That captures digital preservation in a nutshell! Louisa described how engaging with their contributors ensures high-quality, re-usable data – something we are all aiming for.

The final case study of the morning came from Rebecca Short of the University of Westminster, talking about digital preservation and records management. The university has already had success implementing a digital preservation workflow and is now seeking to embed it further in the whole records creation and management process. Rebecca described the very complex information environment at her university – relatively small in comparison to the earlier presenters’ institutions, but no less challenging for all that.

The afternoon was a useful opportunity to hear from Jisc about their Research Data Shared Services project, which we are a pilot for. We heard presentations from Arkivum, Preservica and Artefactual Systems, all vendors taking part in the project, who gave interesting and useful perspectives on their approaches to digital preservation issues. The overwhelming message, however, has to be: you can’t buy a product which will do digital preservation. Different products and services can help you with it but, as William Kilbride, Executive Director of the Digital Preservation Coalition, has so neatly put it, “digital preservation is a human project”, and we should be focussing on getting people to engage with the issues, and on all of us doing digital preservation.

Rachel MacGregor

Jisc Research Data Shared Services March 2017

Here at Lancaster University we are very excited to be part of a group of pilot institutions taking part in Jisc’s Research Data Shared Services project. This aims to provide a flexible range of services which suit the varied needs of institutions in the HE sector and help achieve policy compliance for deposit, publication, discovery, storage and long-term preservation of research data. It’s an ambitious project, but one there is an undoubted need for, and we are working with Jisc to help them achieve this goal.

Last week we were invited down to Jisc London HQ to learn about the progress of the project and – just as importantly – share our own thoughts and experiences on the process.

Waterloo Sunset (author’s own, CC-BY)

Daniela Duca has written a comprehensive overview of the meeting and of the way forward for Jisc.

Our table represented a microcosm of the project: Cambridge University (large institution), ourselves at Lancaster (medium) and the Royal College of Music (small). We all have extremely different needs and resources, and how one institution tackles a problem will not work at another. However, we have a common purpose: supporting our academics and students in their research, ensuring compliance with funders, and enabling our institutions to produce first-class research outputs to share with the wider world.

We had been asked to do some preparatory work on costing models for the meeting. I think it would be fair to say we all found this challenging – probably because it is! My previous knowledge of costings comes from the Curation Costs Exchange, an excellent starting point for anyone approaching the very difficult task of costing curation services.

My main interest in the day lay in the preservation aspects of the project, especially in exploring wider use cases. It’s clear that many institutions have a number of digital preservation scenarios to which the Shared Service solution might also be applicable. What is also clear is that there are so many possible use cases that it would be very easy to accidentally create a whole new project without even trying! I think it’s fair to say that all of us in the room – whether actively involved in digital preservation or not – are very interested in this part of the project. There is no sense in Jisc replicating work which has already been done elsewhere or is being developed by other parties, so this presents an ideal opportunity for collaborative working and for building on the strengths of the existing digital preservation community.

Overall there was much food for thought and I look forward to the next development in the shared services project.

Researchers: what do they really think?

Image: Flickr – Rul Fernandes CC BY 2.0

Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think.  This is crucial for developing and improving our services and vital for delivering the service our researchers want.

We are one of the organisations taking part in the Jisc RDM Shared Services pilot, and you can read their take on the work being done here. With Jisc’s help we undertook a researcher survey to find out a bit more about the kinds of research data being produced, how the data were (or weren’t) being managed, and researcher attitudes towards their data.

Researchers were asked about the types of data generated by their research, and the results were quite interesting to us. Perhaps unsurprisingly, far and away the most popular “type” of data was “document or report”, followed, with a bit of a gap, by spreadsheets. Structured text files (e.g. XML, JSON) came a lot lower down the list, as did databases.

Lancaster Researchers’ responses to JISC DAF Survey

What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files actually being deposited with us as research outputs. Obviously such comparisons are problematic, not least because our researchers were asked about the data generated as part of their research activities rather than specifically those ultimately selected for permanent preservation. We also know that we only receive a small proportion of the research data created within the university, and the respondents may include people who have not deposited data with us. Having analysed the research datasets we already hold, we can see that a huge percentage were structured or unstructured text files, and a much smaller proportion were spreadsheets or Word documents.

Analysis of file formats undertaken at Lancaster University
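An analysis like this can be sketched as a simple tally of deposited files by extension. The file names below are hypothetical, purely for illustration, not real deposits:

```python
from collections import Counter
from pathlib import Path

# Hypothetical deposited files, invented for illustration only.
deposited = ["results.csv", "transcript.docx", "model_output.json",
             "readings.txt", "survey.xlsx", "notes.txt"]

# Tally files by extension to compare against what researchers
# said they create in the survey responses.
by_extension = Counter(Path(name).suffix.lstrip(".") for name in deposited)
print(by_extension.most_common())
```

A tally like this over the repository’s holdings is what lets us put the deposited formats side by side with the survey responses.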

Is it that our researchers have a false sense of the kinds of data they are creating and using, or is it that we as data curators have a poor understanding of the researcher community? I suspect it is a bit of both, but as data curators it is our duty both to have a good understanding of the data environment and to be able to communicate with our research community. This is something we need to address as part of improving our advocacy and engagement strategies.

Another question asked was about sharing data, and this got answers which did surprise us.  The majority said that they already share data, and very few said they were not willing to share.  Those who did not share mostly cited sensitive or confidential data, or a lack of permission to share it.  Of those who did share data, the majority said it was because of "the potential for others to re-use data" and because "research is a public good and should be open to all".  An encouraging third of those questioned said they had re-used someone else's data.

Results of JISC DAF survey for Lancaster University

Of course we know that the people who answered our survey represent those who are in some way already engaged with the RDM process.  We also know that people are likely to give the answers they think we want to hear!  But if people are serious about being willing and able to share, we really want to support them in this.

So we’ve decided to try to get talking to our researchers – and to get them talking to each other – by setting up a series of Data Conversations: events where researchers can discuss the creation and dissemination of data, to encourage a climate of sharing and valuing data.  If this works, we can hope for data which are well curated from the start of their life, selected for deposit appropriately, and described with good metadata.

Better communication and advocacy will help us in the long run to preserve high-quality, relevant data which can be shared and reused.  Managing research data and preserving digital data for the long term are collaborative activities, and the more we understand and share, the better we will be at achieving these goals.

Rachel MacGregor, Digital Archivist

RDMF16 – Creating a Research Data Community



Are research institutions engaging their researchers with Research Data Management (RDM)? And if so, how are they doing it? In this post Hardy Schwamm (@hardyschwamm), Research Data Manager, Lancaster University, and Rosie Higman (@RosieHLib), Research Data Advisor, University of Cambridge, explore the work they are doing in their respective institutions.

Whilst funder policies were the initial catalyst for many RDM services at UK universities there are many reasons to engage with RDM, from increased impact to moving towards Open Research as the new normal. And a growing number of researchers are keen to get involved! These reasons also highlight the need for a democratic, researcher-led approach if the behavioural change necessary for RDM is to be achieved. Following initial discussions online and at the Research Data Network event in Cambridge on 6 September, we wanted to find out whether and how others are engaging researchers beyond iterating funder policies.

At both Cambridge and Lancaster we are starting initiatives focused on this, respectively Data Champions and Data Conversations. The Data Champions at Cambridge will act as local experts in RDM, advocating at a departmental level and helping the RDM team to communicate across a fragmented institution. We also hope they will form a community of practice, sharing their expertise in areas such as big data and software preservation. The Lancaster University Data Conversations will provide a forum to researchers from all disciplines to share their data experiences and knowledge. The first event will be on 30 January 2017.

Having presented our respective plans at the RDM Forum (RDMF16) in Edinburgh on 22nd November, we ran breakout sessions in which small groups discussed the approaches our own and other universities were taking. The results, summarised below, highlight the different forms that engagement with researchers can take.


RDMF16 Working Group discussing RDM Communities

Targeting our training

RDM workshops seem to be the most common way research data teams are engaging with researchers, typically targeting postgraduate research students and postdoctoral researchers. A recurrent theme was the need to target workshops at specific disciplinary groups, including several workshops run jointly between institutions, which made it possible to recruit sufficient participants for smaller disciplines. Alongside targeting disciplines, some have found that inviting academics with experience of sharing their data to speak at workshops greatly increases engagement.

As well as focusing workshops so they are directly applicable to particular disciplines, several institutions have had success in linking their workshop to a particular tangible output, recognising that researchers are busy and are not interested in a general introduction. Examples of this include workshops around Data Management Plans, and embedding RDM into teaching students how to use databases.

An issue many institutions are having is getting the timing right for their workshops: too early, and research students won’t have any data to manage or even be thinking about it; too late, and students may already have got into bad data management habits. Finding the Goldilocks time which is ‘just right’ can be tricky. Two solutions to this problem were proposed: offering short online training first, with more in-depth training later on; and running a one-hour session as part of an induction, followed by a two-hour session 9–18 months into the PhD.

Tailored support

Alongside workshops, the most popular way to get researchers interested in RDM was through individual appointments, which allow the conversation to be tailored to their needs. This obviously presents a problem of scalability when most institutions have only one staff member dedicated to RDM.

There are two solutions to this problem which were mentioned during the breakout session. Firstly, some people are using a ‘train the trainer’ approach to involve other research support staff who are based in departments and already have regular contact with researchers. These people can act as intermediaries and are likely to have a good awareness of the discipline-specific issues which the researchers they support will be interested in.

The other option discussed was holding drop-in sessions within departments, where researchers know the RDM team will be on a regular basis. These have had mixed success at many institutions but seem to work better when paired with a more established service such as the Open Access or Impact team.

What RDM services should we offer?

We started the discussion at the RDM Forum thinking about extending our services beyond mere compliance in order to create an “RDM community” where data management is part of good research practice and contributes to the Open Research agenda. This is the thinking behind the new initiatives at Cambridge and Lancaster.

However, there were also some critical or sceptical voices at our RDMF16 discussions. How can we promote an RDM community when we struggle to persuade researchers to comply with institutional and funder policies? All RDM support teams are small and have many other tasks besides advocacy and training. Some expressed concern that they lack the skills to market our services beyond the traditional methods used by libraries. We need to consider and address these concerns about capacity and skill sets as we attempt to engage researchers beyond compliance.

RDMF16 at work


It is clear from our discussions that there is a wide variety of RDM-related activity at UK universities which stretches beyond enforcing compliance, but engaging large numbers of researchers is an ongoing concern. We also realised that many RDM professionals are not very good at practising what we preach when it comes to sharing our own materials, so it is worth highlighting that training materials can be shared via the RDM training community on Zenodo as long as they have an open licence.

Many thanks to the participants in our breakout session at RDMF16, and to Angus Whyte for taking the notes which allowed us to write this piece. You can follow previous discussions on this topic on Gitter.

Published on 30 November
Written by Rosie Higman and Hardy Schwamm
Creative Commons License