Data Interview with David Ellis (Part 1)

Part 1 (of two) of a Data Interview with Dr David Ellis (@davidaellis). David is a Lecturer in Computational Social Science and holds a 50th Anniversary Lectureship in Psychology at Lancaster University. David presented at the first Data Conversations on Data Visualization.

This is, we hope, the first in a series of interviews about the impact of Open Data on research. The interview was conducted by Hardy Schwamm.

Q: We define Open Data as data that can be freely used, shared and built-on by anyone, anywhere, for any purpose. Open Data is also a way to remove legal and technical barriers to using digital information.  Does that go with your idea of what Open Data is?

David: Yes, I think so. I might add to that: the data is actually useful and fit for purpose. To me it’s one thing to just upload all that data and make it available. But a lot of the time, how useful that is on its own is not quite clear. As a psychologist you can run an experiment and you have a lot of data coming out of a study. You can just dump that data online but is there enough information there for other scientists to use that data and get the results?

Q: So would you say that the usefulness of data depends on what we as librarians call metadata, data about the data?

David: Yes, exactly. The definition you gave earlier is spot on. I would just add you need to make sure it is useful to other people. That might also depend on the audience but there are lots of datasets that people post for papers that are just the raw data. That is useful but to understand how they get from the raw data to the conclusions is an important step. There isn’t always space in publications to make that clear.

Q: My next question you have probably answered already. What is your interest in Open Data? Do you support it as a principle or because it is useful for your research?

David: I do support it as a matter of principle! I always found it weird, even as a student, that you could have papers published and it was just a “Take our word for it” process. I still find that weird now. So absolutely, I support it as a matter of principle. I think as a scientist it just seems right. The data is the cornerstone of every publication. So if that is not there it seems like a massive omission, unless there is a reason for it not to be there. There are lots of mainstream psychology journals that don’t have any policy on data.

Q: That leads me to my next question: To what extent do researchers in your field, Psychology, support or embrace a culture of Open Data?

David: Psychology does have a culture of it and it is probably growing. I think it is inevitable that this is going to become the standard practice if you look at the way Open Access publishing is going.

Q: Why do you think this is happening?

David: Because I think what is eventually happening is that journals are going to say… Lots of people are doing it but it is like everything else: particularly if that data is going to be usable, it does require a bit more effort on the author’s part to make sure that things are organised and that they have a Data Management Plan. I am not suggesting that lots of people don’t have Data Management Plans but it’s something that, if you look at current problems in Social Psychology, really wasn’t being followed. There have been leaks and there have been other problems.

To tell you a story from last week: a 3rd year student at Glasgow University had spotted errors in a published paper, and they were actually errors in the degrees of freedom. They didn’t need the raw data but the point is that a lot of that could have been sorted if the raw data had been made available. There are lots of little issues that keep coming up.

There is nevertheless still resistance and there are plenty of journals where there really is no policy, certainly the journals for which I review. At the end, there is no data provided and I don’t know what the policy is. It would be nice if in the future authors could upload raw data but that depends on the journal’s policy and if the journal has a policy.

Q: Where should the push for Open Data come from? From journals, funders or the science community?

David: I think from all! If peer reviewers started asking for data, which I think more are, and I think if more scientists start uploading data as supplementary material as a matter of course then I think journals will start to do that. I guess the other option is that journals will start to be favoured that do provide additional resources. So particularly given how much money places like Elsevier make, what do they actually offer? If they want to sell themselves they could offer lots of things but they don’t seem to be pushing it.

And I appreciate it is very discipline specific, and that came up after my talk at the Data Conversations [on 30 January 2017]: some disciplines don’t share data. It has improved massively since I started as a postgrad student. Then it just wasn’t a thing and it has slowly become more of an issue.

Q: Do you think this has to do with skills and knowledge of researchers and PhD students? Do they know how to prepare and share data? Do they know how to use other researchers’ data? Is there something missing?

David: A lot of psychologists are in a kind of hybrid area. They are obviously not statisticians and I do wonder if there is a bit of a concern because what if I upload everything, what if somebody finds a mistake? My view is always: I’d rather know that there is a mistake. But I do wonder if people are sometimes sceptical about it. Not because they’ve got anything to hide but because they are not 100 per cent sure sometimes. They understand the result and they know what the numbers mean but we are not mathematicians.

I am just curious that given the number of statistical mistakes being flagged up in psychology papers… I am sure I have made mistakes myself. I’d just rather know about them. And having the data there means someone can check if they really want to. My view is that I am quite flattered if someone has bothered to go and re-run my analysis. They are obviously reading it!

The interview with David will be continued in Part 2 which you can find here.

Jisc Research Data Shared Services March 2017

Here at Lancaster University we are very excited to be part of a group of pilot institutions taking part in Jisc’s Research data shared services project. This aims to provide a flexible range of services which suit the varied needs of institutions in the HE sector and help them achieve policy compliance for deposit, publication, discovery, storage and long term preservation of research data. It’s an ambitious project but one that there is an undoubted need for and we are trying to work with Jisc to help them achieve this goal.

Last week we were invited down to Jisc London HQ to learn about the progress of the project and – just as importantly – share our own thoughts and experiences on the process.

Waterloo Sunset (author’s own, CC-BY)

Daniela Duca has written a comprehensive overview of the meeting and of the way forward for Jisc.

Our table represented a microcosm of the project: Cambridge University (large institution), ourselves at Lancaster (medium) and the Royal College of Music (small).  We all have extremely different needs and resources and how one institution tackles a problem will not work at another.  However we have a common purpose in supporting our academics and students in their research, ensuring compliance with funders and enabling our institutions to support first class research outputs to share with the wider world.

We had been asked to do some preparatory work around costing models for the meeting – I think it would be fair to say we all found this challenging – probably because it is! My previous knowledge of costings comes from having looked at the Curation Costs Exchange, which is an excellent starting point for anyone considering approaching the very difficult task of costing curation services.

My main interest in the day lay in the preservation aspects of the project especially in exploring wider use cases.  It’s clear that many institutions have a number of digital preservation scenarios for which the Shared Service solution might also be applicable.  What is also clear is that there are so many possible use cases that it would be very easy to accidentally create a whole new project without even trying!  I think it’s fair to say that all of us in the room – whether we are actively involved in digital preservation or not – are very interested in this part of the project.  There is no sense in Jisc replicating work which has already been done elsewhere or is being developed by other parties so it presents an ideal opportunity for collaborative working and building on the strengths of the existing digital preservation community.

Overall there was much food for thought and I look forward to the next development in the shared services project.

Impressions from IDCC17 in Edinburgh (12th International Digital Curation Conference)

The below is a very quick summary of things that I found interesting, remarkable or funny at IDCC17. But before I start, a big thank you to Kevin Ashley and his team for organising such an interesting event with a varied programme! And thanks for all the conference pictures on Flickr!

Surgeons Hall Edinburgh, IDCC17 venue

Monday, 20 February (Workshops)

Actually a nice idea to have the conference proper sandwiched between two days of workshops which gives attendees the chance to be quite flexible with their time commitment (you need to visit Edinburgh as well while you’re there)! The location was the Surgeons’ Hall which is conveniently located for attractions in the Old Town of Edinburgh.

I went to the “Technical Appraisal of Complex Digital Objects in Evolving Environments” workshop run by the PERICLES project. PERICLES is a four-year project funded by the European Union which will be finished in March 2017.

Simon Waddington (King’s College)

The project has the ambitious aim not just to preserve data files, but also the surrounding environment including software, and associated hardware requirements. I enjoyed the discussions about authenticity of objects (how much can you change or convert before you “lose” the original) and identification of videos. But I have to admit that the demo of the Ecosystem tool (using a complex ontology) was a bit too much for my limited understanding. But sometimes it is good to see your limits, so thanks to the presenters from King’s College and the University of Göttingen.

Monday finished in style with a drinks reception in the wonderful Playfair Library!

Drinks reception in Playfair Library

Tuesday, 21 February (Conference)

Tuesday started with the keynote “A Process View of Missing Data” from Maria Wolters who is a Reader in Design Informatics at the School of Informatics at the University of Edinburgh.

Maria’s point is that Missing Data can improve overall data quality if we understand why data are missing!

Next up was a Parallel Session on “Curation Communities”. Marta Teperek and Rosie Higman reported on a topic that is close to my heart: engaging researchers in RDM and creating an RDM Community. The challenge our colleagues at Cambridge have is that the University “is a maze” with 150 Departments! Marta reported that the RDM approach in the past was led by the “stick approach” (e.g. pointing out compliance with data policies). This clearly has its limitations (which we also experience at Lancaster University). Instead, the support team in Cambridge is working on a more “democratic” and researcher-led process.

Marta contemplating Democracy

In the same spirit are Cambridge’s Data Champions who “are local experts on research data management and sharing who can provide advice and training within their departments.” Rosie organises training for the Data Champions so that they can in turn train their peers in RDM. A great idea and I am curious to hear about the success. This is similar to the idea of Lancaster Data Conversations but more ambitious.

Rosie presenting the idea of Data champions

In the afternoon I went to the Parallel Session on Sensitive Data. Debra Hiom from the University of Bristol gave a really interesting presentation on safe access (presentation available for download here as .pptx). Debra reported that Bristol have agreed on four standard data access levels (Open, Restricted, Controlled and Closed) and have tasked an Expert Advisory Group on Data Access with handling the more sensitive cases.

Bristol data access levels

Tuesday finished with a very enjoyable Conference Dinner in The Caves which felt a bit like dining in an underground club (which is exactly what The Caves are often used for).

IDCC17 Dinner at The Caves

Wednesday, 22 February

Wednesday offered more parallel sessions. I became a bit nostalgic at the talk of Alex Ball (Bath University) “Choose your own research data management guidance”. Alex and colleagues from GW4 universities are developing RDM guidance using the interactive fiction software Squiffy. This is a very interesting take on RDM guidance which of course reminds me of playing interactive games like The Hobbit back in the day. I am really curious to see a demo, hopefully soon!

Alex Ball, Bath University

Food for thought came from Jez Cope (Sheffield University) who advertised Library Carpentry (slides), a software skills training for library professionals. We have been thinking about digital skills here at Lancaster University, so a programme like Library Carpentry is very timely. Jez’ talk explained the concept of the training and we might well take part soon, so thanks for that.

Thursday, 23 February

Finally, on Thursday I participated in the workshop “Essentials 4 Data Support, the Train the Trainer”, delivered by Ellen Verbakel  (4TU.Centre for Research Data) and Marjan Grootveld (DANS). Ellen and Marjan presented the thinking behind their course (freely available here) which is a combination of face-to-face training with online modules and assignments. The training is aimed at “data supporters” (librarians, IT staff and researchers with duties involving data management).

Workshop participants

We did a number of exercises including mapping RDM stakeholders and the review of Data Management Plans.

RDM stakeholder map

It was very interesting to exchange views and experiences with international colleagues and to see how different legal frameworks, cultures and policies inform our work.

Then, finally, IDCC was over and attendees faced Storm Doris on their way home. Thanks, DCC team, for an engaging, interesting and fun event!

First Data Conversations 30 January 2017 – Summary of event & slides

The first Data Conversations happened on Monday, 30th of January 2017. Below is a quick overview of the action. You can find slides of four talks below.

Data Conversations Opening

Adrian Friday opening Data Conversations

The event was opened by Professor Adrian Friday from the Data Science Institute (DSI) who emphasised that the DSI is all about collaboration between disciplines which is also the spirit of Data Conversations. In fact the 25 attendees came from  a range of Departments: Biological and Life Sciences, Chemistry, Computing, Educational Research, History, Law, Lancaster Environment Centre, Politics, Psychology and others.

Data Conversations Talks

Unfortunately, Dr Chris Jewell from the Medical School had to cancel his talk. You can see an overview of the agenda below.

Leif Isaksen – Does Linked Data Have to be Open?

Leif Isaksen from the History Department (Leif is also involved in the Data Science Institute) presented the Pelagios Commons project which provides online resources for using open data methods to link and explore historical places.

Leif Isaksen

Leif stressed that linking data is a social process which is built on open partnerships.

You can see Leif’s presentation below:

Jude Towers – Is Violent Crime Increasing or Decreasing?

Dr Jude Towers from Lancaster’s Sociology Department discussed crime rates, especially the rate of domestic violence over time through the Crime Survey for England and Wales. A current ESRC project is looking at how changing survey methodologies alter the underlying data of crime statistics.

Alison Scott-Baumann – Protecting participants and their data on a sensitive topic

Next up was Alison Scott-Baumann who is a Professor of Society and Belief in the Centre of Islamic Studies in the Near and Middle East Department at SOAS. Alison is the Project lead on (Re)presenting Islam on Campus. Lancaster is a project partner and Dr Shuruq Naguib added to Alison’s presentation.

Alison Scott Baumann

Alison and Shuruq explained how difficult it is to get the balance right between confidentiality and data security required to manage often highly sensitive data, and to meet the expectations of data sharing. They stressed how much effort they spend on explaining the terms of the consent forms to project participants.

David Ellis – Building interactive data visualisations to support publications

Dr David Ellis showed the audience an example of dynamic data visualisation using a dataset he published on Lancaster University’s Research Registry (http://dx.doi.org/10.17635/lancaster/researchdata/58). David explained how he used the R package Shiny to achieve this.

David explained that the visualisation helps not only other researchers but also enables the interested public to query his data. One example was interest from journalists in his research on predicting smartphone operating system from personality and individual differences.

Chris Donaldson & James Butler – Mining and mapping places with multiple names

Finally, Dr Christopher Donaldson and Dr James Butler talked about their research using a 1.5 million word corpus of Lake District 18th and 19th century literature. Christopher and James use the Edinburgh Geoparser System to automatically recognise place names in text and disambiguate them with respect to a gazetteer.

James demonstrated how he deals with name variations (secondary names), which is a lot of work. For example, the lake “Coniston” appears in the corpus as: Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane’s watter, Turstinus.
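To make the idea of secondary names concrete, here is a minimal sketch of a variant lookup table. It is not the Edinburgh Geoparser and not James’s actual method; the canonical label “Coniston Water”, the `normalise` helper and the `canonical_name` function are assumptions made purely for illustration. Only the list of variants comes from the talk.

```python
import re

# Variant spellings quoted in the post. The canonical label is an
# assumption for this sketch, not something stated in the talk.
CANONICAL = "Coniston Water"
VARIANTS = [
    "Thurstan", "Coniston Lake", "Coniston Water", "Thurston", "Conistone",
    "Conistone Lake", "Cunnistone Lake", "Thurston Lake", "Coniston Mere",
    "Lake of Coniston", "Conis- ton", "Conyngs Tun", "Conyngeston",
    "Thorstane's watter", "Turstinus",
]


def normalise(name: str) -> str:
    """Lower-case, unify apostrophes, rejoin words split by a
    hyphen plus whitespace, and squeeze runs of whitespace."""
    name = name.replace("\u2019", "'").lower()
    name = re.sub(r"-\s+", "", name)  # "conis- ton" -> "coniston"
    return re.sub(r"\s+", " ", name).strip()


# Hypothetical lookup: normalised variant -> canonical place name.
LOOKUP = {normalise(v): CANONICAL for v in VARIANTS}


def canonical_name(mention: str):
    """Return the canonical name for a mention, or None if unknown."""
    return LOOKUP.get(normalise(mention))
```

A real gazetteer would hold many places and would still need disambiguation, since one spelling can refer to several locations; a flat dictionary like this only covers the simplest case.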

Chris Donaldson
James Butler

Feedback so far

The feedback from attendees and presenters so far is encouraging.

Enjoyed the presentations. I hope these data conversations will become a nice community for those interested in data. Relaxed and nicely themed but not too prescribed. The venue was good and the cakes and biscuits were very good!

We got some comments on the length of the presentations and question time.

Really enjoyable – perhaps a bit more time for each speaker / questions and discussions.

We will look into amending the format. We do like to keep a balance between time for data stories and discussions and giving a number of Lancaster researchers a forum to talk about their experiences. Thanks for the comments and suggestions so far!

Upcoming: 2nd Data Conversations 4th of May

We hope to report on some of the data presentations in more detail in future blog posts. Meanwhile, we are already preparing for the next Data Conversations event on 4th of May (1.45-4 pm). The theme of the event will be “Data Security and Confidentiality”, and registrations are open: http://bit.ly/ludatacon2. Please come along and if you have any questions get in touch with the RDM Support Team: rdm@lancaster.ac.uk.

First Data Conversations – Speakers confirmed!

Data Conversations on 30 January “Sharing Data – Benefits and Boundaries”

We are very excited about the first Data Conversations event at Lancaster University coming up on 30 January 2017, 1.45-4pm. There will be 6 short talks from academics talking about aspects of their research data. We can now publish the agenda.

Detailed agenda

13:45 Registration and Coffee
14:00 Welcome to Data Conversations – Nigel Davies (Data Science Institute)

14:10 – 14:50 First round of short talks
14:10 – 14:20 1. Does Linked Data have to be Open? Reflections from the Pelagios Commons – Leif Isaksen (History)
14:25 – 14:35 2. The Politics of Counting: Is Violent Crime Increasing or Decreasing? – Jude Towers (Sociology)
14:40 – 14:50 3. Protecting participants and their data on a sensitive topic – Alison Scott-Baumann (PPR)

14:55 – 15:10 Tea and coffee

15:10 – 15:55 Second round of short talks
15:10 – 15:20 4. Building interactive data visualizations to support publications – David Ellis (Psychology)
15:15 – 15:30 5. Efficient sharing of numerical output – Chris Jewell (Medical School / CHICAS)
15:35 – 15:45 6. Mining and mapping places with multiple names – Chris Donaldson & James Butler (History)

15:55 Close – Hardy Schwamm (Library)

We are very happy that we have speakers from a range of disciplines! We are looking forward to the first Data Conversations and will report on how it went. Watch this space!

 

Researchers: what do they really think?

Image: Flickr https://flic.kr/p/8WpM2U – Rul Fernandes CC BY 2.0

Well… it’s probably quite hard to get to the truth of the matter but here at Lancaster we are trying to find out what researchers really think.  This is crucial for developing and improving our services and vital for

We are one of the organisations taking part in the JISC RDM Shared Services pilot and you can read their take on the work being done here.  With JISC’s help we undertook a researcher survey to find out a bit more about the kinds of research data which were being produced, how the data were (or weren’t) being managed and researcher attitudes towards their data.

Researchers were asked about the types of data which were generated from their research. The results were quite interesting to us. Unsurprisingly perhaps, far and away the most popular “type” of data was “document or report”, followed with a bit of a gap by spreadsheets. Structured text files (e.g. XML, JSON etc.) came a lot lower down the list, as did databases.

Lancaster Researchers’ responses to JISC DAF Survey

What interested us was comparing the kinds of files which researchers said they created during the research process with the kinds of files which were actually being deposited with us as research outputs.  Obviously comparisons are problematic not least because our researchers were being asked about the data generated as part of their research activities rather than specifically those which were ultimately selected for permanent preservation.  We also know that we only get a small proportion of the research data which are being created within the university and the respondents may include people who have not deposited data with us. Having analysed the research datasets which we have already we can see that a huge percentage were structured or unstructured text files and a much smaller proportion were spreadsheets or Word documents.

Analysis of file formats undertaken at Lancaster University
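For readers who want to try a similar comparison on their own deposits, the following is a rough sketch of tallying file formats by extension. It is not the analysis we actually ran; the `format_shares` function and the sample file names are made up for illustration, and extension counting is only a crude proxy for proper format identification.

```python
from collections import Counter
from pathlib import PurePath


def format_shares(filenames):
    """Return {extension: percentage of files}, most common first.

    Extensions are compared case-insensitively; files without an
    extension are grouped under "(none)".
    """
    counts = Counter(
        PurePath(name).suffix.lower() or "(none)" for name in filenames
    )
    total = sum(counts.values())
    return {
        ext: round(100 * n / total, 1) for ext, n in counts.most_common()
    }


# Hypothetical file listing, not real deposit data.
sample = ["interviews/p01.txt", "interviews/p02.TXT", "results.csv", "notes.docx"]
shares = format_shares(sample)
```

Setting the resulting percentages next to the survey answers gives the kind of side-by-side comparison discussed above.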

Is it that our researchers have a false sense of the kinds of data which they are creating and using or is it that we as data curators have a poor understanding of the researcher community?  I suspect that it is a bit of both but as data curators it is our duty to both have a good understanding of the data environment and also to be able to communicate to our research community.  This is something we need to address as part of improving our advocacy and engagement strategies.

Another question which was asked was about sharing data and this got answers which did surprise us. The majority said that they did already share data and very few said they were not willing to share. For the ones who did not share data it was mostly because it was sensitive or confidential data or they did not have permission to share it. Of those who did share data the majority said it was for “the potential for others to re-use data” and because “research is a public good and should be open to all”. An encouraging third of those questioned said they had re-used someone else’s data.

Results of JISC DAF survey for Lancaster University

Of course we know that the people who did answer our survey represent those who are in some way already engaged with the RDM process.  We also know that people are likely to give the answers they want us to hear!  But if people are serious about being willing and able to share we really want to support them in this.

So we’ve decided to try and get talking to our researchers – and for them to talk to each other – by setting up a series of Data Conversations – events where researchers can discuss creation and dissemination of data to try and encourage a climate of sharing and valuing the data.  It means we can hope for data that is well curated from the start of its life and that will be selected for deposit appropriately and with good metadata.

Better communication and advocacy will help us in the long run to preserve high quality, relevant data which can be shared and reused. Managing (research) data and long term preservation of digital data are collaborative activities and the more we understand and share the better we will be at achieving these goals.

Rachel MacGregor, Digital Archivist

Acting On Change: Pericles/DPC Conference and DPA Awards London 2016

DPA Awards 2016 nominees and judges (Image @SueCorrigall licence OGL)

Last week I had the pleasure of attending the Pericles/DPC Conference: Acting on Change at the Wellcome Institute in London. The theme of the conference was moving forward with digital preservation; in other words taking steps beyond just the technical tools and looking outward instead of inward. There were excellent keynotes and panel sessions and useful and thought-provoking workshops. PERICLES (Promoting and Enhancing Reuse of Information through the Content Lifecycle) is an EU-funded four-year project which seeks to address the issues of managing digital preservation in an ever changing world.

Kara Van Malssen of AVPreserve set the tone brilliantly with her inspiring keynote “Seeing the forest for the trees – looking outside the OAIS model” which focused mainly on moving away from what she called the “boutique approach” to digital preservation and towards developing a broader ecosystem of integrated automated services. She touched on some of the difficulties in getting funding for what she calls “maintenance” (which after all is what digital preservation often is) as opposed to “cool new stuff” and recommended some listening and reading on the subject, such as the podcast “In Praise of Maintenance” and the article “Hail The Maintainers”. She concluded that what was required was a culture of change in the way people and organisations work so that digital preservation “just happens” and no one notices. This was echoed many times during the course of the conference and is definitely something we have been thinking about in terms of the way we are developing our research services here at Lancaster University.

Barbara Sierman’s OAIS illustrations

The panel session that followed again took “Beyond OAIS” as its theme. Many of us welcomed the all female panel and were impressed by the range and depth of experience represented with Angela Dappert, Pip Laurenson, Barbara Reed and Barbara Sierman – a truly international panel. Barbara Sierman spoke firmly in favour of the much-maligned OAIS model saying it was a guide and a conversation piece (she also had wonderful slides which illustrated her point of OAIS not being a cage to limit us but a guide to let us fly freely, and tweet too presumably!). Barbara Reed brought a welcome archival slant to the discussion with a more critical view of OAIS which she felt had certain assumptions which did not fit well with archival theory. Pip Laurenson likened the journey towards OAIS to William Blake’s illustration of Dante’s Purgatory and Angela Dappert explained that the only way of never being wrong is by doing nothing. The call overall was to be involved and a good way to start with OAIS is to look at and contribute to the conversations taking place on the Digital Preservation wiki.

Image from Tate Galleries

In the afternoon we continued with the theme of working on the maintenance and advocacy side of digital preservation. As Dan Gillean of Artefactual pointed out in his presentation, two thirds of ISO 16363 is organisational rather than technical. Jen Mitcham explained to us that the key to working successfully was to work collaboratively and Angela Dappert wanted more encouragement for smaller organisations to take the first steps in preservation. Matthew Addis suggested that capability models could be a good way to start. Anna Henry finished off by describing some of the challenges of communication. The panel session gave us all both a lot to look at and a lot to think about. Wrapping up for the day we were asked – what is stopping us from making progress in digital preservation? The answer is money and confidence. The latter we can do something about…

DPA Awards (image by permission of the Digital Preservation Coalition)

And speaking of boosting confidence I was delighted to be invited to the Digital Preservation Awards Ceremony where the hard work and fantastic achievements of many individuals and projects were deservedly rewarded.  There’s a full list of the winners here although those who didn’t come away with an award were winners too, having achieved what we were all talking about – advocacy, innovation and sustainability.  Hopefully it will also give a boost of confidence to all those nominated as well as raise the profile of these projects with funders, more senior managers and those in a position to put additional resources into the ongoing maintenance of digital preservation programmes.

DPA Award winners (image by permission of the Digital Preservation Coalition)

The keynote on the second day was from Matthew Addis who continued with the theme of looking outwards and urged everyone to look for and actively seek opportunities to make a difference.  “Everyone benefits from the power of the many”.  A question from the audience was “How do we avoid getting shoved out of the way by the IT community?” and the answer is:  We work together!  We should collaborate and not compete and help move digital preservation upstream to where it’s an invisible part of everyday practice.

The second day’s panel was no less impressive than the first with a wide range of experiences and backgrounds represented and again a truly international gathering. The panel were posed a series of questions by Natalie Harrower and, on being asked for a “wish list” for 20 years’ time in digital preservation, came up with a surprisingly varied set of responses. Neil Beagrie wanted better metrics and better evidence for the impact of data losses. However he also made a very popular suggestion which was a call for more one page summary documents which can be used as part of the advocacy process. Nancy McGovern wanted the dashboard of dashboards for her work (as did we all) and Jean-Yves Vion-Dury wanted a more sophisticated system of knowledge exchange. Natasa Milic-Frayling was brave about admitting the mistakes of past programmes and called for built in continuity in the design of tools. George Papastefanatos echoed Matthew Addis with a call for the end of digital preservation as a separate “thing”, so that it is instead integrated into the way everyone creates and uses data.

The session was an extremely lively one and I will definitely be returning to the recordings of it to capture some of the nuances of the debates. In the afternoon I chose Pro-Active Data Management with Simon Waddington, George Papastefanatos and Tomasz Miksa. Simon Waddington covered some approaches to data appraisal and compared technical and human appraisal decisions. He highlighted the potential benefits of using modelling to help predict change and inform these decisions but had to admit that this was very expensive… George Papastefanatos looked at preventative maintenance – again back to the theme of the morning – and how we should be working towards robust and adaptable systems. Tomasz Miksa took up the very hot topic of research reproducibility and drew parallels with digital preservation techniques in how the environment in which the data is created becomes vital for understanding, preserving or reproducing it. There followed a lively debate centring on the gap between theory and research on the one hand and everyday practice on the other, and the need to be realistic when assessing what is achievable and possible. Miksa made the very important point that people will choose the best tools available to them – these may well not be the most sustainable.

Me thinking about writing a one page report at the wonderful Wellcome Collection

Wrapping up, William Kilbride invited us all to be the agents of change and I for one have come away with some homework: developing and improving our advocacy work; producing short (!) reports to set out what we are trying to achieve and how; and finally continuing to work collaboratively with others to avoid reinventing the wheel and to enable everyone to move forward in an ever-changing world.

Rachel MacGregor, Digital Archivist

All images are the author’s (CC-BY) unless otherwise credited.

RDMF16 – Creating a Research Data Community

 


Are research institutions engaging their researchers with Research Data Management (RDM)? And if so, how are they doing it? In this post Hardy Schwamm (@hardyschwamm), Research Data Manager, Lancaster University, and Rosie Higman (@RosieHLib), Research Data Advisor, University of Cambridge, explore the work they are doing in their respective institutions.

Whilst funder policies were the initial catalyst for many RDM services at UK universities, there are many reasons to engage with RDM, from increased impact to moving towards Open Research as the new normal. And a growing number of researchers are keen to get involved! These reasons also highlight the need for a democratic, researcher-led approach if the behavioural change necessary for RDM is to be achieved. Following initial discussions online and at the Research Data Network event in Cambridge on 6 September, we wanted to find out whether and how others are engaging researchers beyond reiterating funder policies.

At both Cambridge and Lancaster we are starting initiatives focused on this, respectively Data Champions and Data Conversations. The Data Champions at Cambridge will act as local experts in RDM, advocating at a departmental level and helping the RDM team to communicate across a fragmented institution. We also hope they will form a community of practice, sharing their expertise in areas such as big data and software preservation. The Lancaster University Data Conversations will provide a forum for researchers from all disciplines to share their data experiences and knowledge. The first event will be on 30 January 2017.

Having presented our respective plans to the RDM Forum (RDMF16) in Edinburgh on 22 November, we ran breakout sessions where small groups discussed the approaches our own and other universities were taking. The results, summarised below, highlight the different forms that engagement with researchers can take.

 

RDMF16 Working Group discussing RDM Communities

Targeting our training

RDM workshops seem to be the most common way research data teams are engaging with researchers, typically targeting postgraduate research students and postdoctoral researchers. A recurrent theme was the need to target workshops at specific disciplinary groups, including several workshops run jointly between institutions, which made it possible to get sufficient participants for smaller disciplines. Alongside targeting disciplines, some have found that inviting academics who have experience of sharing their data to speak at workshops greatly increases engagement.

As well as focusing workshops so they are directly applicable to particular disciplines, several institutions have had success in linking their workshop to a particular tangible output, recognising that researchers are busy and are not interested in a general introduction. Examples of this include workshops around Data Management Plans, and embedding RDM into teaching students how to use databases.

An issue many institutions are having is getting the timing right for their workshops: too early and research students won’t have any data to manage or even be thinking about it; too late and students may have got into bad data management habits. Finding the Goldilocks time which is ‘just right’ can be tricky. Two solutions to this problem were proposed: having short online training available before more in-depth training later on, and having a one-hour session as part of an induction followed by a two-hour session 9-18 months into the PhD.

Tailored support

Alongside workshops, the most popular way to get researchers interested in RDM was through individual appointments, so that the conversation can be tailored to their needs, although this obviously presents a problem of scalability when most institutions only have one individual staff member dedicated to RDM.

There are two solutions to this problem which were mentioned during the breakout session. Firstly, some people are using a ‘train the trainer’ approach to involve other research support staff who are based in departments and already have regular contact with researchers. These people can act as intermediaries and are likely to have a good awareness of the discipline-specific issues which the researchers they support will be interested in.

The other option discussed was holding drop-in sessions within departments, where researchers know the RDM team will be on a regular basis. These have had mixed success at many institutions but seem to work better when paired with a more established service such as the Open Access or Impact team.

What RDM services should we offer?

We started the discussion at the RDM Forum thinking about extending our services beyond sheer compliance in order to create an “RDM community” where data management is part of good research practice and contributes to the Open Research agenda. This is the thinking behind the new initiatives at Cambridge and Lancaster.

However, there were also some critical or sceptical voices at our RDMF16 discussions. How can we promote an RDM community when we struggle to persuade researchers to comply with institutional and funder policies? All RDM support teams are small and have many other tasks aside from advocacy and training. Some expressed concern that we lack the skills to market our services beyond the traditional methods used by libraries. We need to consider and address these concerns about capacity and skill sets as we attempt to engage researchers beyond compliance.

RDMF16 at work

Summary

It is clear from our discussions that there is a wide variety of RDM-related activities at UK universities which stretch beyond enforcing compliance, but engaging large numbers of researchers is an ongoing concern. We also realised that many RDM professionals are not very good at practising what we preach and sharing our materials, so it’s worth highlighting that training materials can be shared on the RDM training community on Zenodo as long as they have an open license.

Many thanks to the participants at our breakout session at the RDMForum 16, and Angus Whyte for taking notes which allowed us to write this piece. You can follow previous discussions on this topic on Gitter.

Published on 30 November
Written by Rosie Higman and Hardy Schwamm
Creative Commons License