2nd Data Conversations 4 May 2017 – Data Security and Confidentiality

The 2nd Data Conversations had the theme of Data Security and Confidentiality. More than 20 Lancaster researchers attended. It was nice to start with a slice of pizza and a brew.

Always nice to start an event with food!

As at the 1st Data Conversations we had five lightning talks. You can see the agenda below.

You can find a short summary of the event, the slides and some photos below.

Denes Csala – The sensor cloud around us: collecting, mining and visualizing the energy and building management data of the campus

Dr Denes Csala is a newly appointed lecturer in Energy Storage Systems Dynamics with Energy Lancaster.

There are 30,000 sensors on campus capturing all sorts of data about energy and energy consumption. This gives us the potential to understand a huge amount about the way energy is managed and used, but at the same time it raises the issue of managing extremely sensitive commercial and personal data. Access to the data is strictly controlled, but Energy Lancaster are very excited about the possibilities of what could be done with it.

You can see an animated visualization of the campus energy metering system sensor data here:

Kopo Ramokapane – Cloud computing: When is Deletion Deletion

Kopo Ramokapane is a PhD student in the School of Computing and Communications. Kopo gave an overview of the growing importance of “the cloud”. But do we also see the implications that cloud computing has for the security and privacy of our data?

Kopo reported that when you delete data in the cloud there is no way to be sure that all copies or all versions have been deleted from the cloud provider. This issue isn’t new but doesn’t get as much attention as it should. Because of the way cloud storage operates, it is almost impossible even for the service providers to be certain that all the data has been deleted. Avoid storing confidential data in the cloud and learn more about how the systems work! Lancaster University has a contract with the cloud service Box which ensures that compliance issues are dealt with in relation to the storage of confidential or sensitive data.

Karen Broadhurst and Stuart Bedston – Better data for better justice: Towards data-driven analyses of Family Court policy and practice

Professor Karen Broadhurst and Stuart Bedston from the Sociology Department reported on concerns about transparency in family court decision-making. Greater transparency and “open data” would have a positive impact in many ways but are hard to achieve given the security requirements and potential risks.

Karen and Stu presenting on “Better data for better justice”

Karen and Stu highlighted the changes that would be needed to strengthen interdisciplinary research using controlled data here at Lancaster University, as well as the difficulties that stand in the way.

John Couzins – Security Overview at Lancaster University

Next up was John Couzins, the IT Security Manager of Lancaster University. John, who works for the institutional IT service ISS, reported on the certifications that are necessary to fulfil the requirements of certain providers of confidential data. Current examples are Cyber Essentials Plus and the IG Toolkit (Information Governance Toolkit), which is used by the NHS.

Mateusz Mikusz – Running Research as a Service. Implications for Privacy Policies and Ethics

Mateusz Mikusz is working on his PhD in the School of Computing and Communications. He is working on a project that develops pervasive displays where students can get personalised content on public screens on campus if they use an app or iLancaster.

The issue regarding the data is that it is used for two purposes:

  • To make the app and its use cases work
  • To create research data of usage and other properties that can be analysed by the project team

Mateusz explained that he is working hard to bring both things together in an ethical way that still allows innovative research.

Mateusz presenting the project

It was a great showcase for a lot of fantastic research that is taking place at Lancaster University and the way in which handling sensitive data and tackling data security is at the forefront of this.  There were probably as many questions raised as there were answers given but it was a great opportunity to share approaches to handling data securely and ethically.

Want to know more? Get in touch with the RDM team at rdm@lancaster.ac.uk

3rd Data Conversation – 19th September 2017

Join us for our next Data Conversation on 19th September on Software as Data with a special guest speaker Neil Chue Hong from the Software Sustainability Institute.

Data Interview with David Ellis (Part 1)

Part 1 (of two) of a Data Interview with Dr David Ellis (@davidaellis). David is a Lecturer in Computational Social Science and holds a 50th Anniversary Lectureship in Psychology at Lancaster University. David presented at the first Data Conversations on Data Visualization.

This is the first in what will hopefully be a series of interviews about the impact of Open Data on research. The interview was conducted by Hardy Schwamm.

Q: We define Open Data as data that can be freely used, shared and built-on by anyone, anywhere, for any purpose. Open Data is also a way to remove legal and technical barriers to using digital information.  Does that go with your idea of what Open Data is?

David: Yes, I think so. I might add to that: the data is actually useful and fit for purpose. To me it’s one thing to just upload all that data and make it available. But a lot of the time, how useful that is on its own is not quite clear. As a psychologist you can run an experiment and you have a lot of data coming out of a study. You can just dump that data online, but is there enough information there for other scientists to use that data and get the results?

Q: So would you say that the usefulness of data depends on what we as librarians call metadata, data about the data?

David: Yes, exactly. The definition you gave earlier is spot on. I would just add you need to make sure it is useful to other people. That might also depend on the audience but there are lots of datasets that people post for papers that are just the raw data. That is useful but to understand how they get from the raw data to the conclusions is an important step. There isn’t always space in publications to make that clear.

Q: My next question you have probably already answered. What is your interest in Open Data? Do you support it as a principle or because it is useful for your research?

David: I do support it as a matter of principle! I always found it weird, even as a student, that you could have papers published and it was just a “Take our word for it” process. I still find that weird now. So absolutely, I support it as a matter of principle. I think as a scientist it just seems right. The data is the cornerstone of every publication. So if that is not there it seems like a massive omission, unless there is a reason for it not to be there. There are lots of mainstream psychology journals that don’t have any policy on data.

Q: That leads me to my next question: To what extent do researchers in your field, Psychology, support or embrace a culture of Open Data?

David: Psychology does have a culture of it and it is probably growing. I think it is inevitable that this is going to become the standard practice if you look at the way Open Access publishing is going.

Q: Why do you think this is happening?

David: Because I think what is eventually happening is that journals are going to say… Lots of people are doing it, but it is like everything else: particularly if that data is going to be usable, it does require a bit more effort on the author’s part to make sure that things are organised and that they have a Data Management Plan. I am not suggesting that lots of people don’t have Data Management Plans, but if you look at current problems in Social Psychology, that really wasn’t being followed. There have been leaks and there have been other problems.

To tell you a story from last week: a 3rd year student at Glasgow University spotted errors in a published paper, and they were actually errors in the degrees of freedom. They didn’t need the raw data, but the point is that a lot of that could have been sorted if the raw data had been made available. There are lots of little issues that keep coming up.

There is nevertheless still resistance, and there are plenty of journals where there really is no policy, certainly the journals I review for. At the end, there is no data provided and I don’t know what the policy is. It would be nice if in the future authors could upload raw data, but that depends on whether the journal has a policy and what it says.

Q: Where should the push for Open Data come from? From journals, funders or the science community?

David: I think from all! If peer reviewers started asking for data, which I think more are, and if more scientists start uploading data as supplementary material as a matter of course, then I think journals will start to do that. I guess the other option is that journals that do provide additional resources will start to be favoured. So particularly given how much money places like Elsevier make, what do they actually offer? If they want to sell themselves they could offer lots of things, but they don’t seem to be pushing it.

And I appreciate it is very discipline specific, and that came up after my talk at the Data Conversations [on 30 January 2017]: some disciplines don’t share data. It has improved massively since I started as a postgrad student. Then it just wasn’t a thing and it has slowly become more of an issue.

Q: Do you think this has to do with skills and knowledge of researchers and PhD students? Do they know how to prepare and share data? Do they know how to use other researchers’ data? Is there something missing?

David: A lot of psychologists are in a kind of hybrid area. They are obviously not statisticians and I do wonder if there is a bit of a concern: what if I upload everything, what if somebody finds a mistake? My view is always: I’d rather know that there is a mistake. But I do wonder if people are sometimes sceptical about it. Not because they’ve got anything to hide but because they are not 100 per cent sure sometimes. They understand the result and they know what the numbers mean but we are not mathematicians.

I am just curious, given the number of statistical mistakes being flagged up in psychology papers… I am sure I have made mistakes myself. I’d just rather know about them. And having the data there means someone can check if they really want to. My view is that I am quite flattered if someone has bothered to go and re-run my analysis. They are obviously reading it!

The interview with David will be continued in Part 2 which you can find here.