This is our second Data Interview. This time we were glad to have a chat with Dr Jo Knight.
Jo is a Reader within the CHICAS research group, Research Director in the Lancaster Medical School and theme lead for Health within Lancaster’s Data Science Institute. Jo has experience in developing new methods for analysing genetic data as well as experience in applying known techniques to a large variety of datasets.
Q: Jo, when you talked at our recent Data Management event about a “positive” data management story and a “negative” story there was a lot of interest in that, so we thought we could use this in our next Data Interview. Which story would you like to start with?
Jo: I think it would be good to start with a negative one so I can end on a positive note. And chronologically that is how it occurred.
So the negative story relates to an early time in my career. I had some genetic data on a number of individuals, about 120. I did some statistical analysis of the data. I noticed that some of the patterns that I had in my analysis seemed unusual. They weren’t characteristic of the type of patterns you would expect given that the individuals in this sample were supposed to be siblings. I didn’t have enough genetic information to establish their relationships completely but I did have enough to see that overall patterns didn’t look how I expected them to.
I took the data to someone more experienced and said: “There is something wrong with the patterns here”, and he said “Yep, there is definitely something wrong. Those individuals clearly aren’t related to each other.”
At that time, given the technologies that were available, we couldn’t just get more data to determine the relationships. We had to throw all of that data away!
It was essentially because the data and the samples had not been linked and managed. At some point between labelling the samples, entering the labels into a database and recording the relationships and the rest of the information about the individuals, something had gone wrong. So the data management had gone wrong and these samples were now completely useless. As well as the loss of my time, we couldn’t use these samples for any other work either. They no longer had the data provenance.
Q: Can you quantify how much time you invested in that project?
Jo: It’s hard to remember but for me it would have been months of work to interrogate the samples! It would also have cost a fair amount in reagents. And for the person that collected the data it was probably up to a year’s work getting all the DNA samples from the individuals. Furthermore, those individuals had given samples for medical research that could not be undertaken.
Q: That is a rather sad story.
Jo: Yes, it is.
Q: Now the positive story. What happened?
Jo: I’m involved in a Consortium now, the Psychiatric Genomics Consortium, and in this Consortium over 800 researchers from 38 countries have come together and worked really very hard through ethical approvals, data procedures, data collection and data pooling in order to collate samples.
And they have been able to collect data, published a couple of years ago in 2014, on more than 35,000 schizophrenia cases and even more control samples than that. And the good and appropriate management of data has meant that we were able to identify 108 genetic risk loci for schizophrenia. It has enabled us to move the field forward in terms of beginning to understand the genetic contribution to schizophrenia.
For a long time we knew that schizophrenia has a genetic component but we were unable to pinpoint very many of the risk variants at all, and this study was a real landmark in identifying a large number of the risk variants involved in the disorder. Lots more work needs to be done! What is really exciting about the Consortium is that the original paper is just the tip of the iceberg. That was the paper where the first analysis was done, but the data is now held and managed in such a way that researchers who work in psychiatric genetics are able to access it, analyse it and answer lots of different questions about the genetic predisposition to schizophrenia.
The Psychiatric Genomics Consortium holds data on lots of other disorders as well. Basically, the appropriate management of that data means we are able to learn a lot more about diseases than we would have if people hadn’t got together and as a large group effectively managed the data.
Q: What is the key step in doing this?
Jo: It’s a willingness to share data and to see the bigger scientific question that can be answered if you share the data, and not just try to hold onto it and answer your own smaller questions. It is a willingness to put considerable amounts of time into data management. So there are lots of people including myself that have informal unpaid roles in managing that data to make it accessible.
Q: What can we as an institution do to encourage that willingness to share data?
Jo: I think Lancaster University as an institution has a very strong positive view of collaborative research across the Faculties and beyond the University. And that’s the kind of thing that does encourage people to share data and be involved in these projects. I think that is something we need to continue to pursue. The same goes for the support systems that we have in place, the people and systems that help us to deposit data and make it available.
Thanks very much for the interview, Jo!
You can find out more about Jo and her research here. The full reference of the article on schizophrenia mentioned by Jo is:
Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). “Biological insights from 108 schizophrenia-associated genetic loci.” Nature, 24 July, 511 (7510): 421-7. doi:10.1038/nature13595
Data Interview by Hardy Schwamm (@HardySchwamm), 3 May 2017.