{"id":651,"date":"2018-05-31T15:47:35","date_gmt":"2018-05-31T14:47:35","guid":{"rendered":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/?p=651"},"modified":"2018-09-18T11:09:07","modified_gmt":"2018-09-18T10:09:07","slug":"data-interview-on-messy-data","status":"publish","type":"post","link":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/","title":{"rendered":"Data Interview on &#8220;Messy Data&#8221;"},"content":{"rendered":"<p>Our latest Data Interview features our two <a href=\"https:\/\/researchdata.jiscinvolve.org\/wp\/tag\/research-data-champions\/\">Jisc sponsored Data Champions<\/a>, <a href=\"http:\/\/www.lancaster.ac.uk\/sociology\/about-us\/people\/jude-towers#projects\">Dr Jude Towers<\/a> and <a href=\"http:\/\/www.research.lancs.ac.uk\/portal\/en\/people\/david-ellis(79b9b341-0ecb-4df8-a9b0-3d16cfcc61cd).html\">Dr David Ellis<\/a>. Jude is a\u00a0Lecturer in <a href=\"http:\/\/www.lancaster.ac.uk\/sociology\/\">Sociology<\/a> and Quantitative Methods and David a Lecturer in Computational Social Science in our <a href=\"http:\/\/www.lancaster.ac.uk\/psychology\/\">Psychology<\/a> Department.<\/p>\n<p>Jude and David recently presented at a Jisc event on \u2018Stories from the Field: Data are Messy and that&#8217;s (kind of) ok\u2019.<\/p>\n<p><!--more--><\/p>\n<p><iframe loading=\"lazy\" title=\"Stories from the Field: Data are Messy and that&#039;s (kind of) ok\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/h6JaWYi8PMdFBC\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe><\/p>\n<p>We talked to Jude and David about what Messy Data are (and many other things):<\/p>\n<p><strong>Q: At the recent Research Data Champions Day the title of your presentation was \u2018Data are Messy and that&#8217;s (kind of) ok\u2019. 
I wonder what \u2018messy research data\u2019 are in your fields?<\/strong><\/p>\n<p><strong>Jude<\/strong>: My \u2018messy data\u2019 are crime data. The \u2018messiness\u2019 comes from a lot of different directions. One of the main ones is that administrative data is not collected for research; it is collected for other purposes. It never quite has the thing that you want, so you need to work out whether it is a good enough proxy or not.<\/p>\n<p>For example, I am interested in violence and gender, but <a href=\"https:\/\/data.police.uk\/\">police crime data<\/a> doesn\u2019t disaggregate by gender. There is no such crime as domestic violence, so it depends on whether somebody has flagged it as such, which is not mandatory, so it is hit and miss. I think the fact that the data are not collected for research makes them messy for researchers, and then I guess there are all the other kinds of biases that come with things like administrative data. So if you think about crime, not everybody reports a crime, so you only get a particular sample. 
Then there are particular initiatives: every time there are international football matches there are big initiatives around domestic violence, so reporting goes up, and everyone says that domestic violence is related to football. But is it, or is it just related to the fact that everyone tells you that you can report it and that there is zero tolerance of domestic violence during football matches? It\u2019s more likely to be recorded.<\/p>\n<p>Then you get feedback loops. The classic one at the moment is knife crime in London: because knife crime has gone up the agenda, more money and resource will go into knife crime; at some point that will probably go down, and something else will go up, because there is a finite amount of resource. These feedback loops carry through into the research that you do on the administrative data, and people don\u2019t always remember that when they come to interpret research.<\/p>\n<figure id=\"attachment_656\" aria-describedby=\"caption-attachment-656\" style=\"width: 660px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"656\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/2018-03-26-12-38-54\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?fit=3840%2C2160\" data-orig-size=\"3840,2160\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;2&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;Nexus 6P&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1522067934&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4.67&quot;,&quot;iso&quot;:&quot;242&quot;,&quot;shutter_speed&quot;:&quot;0.007993&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"2018-03-26 12.38.54\" data-image-description=\"\" 
data-image-caption=\"&lt;p&gt;Jude and David presenting on Messy Data&lt;\/p&gt;\n\" data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?fit=660%2C371\" class=\"size-large wp-image-656\" src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?resize=660%2C371\" alt=\"\" width=\"660\" height=\"371\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?resize=1024%2C576 1024w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?resize=300%2C169 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?resize=768%2C432 768w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?w=1320 1320w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.38.54.jpg?w=1980 1980w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><figcaption id=\"caption-attachment-656\" class=\"wp-caption-text\">Jude and David presenting on Messy Data<\/figcaption><\/figure>\n<p><strong>David:<\/strong> Most data in psychology that measure people are messy because people are messy; with social and psychological phenomena in particular, there is always noise. 
The challenge is often trying to get past that noise to understand what might be going on. This is also true of administrative data and of data you collect in a lab. Probably the only exception in psychology is very, very controlled experiments, maybe in visual perception, where the measurement is very fine-grained; almost everything else in psychology is by its nature extremely messy, and data never look the way they do in a textbook.<\/p>\n<p><strong>Q: So there is always that \u2018noise\u2019 in research data, regardless of whether you use external data such as NHS data or collect data yourself, unless, as you say, it is in a very controlled environment?<\/strong><\/p>\n<p><strong>David: <\/strong>Yes. And I guess that within psychology there is an argument that if the data are collected in a very controlled environment, is that actually someone\u2019s real behaviour? Or is a less controlled environment more ecologically valid? You\u2019ve always got that balance to try and address.<\/p>\n<p><strong>Q: So what are the advantages? Why do you work with messy data?<\/strong><\/p>\n<p><strong>Jude: <\/strong>Sometimes because there is nothing else. [laughs]<\/p>\n<p><strong>David<\/strong>: Because there is nothing else. I think psychology generally is going to be messy because, as I said, people aren\u2019t perfect; they are not perfect scientific participants. Participants are not 100% predictable; people aren\u2019t predictable social phenomena. There are very few theories within social psychology that are 100% spot on; in fact, I don\u2019t think there are any.<\/p>\n<p>Compare that to, say, physics, where there are governing theories such as <a href=\"https:\/\/www.grc.nasa.gov\/www\/k-12\/airplane\/newton.html\">Newton\u2019s laws<\/a>: singular truths which explain a certain phenomenon. 
We don\u2019t have much of that in psychology!\u00a0 We have theories that tend to explain social phenomena but people are too unpredictable.\u00a0 There are good examples of where theories have held for a long time but it is never a universal explanation.<\/p>\n<figure id=\"attachment_657\" aria-describedby=\"caption-attachment-657\" style=\"width: 300px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"657\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/2018-03-26-12-28-48\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?fit=3840%2C2160\" data-orig-size=\"3840,2160\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;2&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;Nexus 6P&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1522067328&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4.67&quot;,&quot;iso&quot;:&quot;174&quot;,&quot;shutter_speed&quot;:&quot;0.007993&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"2018-03-26 12.28.48\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;David presenting&lt;\/p&gt;\n\" data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?fit=660%2C371\" class=\"size-medium wp-image-657\" src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?resize=300%2C169\" alt=\"\" width=\"300\" height=\"169\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?resize=300%2C169 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?resize=768%2C432 768w, 
https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?resize=1024%2C576 1024w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?w=1320 1320w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.28.48.jpg?w=1980 1980w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-657\" class=\"wp-caption-text\">David presenting<\/figcaption><\/figure>\n<p><strong>Q: What are the implications for management of that kind of messy data?<\/strong><\/p>\n<p><strong>David: <\/strong>I think the implications are that you make sure it is clear to people how you got from the raw data, which was noisy or messy, to something that resembles a conclusion. So that could be: how did you get from X number of observations, which you boiled down to an average, which you then analysed? What was the process behind that? It\u2019s not just about running a statistical test; it\u2019s about the whole process, from \u2018this is what we started with\u2019 to \u2018this is what we ended with\u2019.<\/p>\n<p><strong>Jude<\/strong>: I think that\u2019s right. It means being very clear about what your data can and cannot support, and being very clear that you are not producing facts but testing theory, where everything is iterative: a tiny step towards something else, not the end. You never get to the end.<\/p>\n<p><strong>David: <\/strong>I think researchers have a responsibility to do that, and people have to be careful in the language they use to convey how that has happened. A good example at the moment is the debate in the press about the effects social media has on children and teenagers: the way it is measured and the language used to talk about it are, to me, totally disconnected. That behaviour isn\u2019t really measured. 
It is generated by people providing an estimate of what they do, yet we know that estimate isn\u2019t very accurate. The conclusion that has been drawn is that this is having a big effect on people. I\u2019m not saying it\u2019s not having any effect, but it\u2019s not as exciting to say: \u2018well actually the data\u2019s really messy or not perfect, we can\u2019t really conclude very much\u2019. Instead it\u2019s being pushed into saying that [social media] is causing a massive problem for young people, which we don\u2019t know. That is why there is a responsibility to be clear, and I don\u2019t think it is clear in that debate, and I think there are big consequences because of it.<\/p>\n<figure id=\"attachment_655\" aria-describedby=\"caption-attachment-655\" style=\"width: 660px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"655\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/2018-03-26-14-27-38\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?fit=3840%2C2160\" data-orig-size=\"3840,2160\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;2&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;Nexus 6P&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1522074458&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4.67&quot;,&quot;iso&quot;:&quot;380&quot;,&quot;shutter_speed&quot;:&quot;0.013284&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"2018-03-26 14.27.38\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;Jude and David at Jisc panel discussion&lt;\/p&gt;\n\" data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?fit=660%2C371\" 
class=\"size-large wp-image-655\" src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?resize=660%2C371\" alt=\"\" width=\"660\" height=\"371\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?resize=1024%2C576 1024w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?resize=300%2C169 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?resize=768%2C432 768w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?w=1320 1320w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-14.27.38.jpg?w=1980 1980w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><figcaption id=\"caption-attachment-655\" class=\"wp-caption-text\">Jude and David at Jisc panel discussion<\/figcaption><\/figure>\n<p><strong>Q: So in your dream world, what would change so that we could work better with this kind of data?<\/strong><\/p>\n<p><strong>Jude: <\/strong>I think we need better statistical literacy across the board. This is what I did with my Masters students: I told them to go and find a paper or media story which centred on statistics, then critique it. So, how do you know what someone is telling you is \u2018true\u2019? Why are they telling you it in that particular way? What data have they used? 
What have they excluded?<\/p>\n<p>You go to the stats literature and they talk about outliers as though they are merely a statistical phenomenon, but those decisions are often political and they massively change what we know, and nobody talks about that; nobody sets out exactly what it means. The official statistics for crime in England and Wales currently cap repeat incidents at a maximum of five. If you are beaten up by your partner 40 times a year, only the first five are included in the count, which introduces a huge bias into what we know about crime. That bias then carries through into the way resources are distributed between different groups, and into what we think about which crimes are going up and which are falling. I think this lack of people questioning statistics in particular, but data more generally, is a real problem. In our social science degrees we just do not teach undergraduates how to do that. We do it with qualitative data, but we don\u2019t do it with quantitative data. It\u2019s exactly the same process and exactly the same questions, but we just don\u2019t do it; we are really bad at it in Britain!<\/p>\n<p><strong>David: <\/strong>I think more generally there is a cultural issue within the whole ethos of science: how it gets published, what gets read and what doesn\u2019t. So again, say I go back, do a paper and find <strong>no<\/strong> relationship between social media use and anxiety. That would be harder to publish than if I write a paper and find a tiny correlation, which is probably spurious and not even relevant, between anxiety and social media. 
So again, this comes down to criticising what is out there, but also to what is just becoming more sellable or having more \u2018impact\u2019. I use the word impact in inverted commas: what sounds more interesting, but actually might be totally wrong. I think what is pushed is what\u2019s more interesting rather than what is true. It\u2019s worth remembering that science is about getting a result and trying to unpick it, looking at what else could explain it and what we might have missed, rather than saying \u2018that\u2019s it, it\u2019s done\u2019. It\u2019s similar to what Jude was saying about a critical thinking process.<\/p>\n<p><strong>Q: Following on from what Jude said about the skills gap: you say that undergraduates are not taught the skills they need. So by the time they become PhD students and early career researchers, this gap might have increased even further?<\/strong><\/p>\n<p><strong>Jude<\/strong>: Yes, and they don\u2019t use quantitative data, or they use it really uncritically. Lots of postgraduate students who work on domestic violence won\u2019t use quantitative data, but their thesis often starts with \u2018one in four women will experience domestic violence in their lifetime\u2019 or \u2018two women a week are killed by intimate partners\u2019, but they don\u2019t know where those data come from, how reliable they are or how they were produced, yet they are just parroted.<\/p>\n<p><strong>David: <\/strong>I can give a similar example of how it is sometimes difficult to take those numbers back once they become part of the common discourse. Years ago <a href=\"http:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0139004\">we found<\/a> that people check their smartphone 85 times a day on average. Now, that was a sample of about thirty young people. 
We obviously talked about that, but that number is now used <a href=\"http:\/\/www.dailymail.co.uk\/sciencetech\/article-3294994\/How-check-phone-Average-user-picks-device-85-times-DAY-twice-realise.html\">repeatedly<\/a>. There is no way that my grandmother or my parents check their phone 85 times a day. But that sample did, so there is now this kind of view that everyone checks it 85 times a day. They probably don\u2019t, but I can\u2019t take that back now; there are things you don\u2019t know at the time, and that is what the data showed. It\u2019s tricky to balance: it was picked up as an impactful thing, but it wasn\u2019t what we really meant.<\/p>\n<p><strong>Q: If your findings are picked up by media looking for catchy, easy numbers, is there also a job for you as a researcher to write your paper differently so that it isn\u2019t picked up so easily? Or is it the fault of the media, because they are just looking for a simplified version of a complex issue?<\/strong><\/p>\n<p><strong>David: <\/strong>There is a cultural issue, a kind of toing and froing, because we want our work to be read, and writing a press release is certainly one way of doing that. I think what you put in the press release actually has to be even more refined, because a lot of people won\u2019t read the paper, but they will see the press release, and that will be spun. Once the press release is done, it\u2019s out of your control in some ways. You can get it as right as you want, but a journalist might still tweak it a certain way. It\u2019s a really tough balance because, as you say, the other extreme is to say \u2018I am just going to leave it\u2019. 
But then people might not hear about the work, so it\u2019s a very tricky tightrope to walk.<\/p>\n<p><strong>Jude:<\/strong> When our work started getting picked up by the media, we made the decision as a Centre that we would not talk to the media about anything that had not been through peer review, so it is always peer reviewed first. We work closely with one person from the press office all the way through the process of putting the paper together, deciding on the press release and how we are going to release it. What we have now is contacts in several newspapers and media outlets, and we say we will work with you exclusively provided this is the message that goes out. We have been successful enough that we\u2019ve now got two or three people on board who will do that with us. They get exclusives provided we see the copy before it goes public.<\/p>\n<figure id=\"attachment_414\" aria-describedby=\"caption-attachment-414\" style=\"width: 300px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"414\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2017\/06\/20\/data-interview-with-jude-towers\/jude-head\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?fit=754%2C754\" data-orig-size=\"754,754\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;2.2&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;iPhone 5s&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1434830357&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4.15&quot;,&quot;iso&quot;:&quot;320&quot;,&quot;shutter_speed&quot;:&quot;0.041666666666667&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Jude\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;Jude &lt;\/p&gt;\n\" 
data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?fit=660%2C660\" class=\"size-medium wp-image-414\" src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?resize=300%2C300\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?resize=300%2C300 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?resize=150%2C150 150w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/06\/Jude-Head.jpg?w=754 754w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-414\" class=\"wp-caption-text\">Jude<\/figcaption><\/figure>\n<p><strong>David:<\/strong> That is very hard to do, but really good.<\/p>\n<p><strong>Jude:<\/strong> We have been really hardcore, and we\u2019ve had a lot of pressure to put stuff out earlier, to make a bigger splash, to go with more papers. I think it was only because we resisted that pressure that, in the long run, it has been much better, although it is hard to resist. In our early work the press wanted our trends, but we wanted them to talk about the data: we wouldn\u2019t release the trends unless they talked about the problems of bias in official statistics. So we kind of married the two; they didn\u2019t want it, but that was the deal.<\/p>\n<p><strong>David: <\/strong>It\u2019s like when you say \u2018people do X this number of times\u2019: you can\u2019t put \u2018within the sample\u2019 in brackets. So I understand where journalists come from and I understand the conversations with the press. To me, as I said, it\u2019s like walking a tightrope. 
It has to be interesting enough that people want to read it, but at the same time it needs to be accurate.<\/p>\n<p><strong>Jude: <\/strong>But that\u2019s the statistical literacy, because you want someone reading a media story to go \u2018Really? Well, how did you get that?\u2019 That\u2019s something we do as academics when we are reading. People are always telling me \u2018interesting facts\u2019 about violence and my first reaction is always: \u2018Where has that come from?\u2019 These questions should become routine. I think journalist training is terrible! I mean, I have spent hours on the phone with journalists who want me to say a really particular thing, and it\u2019s clearly absolute nonsense! But they have got two little bits of data and they have drawn a line between them.<\/p>\n<p><strong>David:<\/strong> I have had a few experiences where journalists have tried to get a comment about someone else\u2019s work and I have said things like \u2018I don\u2019t think this is right\u2019, or I\u2019ve been critical, and the journalist said \u2018well, really what we are looking for is a positive comment\u2019. And I\u2019ve said \u2018well, I\u2019m not going to give you one\u2019, and they have said \u2018alright, bye then\u2019, and have gone and found someone who will. That doesn\u2019t happen very often, but we can see what they are kind of hoping for. Presumably, some of the time I have said things where I have been really critical. 
The BBC are quite good at that; they get someone who they know is going to be critical, without them having to explicitly say something negative.<\/p>\n<p><strong>Q: This has been fascinating; we have been through the whole life cycle of data, from creation to management and now to digestion by the media. This tells us that data management issues are fundamental to the outputs of research.<\/strong><\/p>\n<p><strong>Jude: <\/strong>I think it impacts on the open data agenda, though, because if I was going to put my data out, the caveat manual that came with it would be three times the size of the data. Again, you don\u2019t have any control over how someone presents an analysis of that data. I think it\u2019s really difficult because we are not consistent in following good practice when reporting on the messiness of data.<\/p>\n<p><strong>David:<\/strong> I think there is a weight of responsibility on scientists to get that right, because it does affect other things. I keep using social media as an example. The House of Commons Science and Technology Committee is running an <a href=\"https:\/\/www.parliament.uk\/business\/committees\/committees-a-z\/commons-select\/science-and-technology-committee\/inquiries\/parliament-2017\/impact-of-social-media-young-people-17-19\/\">enquiry<\/a> at the moment into the effects of screen time and social media. If I was being super critical, I would say it\u2019s a bit early for an enquiry, because there isn\u2019t any cause-and-effect evidence. 
Even some of the studies they report on the enquiry\u2019s home page are totally flawed; one of them is not peer reviewed. That lack of transparency or statistical literacy, even among Members of Parliament, is clearly leading to things being investigated where actually we could be missing a bigger problem. So that is just one example, but it is one where there is a lot of noise: there is a lot of \u2018this might be a problem\u2019 or \u2018is it a problem?\u2019, right through to \u2018it definitely <strong>is<\/strong> a problem\u2019, without anyone standing back and asking, \u2018actually, is this an issue? Is the quality of the evidence there?\u2019<\/p>\n<figure id=\"attachment_335\" aria-describedby=\"caption-attachment-335\" style=\"width: 300px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"335\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2017\/03\/13\/data-interview-with-david-ellis-part-1\/david_ellis\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?fit=480%2C480\" data-orig-size=\"480,480\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"David_Ellis\" data-image-description=\"&lt;p&gt;David Ellis&lt;\/p&gt;\n\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?fit=480%2C480\" class=\"wp-image-335 size-medium\" 
src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?resize=300%2C300\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?resize=300%2C300 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?resize=150%2C150 150w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2017\/03\/David_Ellis.jpg?w=480 480w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-335\" class=\"wp-caption-text\">David<\/figcaption><\/figure>\n<p><strong>Jude:<\/strong> Or can you even do it at the moment?<\/p>\n<p><strong>David:<\/strong> Yes, absolutely! That is a separate area and there is a methodological challenge in that.<\/p>\n<p><strong>Jude:<\/strong> We get asked to measure trafficking in human beings on a regular basis; we\u2019ve even written a report that said you can\u2019t measure it at the moment! There is no mechanism in place that can give you any data that is good enough to produce any kind of measure.<\/p>\n<p><strong>David:<\/strong> But that isn\u2019t going to make it onto the front of the Daily Mail. [laughs]<\/p>\n<p><strong>Q: Maybe just to conclude our interview, what can the university do? You mentioned statistical literacy as one thing. Are there other things we can do to help? 
<\/strong><\/p>\n<p><strong>Jude<\/strong>: We are starting to move a little bit in <a href=\"http:\/\/www.lancaster.ac.uk\/arts-and-social-sciences\/\">FASS<\/a> [Faculty of Arts and Social Sciences] with some of the <a href=\"http:\/\/www.lancaster.ac.uk\/arts-and-social-sciences\/study\/postgraduate\/research-training-programme\/\">Research Training Programme<\/a>, and with things like the data conversations, whose effect is hard to measure but which I think are actually having a really good impact. Drawing people in through those kinds of mechanisms and then bringing together people who are interested in talking about this would be good. I would like to see something around\u2026 what you need to tell people about your data when it\u2019s published; you know, the caveats: what it can and can\u2019t support, how far you can push it.<\/p>\n<p><strong>David:<\/strong> I think the University as a whole does a lot; certainly in psychology it\u2019s preaching to the converted, in a way. I would like a thing in Pure [Lancaster University Data Repository] that, when you upload a paper, says\u2026 \u2018have you included any code or data?\u2019, just as a sort of \u2018by the way, you can do that\u2019. One, it tells people that we do it, and two, it reminds people of it; if you\u2019re not doing it, it would be useful to have a tick box to see why. Obviously, there are lots of cases where you can\u2019t do it, but it would be good for that to be recorded. 
So is it actually \u2018I can\u2019t do it because the data is a total mess\u2019, or some other reason, or \u2018I\u2019m not bothered\u2019? There is an issue here about why not, because if it has just been published, it should be in a form which is sensible and clear.<\/p>\n<p><strong>Jude:<\/strong> I wonder if there is some scope in just understanding the data, so maybe a data conversation specifically about qualitative data, and then other even more obscure forms like literature reviews as data, \u2019cause I still keep thinking about when you told me you offered to do data management with FASS and you were told they didn\u2019t have any data.<\/p>\n<p>I think that people don\u2019t think about it as data in the same way and it would be really good to challenge that. I think data science has a massive problem in that area: it has become so dominant that if you\u2019re not doing what fits inside the data science box, you\u2019re not doing data and you\u2019re not doing science, and it\u2019s really exclusionary. I think for the university to embrace a universal definition of data would be really, really beneficial.<\/p>\n<p><strong>David:<\/strong> It\u2019s also good for the University [to] capitalise on that extra resource; it would have a big effect on the institution as a whole.<\/p>\n<p><strong>Jude, David, thank you very much for this interesting interview!<\/strong><\/p>\n<figure id=\"attachment_658\" aria-describedby=\"caption-attachment-658\" style=\"width: 660px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"658\" data-permalink=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/2018-03-26-12-31-42\/\" data-orig-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?fit=3840%2C2160\" data-orig-size=\"3840,2160\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;2&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;Nexus 6P&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1522067502&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4.67&quot;,&quot;iso&quot;:&quot;120&quot;,&quot;shutter_speed&quot;:&quot;0.007993&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"2018-03-26 12.31.42\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;Jude and David presenting&lt;\/p&gt;\n\" data-large-file=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?fit=660%2C371\" class=\"wp-image-658 size-large\" src=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?resize=660%2C371\" alt=\"\" width=\"660\" height=\"371\" srcset=\"https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?resize=1024%2C576 1024w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?resize=300%2C169 300w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?resize=768%2C432 768w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?w=1320 1320w, https:\/\/i0.wp.com\/wp.lancs.ac.uk\/highly-relevant\/files\/2018\/05\/2018-03-26-12.31.42.jpg?w=1980 1980w\" sizes=\"auto, (max-width: 660px) 100vw, 660px\" \/><figcaption id=\"caption-attachment-658\" class=\"wp-caption-text\">Jude and David presenting<\/figcaption><\/figure>\n<p><a href=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2017\/06\/20\/data-interview-with-jude-towers\/\">Jude <\/a>and <a href=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2017\/03\/13\/data-interview-with-david-ellis-part-1\/\">David<\/a> have also featured in previous Data Interviews.<\/p>\n<p><em>The interview was conducted by Hardy 
Schwamm, Research and Scholarly Communications Manager <a href=\"https:\/\/twitter.com\/HardySchwamm\">@hardyschwamm<\/a><\/em>. <em>Editing was done by Aniela Bylinski-Gelder and Rachel MacGregor.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our latest Data Interview features our two Jisc sponsored Data Champions, Dr Jude Towers and Dr David Ellis. Jude is a\u00a0Lecturer in Sociology and Quantitative Methods and David a Lecturer in Computational Social Science in our Psychology Department. Jude and David recently presented at a Jisc event on \u2018Stories from the Field: Data are Messy &hellip; <a href=\"http:\/\/wp.lancs.ac.uk\/highly-relevant\/2018\/05\/31\/data-interview-on-messy-data\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Data Interview on &#8220;Messy Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":520,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[26,17],"tags":[],"class_list":["post-651","post","type-post","status-publish","format-standard","hentry","category-data-interviews","category-jisc"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p81NIC-av","jetpack-related-posts":[],"_links":{"self":[{"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/posts
\/651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/users\/520"}],"replies":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/comments?post=651"}],"version-history":[{"count":7,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/posts\/651\/revisions"}],"predecessor-version":[{"id":770,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/posts\/651\/revisions\/770"}],"wp:attachment":[{"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/media?parent=651"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/categories?post=651"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/highly-relevant\/wp-json\/wp\/v2\/tags?post=651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}