{"id":1340,"date":"2020-06-03T07:07:39","date_gmt":"2020-06-03T07:07:39","guid":{"rendered":"http:\/\/wp.lancs.ac.uk\/the-ruskin\/?p=1340"},"modified":"2020-06-05T09:29:50","modified_gmt":"2020-06-05T09:29:50","slug":"ruskin-and-ai","status":"publish","type":"post","link":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/2020\/06\/03\/ruskin-and-ai\/","title":{"rendered":"Ruskin and AI"},"content":{"rendered":"<p><span style=\"font-size: 12pt\"><em>In this post, <a href=\"https:\/\/www.lancaster.ac.uk\/the-ruskin\/people\/postdocs-and-research-students\/\">Dr Rob Smail<\/a> (our recent <a href=\"https:\/\/www.lancaster.ac.uk\/the-ruskin\/research\/augmented-humanity\/\">AHRC Creative Economy Engagement Fellow<\/a>) reflects on his research using Machine Learning to explore Ruskin&#8217;s manuscripts.<\/em><\/span><\/p>\n<p>What can Machine Learning reveal about Ruskin?\u00a0 During my time at The Ruskin as an AHRC Creative Economy Engagement Fellow, I\u2019ve been exploring how the digitisation of The Ruskin Whitehouse Collection can create opportunities for new kinds of research.<\/p>\n<p>The Ruskin Whitehouse Collection is the largest assemblage of Ruskin material in the world, and the most representative of Ruskin\u2019s working practices across a diverse range of media. In addition to 7,400 letters and 29 volumes of manuscript diaries, it includes thousands of drawings, paintings and photographs &#8211; digitising all this material will take years. 
However, supported by the <a href=\"http:\/\/www.nwcdtp.ac.uk\/\">North West Consortium Doctoral Training Partnership<\/a> (NWCDTP) and the <a href=\"https:\/\/www.friendsofnationallibraries.org.uk\/\">Friends of the National Libraries<\/a> (FNL), I\u2019ve been able to work with the team at The Ruskin on a study to guide this work.<\/p>\n<p>Our aims in this study were twofold.\u00a0 We wanted to set some basic digitisation standards, and we wanted to experiment with using Machine Learning to trace connections across the full range of Ruskin\u2019s works.<\/p>\n<p><strong>The Source Set <\/strong><\/p>\n<p>Our first task was to select a source set containing a manageable number of items on which to develop and refine our approach.\u00a0 Building on my previous work at the <a href=\"https:\/\/www.lancaster.ac.uk\/lec\/\">Lancaster Environment Centre<\/a>, which focused on the historic flora of the Lake District, I chose a source set that revealed Ruskin\u2019s thoughts about the region.<\/p>\n<p>Ruskin first visited the Lakes when he was five, and he returned throughout his life before settling there in 1871, when he bought Brantwood, near Coniston. The last tour he made before buying Brantwood took place between late June and August 1867.\u00a0 On that occasion, Ruskin had come to the Lakes to recover from fatigue.\u00a0 His stay that summer helped him recuperate, which is part of the reason he later made the region his home.<\/p>\n<p>Surprisingly, Ruskin\u2019s 1867 visit has received less attention than his other Lake District holidays. 
Therefore, we decided to centre our study on the letters he wrote during that tour, which had the added benefit of potentially helping us to understand what it was about the Lakes that lifted Ruskin\u2019s spirits.<\/p>\n<p>In all, we identified 53 letters.\u00a0 These included letters sent to the writer, Thomas Carlyle; the philologist, Frederick Furnivall (of OED fame); the engraver, George Allen (who would later become Ruskin\u2019s publisher); and the painter, Thomas Richmond.\u00a0 But the majority of the letters \u2013 39 of the 53 \u2013 were sent to Ruskin\u2019s cousin, Joan Severn, and his mother, Margaret.<\/p>\n<figure id=\"attachment_1344\" aria-describedby=\"caption-attachment-1344\" style=\"width: 383px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1344 size-full\" src=\"http:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-2.png\" alt=\"Image of Ruskin\u2019s letter to Joan Severn, 2 July 1867\" width=\"383\" height=\"600\" srcset=\"https:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-2.png 383w, https:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-2-192x300.png 192w\" sizes=\"auto, (max-width: 383px) 100vw, 383px\" \/><figcaption id=\"caption-attachment-1344\" class=\"wp-caption-text\">Ruskin\u2019s letter to Joan Severn, 2 July 1867<\/figcaption><\/figure>\n<p><strong>Digitising the Letters<\/strong><\/p>\n<p>Digitising these letters was a two-part process, supported by two digitisation assistants: <a href=\"http:\/\/www.nwcdtp.ac.uk\/current-students\/student-profiles-2\/claire-mcgann\/\">Claire McGann<\/a> and <a href=\"https:\/\/www.lancaster.ac.uk\/history\/about\/people\/ben-wills-eve\">Ben Wills-Eve<\/a>.\u00a0 Working together, we first created an accurate and faithful <em>transcription<\/em> of the contents of each letter, and then we\u00a0<em>encoded<\/em> information about each letter\u2019s structure and layout into each 
transcription.<\/p>\n<p>After consulting current standards, we decided to adapt the <a href=\"https:\/\/www.researchgate.net\/publication\/270270907_Modest_XML_for_Corpora_Not_a_standard_but_a_suggestion\">\u2018modest approach\u2019 to XML<\/a> (eXtensible Markup Language) encoding recommended by our colleague <a href=\"https:\/\/www.lancaster.ac.uk\/linguistics\/about\/people\/andrew-hardie\">Andrew Hardie<\/a>. Andrew\u2019s approach provides a flexible way of using XML elements to encode extra information about the plain-text transcriptions, whilst keeping the number of tags added to a minimum.\u00a0 These elements, whose tags appear inside chevrons, help capture different levels of semantic meaning, and they help ensure that information about each letter\u2019s structure and layout is retained during digitisation. To keep our approach in line with best practice in the field, we built on Andrew\u2019s model by drawing our elements from the standards set by the <a href=\"https:\/\/tei-c.org\/\">Text Encoding Initiative<\/a> (TEI).<\/p>\n<figure id=\"attachment_1348\" aria-describedby=\"caption-attachment-1348\" style=\"width: 939px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1348\" src=\"http:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-1.png\" alt=\"Image of a sample XML transcription of Ruskin\u2019s letter to Joan, 2 July 1867\" width=\"939\" height=\"550\" srcset=\"https:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-1.png 939w, https:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-1-300x176.png 300w, https:\/\/wp.lancs.ac.uk\/the-ruskin\/files\/2020\/05\/Picture-1-768x450.png 768w\" sizes=\"auto, (max-width: 939px) 100vw, 939px\" \/><figcaption id=\"caption-attachment-1348\" class=\"wp-caption-text\">XML transcription of Ruskin\u2019s letter to Joan, 2 July 1867<\/figcaption><\/figure>\n<p><strong>Using Machine 
Learning<\/strong><\/p>\n<p>Once we had finished digitising all 53 letters in the source set, we were able to run a series of Machine Learning tests on them.\u00a0 One question we were keen to explore was whether we could use \u2018classifiers\u2019 to detect differences in the way Ruskin wrote to different correspondents.<\/p>\n<p>Classifiers are algorithms used in predictive modelling: they assign each item to one of a set of predefined categories.\u00a0 They\u2019re often used in supervised Machine Learning, where an algorithm learns from examples whose categories are already known and then sorts new, unlabelled data on the basis of the same characteristics.<\/p>\n<p>In this case, we used a classifier known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Naive_Bayes_classifier\">Na\u00efve Bayes<\/a>, which is based on Bayes\u2019s Theorem and has proved reliable for classifying texts.\u00a0 This theorem, formulated by the 18<sup>th<\/sup>-century minister and statistician, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Thomas_Bayes\">Thomas Bayes<\/a>, calculates the probability of an event given evidence that might relate to that event.\u00a0 The classifier is called \u2018na\u00efve\u2019 because it assumes that, within a given category, each piece of evidence (in our case, each word) occurs independently of the others.<\/p>\n<p>We were curious to see whether we could use Na\u00efve Bayes to sort the letters in the source set by recipient on the basis of each letter\u2019s stylistic characteristics.<\/p>\n<p>Like other supervised classifiers, Na\u00efve Bayes needs several labelled examples of each category to learn from.\u00a0 Learning from these examples, a process known as \u2018training\u2019, allows the classifier to work out which characteristics to associate with each group.\u00a0 Because the remaining 14 letters were spread across several other correspondents, we restricted our experiment to the 39 letters in the source set addressed to Ruskin\u2019s mother, Margaret, and his cousin, Joan.<\/p>\n<p>This gave us a small but sufficient sample with two clearly defined categories: letters to Margaret and letters to Joan. 
Our aim was to determine whether Na\u00efve Bayes could correctly identify which letters were written to whom on the basis of the words Ruskin used.<\/p>\n<p>We split the letters into two sets: a training set of 38 letters for which the recipient was known, and a testing set of 1 letter from which we\u2019d removed the recipient\u2019s name.\u00a0 We used the former to train Na\u00efve Bayes and the latter to test whether the trained classifier could determine to whom the anonymised letter was sent.<\/p>\n<p>We repeated the test 39 times, holding out each letter in turn as the testing set, and then averaged the results of all 39 predictions (an approach known as leave-one-out cross-validation).\u00a0 We were pleased to find that Na\u00efve Bayes predicted the recipient of the held-out letter correctly 87.2 percent of the time, that is, in 34 of the 39 tests.<\/p>\n<p><strong>Our Findings <\/strong><\/p>\n<p>Our study confirms that there\u2019s a discernible difference between the way Ruskin wrote to his mother and the way he wrote to his cousin.\u00a0 Now, on the face of it, that might not seem all that surprising: most of us adjust our style to suit our addressee.<\/p>\n<p>What matters, though, is that our findings demonstrate that \u2013 even with a modest source set \u2013\u00a0we can begin to train software to detect these differences, and that this can help us identify patterns in Ruskin\u2019s writings across the whole collection.<\/p>\n<p>Identifying these sorts of patterns gives us a new way of assessing Ruskin\u2019s writing in different contexts over the course of his life, and a possible means of dating undated material and identifying unnamed correspondents. 
In future, as more of the collection is digitised, it should be possible to train the software we\u2019ve used to greater accuracy and to extend it to other types of textual material, including Ruskin\u2019s diaries.<\/p>\n<p>These possibilities are exciting.\u00a0 They will allow us to reveal new links across the collection, providing researchers and visitors with deeper insights into both Ruskin\u2019s works and his world.<\/p>\n<p>__________<\/p>\n<p><em>Dr Rob Smail received his PhD in History from the University of Manchester in 2012, and he completed his AHRC CEEF Fellowship at The Ruskin in 2019. His exploratory research with the Whitehouse Collection helped pave the way for further projects, including <a href=\"https:\/\/www.lancaster.ac.uk\/the-ruskin\/research\/digitising-manuscript-letters\/\">Digitising the Manuscript Letters of John Ruskin<\/a> and <a href=\"https:\/\/www.lancaster.ac.uk\/the-ruskin\/research\/lake-district-unesco-world-heritage-site\/\">Enriching understanding of natural-cultural heritage in the English Lake District<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, Dr Rob Smail (our recent AHRC Creative Economy Engagement Fellow) reflects on his research in using Machine Learning to explore Ruskin&#8217;s manuscripts. 
What can computer Machine Learning reveal about Ruskin?\u00a0 During my time at The Ruskin as an AHRC Creative Economy Engagement Fellow, I\u2019ve been exploring how the digitisation of The Ruskin &hellip; <a href=\"https:\/\/wp.lancs.ac.uk\/the-ruskin\/2020\/06\/03\/ruskin-and-ai\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Ruskin and AI<\/span><\/a><\/p>\n","protected":false},"author":703,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-1340","post","type-post","status-publish","format-standard","hentry","category-misc"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p9OdOv-lC","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/posts\/1340","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/users\/703"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/comments?post=1340"}],"version-history":[{"count":12,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/posts\/1340\/revisions"}],"predecessor-version":[{"id":1368,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/posts\/1340\/revisions\/1368"}],"wp:attachment":[{"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/media?parent=1340"}],"wp:term":[{"taxonomy":"category","
embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/categories?post=1340"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wp.lancs.ac.uk\/the-ruskin\/wp-json\/wp\/v2\/tags?post=1340"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}