{"id":380,"date":"2016-10-22T12:38:42","date_gmt":"2016-10-22T12:38:42","guid":{"rendered":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/?p=380"},"modified":"2016-10-22T12:38:42","modified_gmt":"2016-10-22T12:38:42","slug":"smoothing-out-spelling-variation","status":"publish","type":"post","link":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/2016\/10\/22\/smoothing-out-spelling-variation\/","title":{"rendered":"Smoothing out spelling variation"},"content":{"rendered":"<p><em>Research Associate Jane Demmen highlights some of the issues involved in working with variable spellings that were typical of English in Shakespeare&#8217;s time&#8230;<\/em><\/p>\n<p>These days there are many sophisticated software tools that can find, count, sort and display words in a variety of useful ways to help linguists carry out research into texts which would be impossible using just the naked eye. We&#8217;re using some of these tools to produce the <em>Encyclopaedia of Shakespeare&#8217;s Language<\/em>. However, a major obstacle for many linguistic software tools is recognising words that are spelled in more than one way, and counting them as one word form and not as separate word forms. English spelling was not fully standardised until well after the time that Shakespeare&#8217;s plays were written, and it was normal for words to be spelled in a variety of ways (sometimes depending on the way in which the writer would pronounce the words in speech).<\/p>\n<p>A human can get over this fairly easily and understand, for example, that <em>would<\/em>, <em>woud<\/em> and <em>wud<\/em> are varying forms of the same word, but many computer software tools for linguistic research will read them as three different words. We want to group these varying word forms together to count them for the purposes of our Encyclopaedia entries, and indeed generally to be more accurate in our claims about Shakespeare&#8217;s language. 
Fortunately for us, we have the clever piece of software VARD 2 (Variant Detector; http:\/\/ucrel.lancs.ac.uk\/vard\/about\/), which was developed by Alistair Baron and colleagues in the School of Computing and Communications at Lancaster University a few years ago to help &#8216;regularise&#8217; spelling variation.<\/p>\n<p>VARD 2 has a built-in dictionary and a set of rules enabling it to recognise a great many variations of common spellings and then suggest a standard form as a replacement (the standard usually being a modern form, e.g. <em>would<\/em> in the example above). It has an automatic mode, in which it will find and replace spelling variants on its own when run through a text. Its automatic replacements include:<\/p>\n<ul>\n<li>dropping word-final &#8216;e&#8217;, e.g. in <em>horne<\/em> \u2192 <em>horn<\/em>, <em>inke<\/em> \u2192 <em>ink<\/em><\/li>\n<li>converting -ie word-endings to -y, e.g. in <em>hypocrisie<\/em> \u2192 <em>hypocrisy<\/em><\/li>\n<li>replacing &#8216;u&#8217; with &#8216;v&#8217; as in <em>knaue<\/em> \u2192 <em>knave<\/em>, and &#8216;v&#8217; with &#8216;u&#8217; as in <em>vp<\/em> \u2192 <em>up<\/em><\/li>\n<li>converting word-initial &#8216;i&#8217; to &#8216;j&#8217; in, e.g., <em>iest<\/em> \u2192 <em>jest<\/em> and <em>iustice<\/em> \u2192 <em>justice<\/em>.<\/li>\n<\/ul>\n<p>VARD 2 also has a manual mode, in which it highlights spelling variants for the user to check individually and then choose which replacement to use. In the manual mode, users can also add new words to VARD 2&#8217;s dictionary. Shakespeare&#8217;s plays have many archaic words which aren&#8217;t in VARD 2&#8217;s built-in dictionary, and there are also quite a few words for which VARD 2 has difficulty determining the appropriate spelling in a particular context. 
It can distinguish different parts of speech, but still has problems with, for example, determining whether the word form <em>deere<\/em> is the noun <em>deer<\/em> or the adjective\/noun <em>dear<\/em>. Similar difficulties arise with <em>bee<\/em> and <em>be<\/em>, <em>doe<\/em> and <em>do<\/em>, <em>would<\/em> and <em>wood<\/em> and many other cases, and so my colleague Sean Murphy and I have been using VARD 2 manually to make the appropriate choices ourselves.<\/p>\n<p>This also enables us to be sure we are retaining archaic forms which we don&#8217;t want to be erased through modernisation, for example, keeping <em>thou<\/em> as well as <em>you<\/em>, which have important distinctions in the way they are used and the meanings they convey in this historical period. In so doing, our version of VARD 2 (and we ourselves) have learned a lot of new (i.e. old!) words, and we&#8217;ll be able to use our customised version of VARD 2 to standardise the spelling in other plays from the same period which contain similar kinds of spelling variation. However, the process has not been without some interesting challenges, dilemmas, and, occasionally, spirited debate!<\/p>\n<p>Many of the words in Shakespeare&#8217;s plays are no longer in regular use, such as <em>affright<\/em>, <em>bespeak<\/em>, <em>eyne<\/em>, <em>holp<\/em>, <em>holpen<\/em>, <em>spake<\/em> and <em>vizard<\/em>, and others may never have been in regular use at all (such as <em>bragless<\/em>, <em>misgraffed<\/em> and <em>questrist<\/em>). In the process of attempting to standardise the spelling we therefore also have to decide which of the archaic forms we leave in (and in what forms), and how we standardise unfamiliar or archaic words. Do we<\/p>\n<ul>\n<li>leave them in the forms in which they are found,<\/li>\n<li>choose one or other of the variations shown in the <em>Oxford English Dictionary<\/em> for those that are in there (e.g. 
<em>scurril\/scurrile<\/em>), and\/or<\/li>\n<li>modernise them to some extent &#8211; thereby possibly creating word forms spelled in ways that may never have actually appeared in early versions of the plays?<\/li>\n<\/ul>\n<p>For example, if we standardise all past-participle -t endings to -ed (<em>blest<\/em> \u2192 <em>blessed<\/em>, <em>forct<\/em> \u2192 <em>forced<\/em>, <em>curst<\/em> \u2192 <em>cursed<\/em>, <em>inricht<\/em> \u2192 <em>enriched<\/em> and so on), what then do we do with <em>curstest<\/em> (a superlative adjective meaning &#8216;most cursed&#8217;)? If we follow our modernisation pattern and alter it to <em>cursedest<\/em>, we create a word form which doesn&#8217;t actually exist in our original-spelling set of plays (although it is found in the work of other writers of the period). Modernising spelling arguably makes the text easier for the modern reader to understand \u2013 which is important \u2013 but does it then reduce authenticity?<\/p>\n<p>In practice, we have adopted a range of solutions for different kinds of words (the documentation of which has run into tomes rivalling the size of the Shakespeare canon itself!).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Research Associate Jane Demmen highlights some of the issues involved in working with variable spellings that were typical of English in Shakespeare&#8217;s time&#8230; These days there are many sophisticated software tools that can find, count, sort and display words in &hellip; <a href=\"http:\/\/wp.lancs.ac.uk\/shakespearelang\/2016\/10\/22\/smoothing-out-spelling-variation\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":94,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[2],"tags":[37,42,48],"class_list":["post-380","post","type-post","status-publish","format-standard","hentry","category-blog","tag-shakespeare","tag-spelling-regularisation","tag-vard"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/posts\/380","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/users\/94"}],"replies":[{"embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/comments?post=380"}],"version-history":[{"count":0,"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/posts\/380\/revisions"}],"wp:attachment":[{"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/media?parent=380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/categories?post=380"},{"taxonomy":"post_tag","embeddable":true,"h
ref":"http:\/\/wp.lancs.ac.uk\/shakespearelang\/wp-json\/wp\/v2\/tags?post=380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}