Smoothing out spelling variation

Research Associate Jane Demmen highlights some of the issues involved in working with variable spellings that were typical of English in Shakespeare’s time…

These days there are many sophisticated software tools that can find, count, sort and display words in a variety of useful ways to help linguists carry out research into texts which would be impossible using just the naked eye. We’re using some of these tools to produce the Encyclopaedia of Shakespeare’s Language. However, a major obstacle for many linguistic software tools is recognising words that are spelled in more than one way, and counting them as one word form and not as separate word forms. English spelling was not fully standardised until well after the time that Shakespeare’s plays were written, and it was normal for words to be spelled in a variety of ways (sometimes depending on the way in which the writer would pronounce the words in speech).

A human can get over this fairly easily and understand, for example, that would, woud and wud are varying forms of the same word, but many computer software tools for linguistic research will read them as three different words. We want to group these varying word forms together to count them for the purposes of our Encyclopaedia entries, and indeed generally to be more accurate in our claims about Shakespeare’s language. Fortunately for us, we have the clever piece of software VARD 2 (Variant Detector), which was developed by Alistair Baron and colleagues in the School of Computing and Communications at Lancaster University a few years ago to help ‘regularise’ spelling variation.
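
To make the counting problem concrete, here is a minimal Python sketch. It is an illustration only: the variant table is hand-written from the example above and is not taken from VARD 2, and the function name count_word_forms is hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written mapping from variant spellings to one standard
# form. Illustration only; this is not data or output from VARD 2.
VARIANT_TO_STANDARD = {"woud": "would", "wud": "would"}


def count_word_forms(tokens, mapping=None):
    """Count tokens, merging known spelling variants first if a mapping is given."""
    mapping = mapping or {}
    return Counter(mapping.get(tok.lower(), tok.lower()) for tok in tokens)


tokens = ["would", "woud", "wud", "would"]

# Without a mapping, each spelling is treated as a separate word form.
raw_counts = count_word_forms(tokens)

# With the mapping, all three spellings are counted under "would".
merged_counts = count_word_forms(tokens, VARIANT_TO_STANDARD)
```

Without the mapping the four tokens fall into three separate entries; with it they collapse into a single entry for would, which is the behaviour we need for frequency counts.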

VARD 2 has a built-in dictionary and a set of rules enabling it to recognise a great many variations of common spellings and then suggest an appropriate standard form as a replacement (the standard usually being a modern form, e.g. would in the example above). It has an automatic mode, in which it finds and replaces spelling variants on its own when run through a text. The changes it makes include:

  • dropping word-final ‘e’, e.g. in horne → horn, inke → ink
  • converting -ie word-endings to -y, e.g. in hypocrisie → hypocrisy
  • swapping ‘u’ for ‘v’ as in knaue → knave, and ‘v’ for ‘u’ as in vp → up
  • converting word-initial ‘i’ to ‘j’ in, e.g., iest → jest and iustice → justice.
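
The four kinds of change listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not VARD 2's actual implementation: KNOWN_WORDS is a tiny stand-in for a dictionary of accepted modern forms, and candidate_forms and normalise are hypothetical names.

```python
# Tiny stand-in for a dictionary of accepted modern forms (an assumption
# made for this sketch; a real dictionary would be far larger).
KNOWN_WORDS = {"horn", "ink", "hypocrisy", "knave", "up", "jest", "justice"}


def candidate_forms(word):
    """Apply each rule once, on its own, and collect the resulting spellings."""
    w = word.lower()
    candidates = []
    if w.endswith("ie"):
        candidates.append(w[:-2] + "y")            # hypocrisie -> hypocrisy
    if w.endswith("e"):
        candidates.append(w[:-1])                  # horne -> horn
    for i, ch in enumerate(w):
        if ch == "u" and i > 0:
            candidates.append(w[:i] + "v" + w[i + 1:])  # knaue -> knave
    if w.startswith("v"):
        candidates.append("u" + w[1:])             # vp -> up
    if w.startswith("i"):
        candidates.append("j" + w[1:])             # iest -> jest
    return candidates


def normalise(word, known=KNOWN_WORDS):
    """Return the first candidate found in the dictionary, else the word unchanged."""
    w = word.lower()
    if w in known:
        return w
    for candidate in candidate_forms(w):
        if candidate in known:
            return candidate
    return w


examples = ["horne", "inke", "hypocrisie", "knaue", "vp", "iest", "iustice", "thou"]
normalised = [normalise(w) for w in examples]
```

Checking every candidate against a dictionary is what stops the rules from over-applying: knaue also ends in ‘e’, but knau is not a known word, so only the ‘u’ to ‘v’ change is accepted. A word such as thou, for which no candidate is in the dictionary, is left as it is.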

VARD 2 also has a manual mode, in which it highlights spelling variants for the user to check individually and then choose which replacement to use. In the manual mode, users can also add new words to VARD 2’s dictionary. Shakespeare’s plays have many archaic words which aren’t in VARD 2’s built-in dictionary, and there are also quite a few words for which VARD 2 has difficulty determining the appropriate spelling in a particular context. It can distinguish different parts of speech, but still has problems with, for example, determining whether the word form deere is the noun deer or the adjective/noun dear. Similar difficulties arise with bee and be, doe and do, would and wood and many other cases, and so my colleague Sean Murphy and I have been using VARD 2 manually to make the appropriate choices ourselves.
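
The split between automatic and manual handling can be pictured with a short Python sketch. It assumes a hypothetical candidate table whose entries simply mirror the examples in the text; it does not reproduce VARD 2's dictionary or its interface, and triage is an invented name.

```python
# Hypothetical candidate table built from the examples in the text;
# it is not VARD 2's actual dictionary.
CANDIDATES = {
    "deere": ["deer", "dear"],
    "bee": ["bee", "be"],
    "doe": ["doe", "do"],
    "horne": ["horn"],
}


def triage(tokens, candidates=CANDIDATES):
    """Separate tokens with one possible replacement from those needing a human choice."""
    automatic = {}
    needs_review = {}
    for tok in tokens:
        options = candidates.get(tok.lower())
        if not options:
            continue  # not in the table: leave the token as it is
        if len(options) == 1:
            automatic[tok] = options[0]
        else:
            needs_review[tok] = options
    return automatic, needs_review


automatic, needs_review = triage(["horne", "deere", "bee", "thou"])
```

Here horne has a single candidate and could be replaced automatically, while deere and bee are held back for a person to decide in context. thou is not in the table at all, so it passes through untouched.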

This also enables us to be sure we are retaining archaic forms which we don’t want to be erased through modernisation, for example, keeping thou as well as you, which have important distinctions in the way they are used and the meanings they convey in this historical period. In so doing, our version of VARD 2 (and we ourselves) have learned a lot of new (i.e. old!) words, and we’ll be able to use our customised version of VARD 2 to standardise the spelling in other plays from the same period which contain similar kinds of spelling variation. However, the process has not been without some interesting challenges, dilemmas, and, occasionally, spirited debate!

Many of the words in Shakespeare’s plays are no longer in regular use, such as affright, bespeak, eyne, holp, holpen, spake and vizard, and others may never have been in regular use at all (such as bragless, misgraffed and questrist). In the process of attempting to standardise the spelling we therefore also have to decide which of the archaic forms we leave in (and in what forms), and how we standardise unfamiliar or archaic words. Do we

  • leave them in the forms they are found,
  • choose one or other of the variations shown in the Oxford English Dictionary for those that are in there (e.g. scurril/scurrile), and/or
  • modernise them to some extent – thereby possibly creating word forms spelled in ways that may never have actually appeared in early versions of the plays?

For example, if we standardise all past-participle -t endings to -ed (blest → blessed, forct → forced, curst → cursed, inricht → enriched and so on), what then do we do with curstest (a superlative adjective meaning ‘most cursed’)? If we follow our modernisation pattern and alter it to cursedest, we create a word form which doesn’t actually exist in our original-spelling set of plays (although it is found in the work of other writers of the period). Modernising spelling arguably makes it easier for the modern reader to understand – which is important – but does it then reduce authenticity?
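
The dilemma can be shown in a small Python sketch. The lookup table contains only the four examples given above, the keep parameter is a hypothetical way of protecting chosen forms from modernisation, and neither outcome shown is presented as the decision the project actually took.

```python
# Lookup table containing only the examples from the text. A plain
# "replace -t with -ed" rule would not work: blest needs an extra 's',
# and inricht also changes its prefix.
PARTICIPLE_MAP = {
    "blest": "blessed",
    "forct": "forced",
    "curst": "cursed",
    "inricht": "enriched",
}


def modernise(word, keep=frozenset()):
    """Modernise -t participles, unless the word is listed in `keep`."""
    w = word.lower()
    if w in keep:
        return w  # protected form: retained exactly as found
    if w in PARTICIPLE_MAP:
        return PARTICIPLE_MAP[w]
    # Derived superlatives: a known -t participle followed by -est.
    for old, modern in PARTICIPLE_MAP.items():
        if w == old + "est":
            return modern + "est"
    return w


followed_pattern = modernise("curstest")
kept_original = modernise("curstest", keep={"curstest"})
```

Following the pattern produces cursedest, the form that does not occur in the original-spelling plays; adding curstest to the protected set retains the attested spelling at the cost of consistency.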

In practice, we have adopted a range of solutions for different kinds of words (the documentation of which has run into tomes rivalling the size of the Shakespeare canon itself!).


About Mathew Gillings

PhD Linguistics student at Lancaster University.