Intern – Katie Bates


Having just finished my second year as an English Language and Linguistics student at Lancaster University, I was thrilled to be given the opportunity to complete an internship with the department on the language of Shakespeare.

My work focused on defining low-frequency words in Shakespeare’s First Folio, specifically those used fewer than ten times in total, most of which occur only once! Making sense of some wonderful words (my favourites include lass-lorn, out-paramoured and traitoress) felt like careful excavation: the rare uses of these terms are like uncovered fossils, from which we can begin to wonder at the lesser-known communications, attitudes and cultures of a past time.
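The selection criterion described above amounts to filtering a word-frequency list for items below a threshold, with hapax legomena (words occurring exactly once) as the commonest case. A minimal sketch in Python, assuming the Folio text has already been tokenised into lowercase words; the function names and the toy token list are illustrative, not the project’s actual scripts:

```python
from collections import Counter

def low_frequency_words(tokens, max_freq=10):
    """Return (word, count) pairs for words used fewer than max_freq times,
    rarest first, ties broken alphabetically."""
    counts = Counter(tokens)
    rare = [(word, n) for word, n in counts.items() if n < max_freq]
    return sorted(rare, key=lambda pair: (pair[1], pair[0]))

def hapax_legomena(tokens):
    """Words that occur exactly once in the token list."""
    return sorted(word for word, n in Counter(tokens).items() if n == 1)

# Toy token list for illustration only, not real First Folio counts.
tokens = ["love"] * 12 + ["lass-lorn", "traitoress", "traitoress"]
print(low_frequency_words(tokens))  # [('lass-lorn', 1), ('traitoress', 2)]
print(hapax_legomena(tokens))       # ['lass-lorn']
```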

Like previous interns I worked with corpus methods, but I was the first to be able to take advantage of scripts that had been developed to improve speed and productivity. This meant that, for each term, the relevant corpus data and existing definitions were automatically pooled for me, resulting in a much smoother workflow. By comparing these with the existing dictionaries of Shakespeare’s language (e.g. those of Schmidt and Onions) and the Oxford English Dictionary, it became possible to piece together semblances of meaning. Whilst the meanings of some terms are apparent to modern readers (e.g. depopulate), others, such as pregnancy (which Shakespeare used to mean cleverness or quick-wittedness), required heavy inference and research into etymology, context and the usage of his contemporaries in order to compose a definition.

My involvement with the project through the internship has renewed my awe for the beauty and complexity of Shakespeare’s language, whilst also providing a practical insight into the uses of corpus methods, which will be invaluable to my final year of study. I have found both the research and wider project truly fascinating, and look forward to its completion, so others can enjoy the rich resources it will offer.

Posted in Uncategorized | Comments Off on Intern – Katie Bates

Shakespeare’s Neologisms: From Myth to Evidence

Following on from the AHRC-funded Encyclopedia of Shakespeare’s Language project, we are pleased to announce that we have been successfully awarded a grant (£9,740.15) from the British Academy. The project will establish whether, and to what extent, widely held views about Shakespeare’s neologisms are a myth, and also improve our understanding and appreciation of his words.

The website of the well-respected Shakespeare Birthplace Trust proclaims that “William Shakespeare invented over 1,700 words”, and similar estimates are found across non-academic and academic works alike. However, these estimates are often based on the number of words in the Oxford English Dictionary whose first citation comes from a work attributed to Shakespeare, and no study has yet systematically scrutinized each of these words, hunting for earlier uses.

The availability of Early English Books Online (the largest repository of historical English printed works), together with the recently released Enhanced Shakespearean Corpus, makes this a timely moment to undertake such a study. The study will also investigate a further set of potential neologisms, based on a list of words that occur only in texts attributed to Shakespeare.
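The core check can be sketched as a comparison of dates: a word counts as antedated if any attestation found in another corpus is earlier than its OED first citation. The second strand (words found only in Shakespeare) is a set difference between vocabularies. A minimal sketch in Python; the `Candidate` record, the helper names and the word/year data are all hypothetical placeholders, and real dating of early modern texts is far messier than a single integer year:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    word: str
    oed_first_year: int  # year of the OED first citation (a Shakespeare work)
    # Years of attestations found elsewhere, e.g. in EEBO (hypothetical data).
    other_years: list = field(default_factory=list)

def is_antedated(c):
    """True if any non-Shakespeare attestation predates the OED first citation.
    Same-year attestations are not counted, since their order is uncertain."""
    return any(year < c.oed_first_year for year in c.other_years)

def only_in(target_vocab, reference_vocab):
    """Words in the target vocabulary that never appear in the reference one."""
    return sorted(set(target_vocab) - set(reference_vocab))

# Hypothetical placeholder data, not real OED or EEBO findings.
candidates = [
    Candidate("word_a", 1600, [1585, 1603]),
    Candidate("word_b", 1598, [1610]),
    Candidate("word_c", 1605),
]
survivors = [c.word for c in candidates if not is_antedated(c)]
print(survivors)                                          # ['word_b', 'word_c']
print(only_in({"word_a", "word_d"}, {"word_a", "word_b"}))  # ['word_d']
```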

The project will be led by Prof. Jonathan Culpeper, with Prof. Jonathan Hope acting as a consultant, and Isolde van Dorst providing research assistance.

Posted in Uncategorized | Comments Off on Shakespeare’s Neologisms: From Myth to Evidence

Scuffles, Swagger, and Shakespeare: The Hidden Story of English

Our very own Jonathan Culpeper recently featured in the BBC Four documentary “Scuffles, Swagger, and Shakespeare: The Hidden Story of English”, presented by Dr. John Gallagher. Jonathan discusses some recent work coming out of the Encyclopedia of Shakespeare’s Language project, suggesting that the number of neologisms credited to Shakespeare may be smaller than has been suggested in the past.

You can still catch the documentary on BBC iPlayer.

Posted in Uncategorized | Comments Off on Scuffles, Swagger, and Shakespeare: The Hidden Story of English

Intern – Eleanor Field

Hi, I’m Eleanor, and I’ve just finished my second year of studying English Language and Linguistics at Lancaster. During my summer break I completed an internship working on the Encyclopaedia of Shakespeare’s Language, supervised by Professor Jonathan Culpeper. I decided to apply for the internship because I have an interest in stylistics, and the chance to analyse literary texts through linguistic methods (such as the use of corpora) seemed exciting to me. I was initially nervous about using corpus research methods, as they were completely new to me, but with Professor Culpeper’s supervision I quickly picked them up and found them invaluable throughout the project.

My role involved writing proposed definitions for the encyclopaedia, which the project team could then verify, focussing on words that occur at an extremely low frequency within Shakespeare’s work. To do this, I used a corpus to consider Shakespeare’s usage of each word in context; viewing the language in use really helped me to understand the intended meaning and guided the writing of the entries.
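Viewing every use of a word in its surrounding text is usually done with a keyword-in-context (KWIC) concordance. A minimal sketch in Python, assuming the text has already been split into tokens with punctuation separated; the `kwic` function is illustrative and not the corpus tool the team actually used:

```python
def kwic(tokens, node, width=4):
    """Return (left context, hit, right context) for each occurrence of node,
    matching case-insensitively and keeping `width` tokens on each side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Henry V 3.1, split naively on spaces (punctuation pre-separated).
text = "Once more unto the breach , dear friends , once more".split()
for left, hit, right in kwic(text, "more", width=3):
    # Right-align the left context so the hits line up in a column.
    print(f"{left:>20} [{hit}] {right}")
```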

Completing the internship has not only improved my understanding of corpus research methods and given me independent study skills, but has also confirmed that I wish to pursue postgraduate study. I loved completing the research for this project and hope to have the opportunity to undertake my own in the future!






Posted in Uncategorized | Comments Off on Intern – Eleanor Field

BBC Radio 3 Podcast

Jonathan Culpeper and Alison Findlay feature in a new 45-minute BBC Radio 3 Podcast called “New thinking: Shakespeare’s Language”, presented by John Gallagher. They discuss how the project works, and the light it’s shedding both on how Shakespeare worked as a writer, and on the development of the English language in Shakespeare’s day.

Give it a listen on the BBC website!

Posted in Blog | Comments Off on BBC Radio 3 Podcast

Intern – Sam Hollands

Being involved with the Encyclopedia of Shakespeare’s Language project has been a great opportunity. I have been working as an intern for the last 4 weeks, developing scripts to improve the efficiency of certain workflows, mainly by designing a system to increase the speed at which we can write definitions for the encyclopedia. The project has been great for giving me hands-on coding experience in an academic environment and for brushing up my Bash and Python skills. My particular area of interest is speech processing, so I am grateful for the opportunity to improve my computational skills.

I found Shakespeare’s tendency to create neologisms with the prefix ‘un-’ particularly interesting, as it suggests that a large proportion of his neologisms are simply negations of words he didn’t invent. Where I once thought Shakespeare invented thousands of words, it appears that this is a myth, largely predicated on the OED’s citing Shakespeare as the first user of many terms that simply haven’t been checked for antedatings. This has been a fantastic experience, and I’m excited to see the final results, and what percentage of the roughly 1,500 neologisms initially claimed are actually words Shakespeare invented.
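The ‘un-’ observation suggests a simple first-pass filter: flag candidate neologisms that consist of un- plus a word already attested elsewhere. A minimal sketch in Python; the word lists are toy examples (not claims about any word’s OED status), and `un_negations` is a hypothetical helper, not one of the project’s scripts:

```python
def un_negations(candidates, attested):
    """Return (word, base) pairs for candidates formed from un- plus a base
    word that is already attested (e.g. in earlier texts)."""
    pairs = []
    for word in candidates:
        if word.startswith("un"):
            # Strip the prefix and any hyphen, e.g. "un-hand" -> "hand".
            base = word[2:].lstrip("-")
            if base in attested:
                pairs.append((word, base))
    return pairs

# Toy lists for illustration; not claims about any word's OED status.
candidates = ["unmask", "un-hand", "lass-lorn", "under"]
attested = {"mask", "hand", "lass"}
pairs = un_negations(candidates, attested)
print(pairs)  # [('unmask', 'mask'), ('un-hand', 'hand')]
print(f"{len(pairs)}/{len(candidates)} candidates are un- negations")
```

Note that "under" is correctly left out: it starts with "un", but "der" is not an attested base word.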

Posted in Blog | Comments Off on Intern – Sam Hollands

Encyclopedia of Shakespeare’s Language Symposium

Posted in Blog | Comments Off on Encyclopedia of Shakespeare’s Language Symposium

A close encounter with Richard III

By Dr Jane Demmen, Senior Research Associate

Last month project Co-Investigator Andrew Hardie and I presented a paper at the Computational Methods for Literary-Historical Textual Scholarship conference at De Montfort University in Leicester (UK): a great event bringing together scholars from far and wide with interests in digital humanities approaches to literary texts. (The slides from this paper, and others from our project, are available on our website.) Leicester is also the resting place of King Richard III: the last English monarch to die in the course of battle and the inspiration for one of Shakespeare’s most interesting and controversial villains…

In Shakespeare’s plays we first meet the character who becomes Richard III in Henry VI parts 2 and 3 (when he is Duke of Gloucester). Part 3 familiarises us with the cruel and merciless tyrant who later will murder any number of individuals standing between him and the English throne (in the play Richard III). Yet Richard’s character evokes a certain admiration, in part for his ruthless cunning but also for his grotesque humour. “See how my sword weeps for the poor king’s death”, he remarks of his bloody weapon, having just fatally stabbed Henry (in Henry VI part 3, 5:6). “Why I can smile, and murder while I smile,” he tells us in his long soliloquy at the end of Henry VI part 3, 3:2. Cleverly, Richard makes these wry observations when he is alone on stage, so we the audience find ourselves drawn into his dastardly ambitions simply through our privileged access to what is going on in his mind.

As in the play, the real king Richard III died leading his troops at the Battle of Bosworth (the last main conflict in the Wars of the Roses, between the noble houses of Lancaster and York), just outside the city of Leicester, in August 1485. He was 32 and had reigned for just two years, having assumed the throne when the young heir apparent, Edward V, was declared illegitimate (based on claims that his father Edward IV had been bigamously married to young Edward’s mother, Queen Elizabeth). The question of whether or not Richard III was subsequently responsible for the murder of young Edward (who with his brother became known as ‘the princes in the tower’), as is portrayed in the play, remains open and a topic of conjecture to this day.

Richard III’s remains were, astonishingly, discovered beneath a car park in Leicester in 2012, after a search instigated by members of the Richard III Society and an archaeological dig led by the University of Leicester (in co-operation with Leicester City Council, owners of the car park). The site of the car park had once been a friary, where Richard’s body was buried after his death and defeat in battle. Following an impassioned debate over where Richard’s remains should lie, they were finally interred in Leicester Cathedral in 2015, in a coffin made by a descendant of Richard’s sister, whose DNA was used to help confirm the identity of the remains.

On the plinth of Richard’s monument in the cathedral is his coat of arms, his personal emblem (the figure of a boar), and his personal motto Loyaulte me lie (“loyalty binds me”).

Right next to the conference building at De Montfort University is the Newarke Gate. Richard III’s body would likely have passed through it when it was brought back to the city, draped over a horse, on public display (as proof of Richard’s death, the defeat of the Yorkists, and the victory of the Tudor forces).

The analysis of Richard’s remains confirmed that he had a curvature of the spine, but not that he was strikingly physically deformed, a popular idea which appears to have been fuelled by literary characterisations of him as twisted in body as well as mind. In Shakespeare’s plays, Queen Elizabeth (widow of Richard’s elder brother, the late king Edward IV) describes him as “that foul bunch-backed Toad” (Richard III, 4:3). Richard himself says, “Then since the Heavens have shaped my Body so, Let Hell make crooked my Mind to answer it” (although only in the audience’s hearing, in Henry VI part 3, 5:6).

Although the real-life Richard’s reign saw bloody conflict, treachery and political intrigue, as one walks around the cathedral and the city of Leicester in the present day noticing artefacts and snippets of information about his life, it’s apparent that he is remembered as a supporter of education and fair laws for ordinary people, and not the villainous tyrant imagined in Shakespeare’s plays.

Education was, of course, the main reason for my visit to Leicester, and is a cornerstone of our project. Learning more about Richard III was an unexpected bonus of attending the conference. More information is available about visiting the cathedral and the nearby King Richard III Visitors’ Centre.


Posted in Uncategorized | Comments Off on A close encounter with Richard III

New intern

We are very pleased to welcome Poppy Plumb to the Encyclopaedia of Shakespeare’s Language team for the next few weeks. Find out a little more about Poppy and what she’ll be working on below…

I’ve just finished my second year at Lancaster studying English Language and Literature, and for the next few weeks I’ll be working on the Encyclopaedia of Shakespeare’s Language. I was excited to hear about this project because of its application of linguistic and corpus methods to Shakespeare. Because Shakespeare is such an integral part of the literary canon, and is embedded in the study of English Literature throughout compulsory education in the UK, I find the opportunity to take a more linguistic approach to his work refreshing and exciting. I’m also keen to pursue postgraduate study after my undergraduate degree, so working on a research project like this will give me invaluable experience.

My research on the project will be focussing on neologisms: the words that Shakespeare supposedly coined. I’ll be building on the work of past interns: comparing definitions of each word from different sources, checking that each word is present in a corpus of Shakespeare’s plays, and proposing a definition of each word for the encyclopaedia.

Given the immense number of words being added to the encyclopaedia, the work Poppy is doing is integral to its compilation. We look forward to working with her.

Posted in Uncategorized | Comments Off on New intern

Is that a verb I see before me? Implementing grammatical category/part-of-speech tagging in the Shakespeare Corpus

Jane Demmen discusses the process of part-of-speech tagging the Shakespeare corpus, explores some of the issues the team encountered, and their subsequent solutions…

One of the many software programs that enable us to carry out the task of creating an electronic encyclopaedia of Shakespeare’s language is the Constituent Likelihood Automatic Word-tagging System (known to its friends as CLAWS). CLAWS “reads” the text of each play and assigns a label to each word denoting its grammatical function (also known as a part-of-speech tag or POS tag).

Why bother with grammatical labels for every word?
Assigning grammatical category labels to the texts of Shakespeare’s plays is essential to our project for several reasons. Crucially, it enables the word-stock of the plays to be classified into headwords, which form the basis of the dictionary-type entries in Volume 1 of our encyclopaedia. A headword is the lemma or base form of a set of grammatically-related words. For example, the headword fight (verb) is related to fights, fightest, fighting, fought and foughtst. The headword fight (noun) is treated separately, because it’s a different part of speech, and is related to fights (plural). Most dictionaries are arranged in a similar way. Importantly, it also lays useful groundwork for further potential studies, especially:

  • creating a descriptive grammar of Shakespeare’s language
  • studying variation in styles of grammar amongst different characters, plays, genres and between Shakespeare and other authors
  • investigating change in grammatical usage over time (within the Shakespeare canon, or between Shakespeare and other authors).
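Grouping word forms into headwords, as in the fight example above, can be sketched as keying each token by its lemma and part of speech, so that fight (verb) and fight (noun) become separate entries. In this Python sketch the lemma table is hand-written for the fight example only, and the coarse labels "verb"/"noun" stand in for the CLAWS6 tags the project actually uses; a real system would need a full lemmatiser:

```python
from collections import defaultdict

# Hand-written (form, POS) -> lemma table covering the fight example only.
LEMMAS = {
    ("fight", "verb"): "fight", ("fights", "verb"): "fight",
    ("fightest", "verb"): "fight", ("fighting", "verb"): "fight",
    ("fought", "verb"): "fight", ("foughtst", "verb"): "fight",
    ("fight", "noun"): "fight", ("fights", "noun"): "fight",
}

def headwords(tagged_tokens):
    """Group (form, pos) tokens under (lemma, pos) headwords, counting forms.
    Unknown forms fall back to being their own lemma."""
    entries = defaultdict(lambda: defaultdict(int))
    for form, pos in tagged_tokens:
        lemma = LEMMAS.get((form.lower(), pos), form.lower())
        entries[(lemma, pos)][form.lower()] += 1
    return entries

tokens = [("fights", "verb"), ("fought", "verb"), ("fight", "noun"), ("fights", "noun")]
for (lemma, pos), forms in sorted(headwords(tokens).items()):
    print(f"{lemma} ({pos}): {dict(forms)}")
# fight (noun): {'fight': 1, 'fights': 1}
# fight (verb): {'fights': 1, 'fought': 1}
```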

Finally, at some later stage we may want to apply semantic category tags (labels denoting the area of meaning or semantic “domain” to which each word belongs) to our Shakespeare play-texts using another software tool, for example the USAS (UCREL Semantic Analysis System) tool. USAS relies partly on grammatical information from the CLAWS tags to assign categories of meaning to words in a text, and if those tags are incorrect it is less likely to suggest appropriate meaning categories.

A brief history of CLAWS
CLAWS was first developed at the Lancaster research centre UCREL (the University Centre for Computer Corpus Research on Language) in the early 1980s. The CLAWS tagset (the range of part-of-speech/grammatical category labels it assigns) has been through several iterations. We use the CLAWS6 tagset in our project, which has about 200 possible labels for different grammatical categories!

How does it work?
When the CLAWS software is run over a text, it assigns the part-of-speech (POS) tags in part by using the information from its lexicon (a built-in dictionary of known words and the grammatical role(s) they can take) and in part using a set of context-based rules (for example, nouns tend to be preceded by determiners). Of course, many words can play more than one possible grammatical role. For example, to is a highly frequent word which can be a preposition, if it occurs before a noun, or part of an infinitive verb. In cases like this, CLAWS will assign a series of possible tags, starting with the one it calculates as having the greatest probability of being correct. It displays that tag within square brackets, with other possible tags after it. It expresses the probability of each tag being correct as a percentage. POS-tagged words in a text file appear like this:

Once_RR more_RRR unto_II the_AT Breach_[NN1/100] VV0@/0

In the example above (from Henry V 3_1), CLAWS correctly assigns the POS tags for Once as a general adverb (RR), more as a comparative general adverb (RRR), unto as a general preposition (II), the as an article (AT) and Breach as a singular common noun (NN1). The tags for Breach show the probability of it being a noun as 100% in this context, and a 0% probability of it being a verb.
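Reading a line like the Breach example programmatically, for instance to pull out the chosen tag and the ranked alternatives, is a matter of simple pattern matching. A Python sketch assuming exactly that layout (word, underscore, the chosen tag and percentage in square brackets, then space-separated TAG/percentage alternatives); `parse_claws_token` is a hypothetical helper, not part of CLAWS, and CLAWS may produce other layouts depending on its output options:

```python
import re

# An optional "[", a tag, an optional "@" marker, "/", a percentage, optional "]".
TAG_PROB = re.compile(r"(\[?)([A-Z0-9]+)@?/(\d+)\]?")

def parse_claws_token(line):
    """Split e.g. 'Breach_[NN1/100] VV0@/0' into the word, the chosen
    (bracketed) tag, and all (tag, percentage) candidates in printed order.
    Any '@' marker is dropped; we do not interpret it here."""
    word, _, rest = line.partition("_")
    chosen, candidates = None, []
    for m in TAG_PROB.finditer(rest):
        tag, prob = m.group(2), int(m.group(3))
        candidates.append((tag, prob))
        if m.group(1) == "[":
            chosen = tag
    return word, chosen, candidates

print(parse_claws_token("Breach_[NN1/100] VV0@/0"))
# ('Breach', 'NN1', [('NN1', 100), ('VV0', 0)])
```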

So far, so good. If only it was always this straightforward!

CLAWS and older forms of English
CLAWS was developed for late 20th-century English, on which it has an impressively high accuracy rate of 96-97% (when applied to the British National Corpus, according to the authors of the manual, Geoffrey Leech and Nick Smith, in 2000). However, we know from other research carried out by Lancaster colleagues in 2007 that its accuracy drops slightly with English from the 16th and 17th centuries. When the spelling is standardised, as it has been for our project, we can expect an accuracy rate of about 89%: still very good, but not good enough for us to be confident in building frequency-based encyclopaedia entries that rely on grammatical information. Therefore, project Co-Investigator Andrew Hardie has carried out some development work on CLAWS specifically for this project (for example, extending its lexicon to include verb forms that agree with the pronoun thou), and he, I and recent CASS PhD graduate Jennifer Hughes have been manually checking the POS tags assigned by CLAWS to every single word of 38 Shakespeare plays, and correcting any tagging errors.
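Accuracy figures like these are the proportion of tokens whose automatic tag matches a manually checked (“gold”) tag. A Python sketch of that calculation, plus the list of disagreements a human checker would work through; the token and tag sequences are illustrative, with only the cry pair (VV0 versus NN1) taken from the Othello example in this post. The last lines show why 89% is not good enough: in a hypothetical 20,000-token play it would leave roughly 2,200 tokens mistagged, against about 600 at 97%.

```python
def tagging_accuracy(auto_tags, gold_tags):
    """Proportion of tokens whose automatic tag matches the checked tag."""
    if len(auto_tags) != len(gold_tags):
        raise ValueError("tag sequences must be aligned token by token")
    correct = sum(a == g for a, g in zip(auto_tags, gold_tags))
    return correct / len(gold_tags)

def mismatches(tokens, auto_tags, gold_tags):
    """(token, automatic tag, corrected tag) for every disagreement."""
    return [(t, a, g) for t, a, g in zip(tokens, auto_tags, gold_tags) if a != g]

tokens = ["what", "cry", "is", "that"]
auto = ["DDQ", "VV0", "VBZ", "DD1"]  # illustrative CLAWS-style tags
gold = ["DDQ", "NN1", "VBZ", "DD1"]  # after manual correction of "cry"
print(tagging_accuracy(auto, gold))    # 0.75
print(mismatches(tokens, auto, gold))  # [('cry', 'VV0', 'NN1')]

# Expected number of mistagged tokens in a hypothetical 20,000-token play.
for acc in (0.97, 0.89):
    print(acc, round((1 - acc) * 20_000))
```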

What kind of things does CLAWS have trouble with?
There are a number of factors which make it difficult for CLAWS to work out the grammatical role of words in Shakespeare’s language. Some relate to the style of English of this period in general, such as word orders which were typical then but not now (e.g. the main verb coming first in questions, as in “Know you where you are?”, “Saw you Aufidius?”). Words used in senses that are no longer current also cause problems (e.g. ancient, familiar to us as an adjective meaning ‘very old’ in present-day English, but in earlier times also used as a noun meaning either someone who lived a long time ago, or a standard-bearer/ensign, a military term). Printing errors (spelling anomalies, missing words or words which may be incorrect) cause further difficulties. Some of these remain in our texts as linguistic artefacts, particularly if there is disagreement among scholars over what the intended word is.

During the course of the tag checking we’ve expanded the tagging lexicon of CLAWS by several thousand words so that, for example, it now knows that ancient can be a noun.

Other factors relate to the type of texts we’re dealing with, and could be expected in plays not only by Shakespeare but also by other dramatists of his day. These include foreign words (French, Italian, Spanish and/or Latin being popular). For example, in Twelfth Night 4_2, Feste the clown (as Sir Topas) says:

“Bonos dies, Sir Toby;”

Bonos dies is meant to be either Latin or Spanish for ‘good day’, which we would tag simply as foreign words (FW). Although CLAWS does recognise some foreign words and tag them as such, it doesn’t in this case: it tags Bonos as a plural noun (NN2) and dies as the third person singular present form of a verb (VVZ). Wordplay, puns, innuendo and other such language features beloved of dramatists, especially in comic dialogue, sometimes baffle CLAWS (and, not infrequently, the human researcher).

For anyone interested in the details of typical and recurring POS tagging errors we’ve encountered and corrected in our data, here are a few:

Unfamiliar-looking adverb (not ending in –ly) tagged incorrectly (as a noun, in this case)
“Why do you speak so startingly and rash [NN1/58] JJ/42?” (Othello 3_4)

Infinitive verbs incorrectly tagged as preposition followed by noun
“To [II/100] TO/0 lip NN1 a wanton in a secure couch,” (Othello 4_1)

Noun incorrectly tagged as verb
“Alas! what cry [VV0/62] NN1/38 is that?” (Othello 5_2)

Noun incorrectly tagged as adjective
“Among the Nettles at the Elder [JJR/74] NN1/26 tree:” (Titus Andronicus 2_3)

Verb incorrectly tagged as noun
“Both heaven and earth Friend [NN1/99] NP1/1 thee for ever.” (The Two Noble Kinsmen 1_4)

Adjective incorrectly tagged as noun
“Patience dear [NN1/55] JJ/44 UH/0 RR@/0 Niece,” (Titus Andronicus 3_1)

Interjection incorrectly tagged as verb
“Hail [VV0/68] UH/21 NN1/11 to thee Lady” (Othello 2_1)
“Marry [VV0/86] UH/14 for justice she is so employed” (Titus Andronicus 4_3)
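Looking across these examples, the errors fall into two groups. For most of them the correct tag was among CLAWS’s candidates but ranked lower, while for Friend (as a verb) and rash (as an adverb) the correct tag does not appear in the candidate list at all, presumably because the lexicon did not record that use, which is the kind of gap the lexicon expansion mentioned earlier addresses. A Python sketch that sorts the corrections this way; the candidate lists are copied from the examples (with the ‘@’ marker dropped), but the ‘correct’ tags, particularly VV0 for Friend and RR for rash, are our own reading of the descriptions rather than the project’s recorded corrections:

```python
from collections import Counter

# CLAWS candidates (tag, %) copied from the examples above, chosen tag first.
# The "correct" tags are our reading of each description, not project records.
EXAMPLES = [
    ("To", [("II", 100), ("TO", 0)], "TO"),
    ("cry", [("VV0", 62), ("NN1", 38)], "NN1"),
    ("Elder", [("JJR", 74), ("NN1", 26)], "NN1"),
    ("dear", [("NN1", 55), ("JJ", 44), ("UH", 0), ("RR", 0)], "JJ"),
    ("Hail", [("VV0", 68), ("UH", 21), ("NN1", 11)], "UH"),
    ("Marry", [("VV0", 86), ("UH", 14)], "UH"),
    ("Friend", [("NN1", 99), ("NP1", 1)], "VV0"),
    ("rash", [("NN1", 58), ("JJ", 42)], "RR"),
]

def classify(candidates, correct):
    """'ok' if CLAWS chose the correct tag, 're-rank' if it offered the tag
    but ranked it lower, 'lexicon' if the tag was never offered at all."""
    tags = [tag for tag, _ in candidates]
    if tags[0] == correct:
        return "ok"
    return "re-rank" if correct in tags else "lexicon"

for word, candidates, correct in EXAMPLES:
    print(f"{word:>7}: {classify(candidates, correct)}")
print(Counter(classify(c, g) for _, c, g in EXAMPLES))
# Counter({'re-rank': 6, 'lexicon': 2})
```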

To conclude, POS tag checking, although labour-intensive, has been a crucial process in the data preparation for our project because it’s vital to the quality of the output that the underlying assumptions about grammatical categories are correct. It’s not the type of task everyone would enjoy, though I have: it’s challenged and improved my understanding of grammar in the period Shakespeare was writing. The expansion of the tagging lexicon by several thousand words means that we now have a version of CLAWS which is much better equipped for use with English from earlier centuries, which we anticipate will be a useful future resource.

Posted in Uncategorized | Comments Off on Is that a verb I see before me? Implementing grammatical category/part-of-speech tagging in the Shakespeare Corpus