What was Christmas like for Shakespeare?

Christmas in Shakespeare’s time wasn’t a particularly glamorous affair, and it was quite unlike the finely-decorated homes and glitzy German markets that we know today.

It should therefore come as no surprise that the word Christmas appears a mere three times in Shakespeare’s First Folio, and the word Yule doesn’t appear at all. In our comparative corpus of other plays from the Elizabethan period (EEBO: Early English Books Online), Yule appears a number of times, confirming that it was still being celebrated, but the few cases of Christmas are in rather odd contexts – one is a malapropism, and one refers to the game blind-man’s buff.

One might be quick to assume that Shakespeare was lacking festive spirit or that the plays simply didn’t encompass Christmas festivities, but the real reason is that Christmas as we know it did not fully develop until the 1840s and beyond. Before that time, Easter was the main Christian festival, with Christmas being a largely secondary celebration. Over the years, Christmas developed into a twelve-day festival which started on the 25th December, and culminated on the Twelfth Night.

The Twelfth Night of Christmas was when theatre was performed at the royal court. It was no coincidence that Shakespeare gave his play that name. One might say that it was merely a practical title referring to when the play was intended to be performed.

The festive connotations that we now attribute to certain words were simply non-existent in the works of Shakespeare. Christmas as we know it would not arrive for another 300-or-so years until Queen Victoria and her husband Albert introduced the festivities from Germany. However, that isn’t to say that the words themselves did not exist.

The word bauble appears 6 times in Shakespeare’s First Folio, each in a different text, but not one of these instances refers to the shiny red objects that dangle from our trees. Indeed, quite the opposite – baubles around that time were insignificant toys, or mere pieces of rubbish. In any case, the word carries significantly negative connotations in its usage, quite unlike today.

In the play Cymbeline, Pisanio calls a letter a senseless bauble (III.2.20). Whilst Shakespeare would have used the word senseless to refer to an object that lacks sensation or feeling, its meaning is now very different. Far from referring to the letter as a foolish Christmas decoration, Pisanio is instead referring to a worthless object that lacks human sensation.

Whilst Shakespeare himself may not have recognised Christmas as a time for fun, food, and family, the team working on the Encyclopaedia of Shakespeare’s Language would like to wish all our readers a restful break and a prosperous New Year.


Is ‘more better’ a mistake if Shakespeare said it?

Research Associate Sean Murphy looks at Shakespeare’s use of more better, and considers whether we can really consider it a mistake if Shakespeare himself said it…

Learning English as a foreign language is hard. Learners are often corrected by teachers for saying or writing things like staying in our country is more better than going abroad. Being corrected by the teacher in front of their peers (with the corresponding loss of face which that may entail), or seeing the error underlined in red, can undermine learners’ self-confidence. But what if the teacher pointed out that, in Shakespeare’s First Folio, we find:

Art ignorant of what thou art. naught knowing
Of whence I am: nor that I am more better
Then Prospero
(The Tempest, I.ii)

The teacher might also highlight that, where the student uses the modern standard than, Shakespeare uses then. Of course, whether than and then after a comparative form were really different words, or merely orthographic variants (the First Folio contains 5 instances of better than, and 78 of better then) is debatable. Shakespeare uses more better three times in his plays, and in the very large database, Early English Books Online (comprising a wide variety of texts from 1473 – 1700), more better occurs no fewer than 5,180 times. In other words, rather than ‘making a mistake’, the student has used a structure common in Early Modern English (more better), together with the form (than) commonly used in comparative structures in present-day English (PDE).

As part of the Encyclopaedia of Shakespeare’s Language project, I’ve been re-reading the plays and regularising variant spellings in the First Folio (see the post on Smoothing out spelling variation). Having worked for many years as an English language teacher, I began to notice how many common ‘mistakes’ made by Spanish learners of English crop up in Shakespeare. Among others, these include non-standard use of: comparative and superlative forms of adjectives (more better, most coldest); prepositions (reason of, despite of); nouns (these news, informations) and double negatives (I can’t wear no dress)[1]. I amused myself with the idea that perhaps, as one writer has suggested, Shakespeare’s plays were actually written by Cervantes[2]. A more plausible explanation seemed to be that here was evidence of change in the English language, and of a process of standardisation from Shakespeare’s time to the present day.

So to what extent can some typical learner errors be found in Shakespeare and Early Modern English? Using the power of corpus linguistics, and databases available on Lancaster University’s CQPweb (a corpus query tool)[3], I was able to search for examples of the above-mentioned non-standard forms in the following corpora: the Longman Learners Corpus (9 million words); the Shakespeare First Folio Corpus (1 million words); and the Early English Books Online (EEBO) Corpus (1.2 billion words). The results, all of which are attested examples in the corpora, are given in the table below.

Common non-standard forms in learner English compared to Shakespeare and Early Modern English

| Non-standard form | Longman Learners Corpus (1990-2002), 9 million words | Shakespeare (First Folio, 1623), 1 million words | Early English Books Online (EEBO), 1.2 billion words |
| --- | --- | --- | --- |
| Double comparatives (more + comparative adjective) | staying in our country is more better than going abroad (133 / 14.8)[4] | nor that I am more better Then Prospero (Tem) (22 / 20.3) | Are ye not more better then they? (5,180 / 4.3) |
| Double superlatives (most + superlative adjective) | Obihiro is one of the most coldest cities (30 / 3.3) | the most coldest that euer turn’d vp Ace (WT) (6 / 5.5) | winter, which is the most coldest season of the year (3,627 / 3.0) |
| reason of | I asked the reason of the traffic jam (161 / 17.9) | Now doe you know the reason of this hast? (RJ) (14 / 12.9) | He demanded the reason of it (143,907 / 119.7) |
| despite of | Despite of these problems, I would live there (75 / 8.4) | Despight of mine owne Nature (KL) (17* / 15.7) | Despite of you i’ll tarry with them still (6,532* / 5.4) |
| these news | when I buy the newspaper, these news are old (9 / 1.0) | But wherefore doe I tell these Newes to thee? (HIV 1) (5** / 4.6) | because these news are general (1,099** / 0.9) |
| informations | Please send me informations about your courses (270 / 30.1) | In seeking tales and Informations Against this man (HVIII) (1 / 0.9) | He must deliver all those Informations (3,768 / 3.1) |
| Double negatives | I can not wear no dress for very hot day (2 / 0.2) | I can not goe no further (AYL) (2 / 1.8) | I can not go no faster (2) (453 / 0.4) |

* search for despite and despight

** search for news and newes

The first point to note is that all the learner ‘mistakes’ searched for can be found in both Shakespeare’s plays and Early Modern English. The first statistic given means that there are 133 instances of more better, more stronger, more richer, etc. in the Longman Learners Corpus. When this figure is normalised, we get a frequency of 14.8 instances per million words. As we can see, this is less than the normalised frequency in Shakespeare, but more than in Early English Books Online. In fact, if we glance over the normalised figures as a whole, we see that Shakespeare uses the non-standard forms (i.e. non-standard in PDE) more than the learners or his contemporaries, except in the case of reason of (where EEBO wins by a mile) and informations (the learners comfortably take the biscuit).
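For readers who want to check the normalisation themselves, here is a minimal Python sketch. The function name per_million is my own, and the corpus sizes are the rounded figures quoted above rather than exact word counts, so treat the results as approximate.

```python
def per_million(count, corpus_size):
    """Return the number of occurrences per million words."""
    return count / corpus_size * 1_000_000

# Absolute frequencies of the 'more better' type, with rounded corpus sizes
# (the exact sizes of the corpora are not given here, so these are assumptions).
longman = per_million(133, 9_000_000)
eebo = per_million(5_180, 1_200_000_000)

print(round(longman, 1))  # 14.8
print(round(eebo, 1))     # 4.3
```

With the rounded size of 1 million words, the First Folio figure would come out at 22.0 rather than 20.3, which presumably reflects an exact corpus size slightly above one million words.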

Secondly, it is important to remember that Shakespearean texts were written before any kind of written standard form of English had taken root. Standard forms simply did not matter so much in the late 16th and early 17th centuries as they do nowadays. With the advent of a standard form of the language and increasingly prescriptive attitudes from the 18th century onwards, adherence to a standard is now considered to be of great importance, and non-standard forms are often regarded as sub-standard. The paradox with Shakespeare is that there are forms in Shakespeare that are non-standard by today’s conventions, yet are lauded as a kind of supra-standard.

With language learners and teachers in mind, we might reflect on the following:

  • if some learner ‘mistakes’ can be found in canonical works such as those of Shakespeare and other writers, shouldn’t they more aptly be described as ‘variants’ – after all, they are the same as attested historical forms, some of which are regarded as prestigious;
  • the negative connotations implied by the term ‘mistakes’, and its association with prescriptivist attitudes to language are out of sync with modern linguistic theories which aim to describe language use, whether in the press, by learners or by previous generations;
  • English learners need to be praised and encouraged to increase their confidence in using the language – excessive attention to ‘mistakes’ is unlikely to help them achieve such self-belief;
  • teachers can raise learners’ awareness of historical variation and encourage comparison with PDE, and may even spark an interest in Shakespeare.

In short, learners of English should take heart. Far from making mistakes, they are speaking like Shakespeare. Their English teachers should be proud of them.

[1] Of course, the double negative is still used in PDE, as this example shows: they wouldn’t have no records (BNC Sampler Corpus).

[2] The article is in Spanish. It reviews a book by a Catalan author who claims that Cervantes and Shakespeare were the same person, on the basis of an alphanumeric analysis of their works.

[3] Available at https://cqpweb.lancs.ac.uk/

[4] Statistics shown refer to (absolute frequency / normalised frequency per million words).



Second-year undergraduate students who are majoring or minoring in the Department of Linguistics and English Language at Lancaster University are invited to apply and work with us on the Encyclopaedia of Shakespeare’s Language project. This unique internship will give students the opportunity to directly engage with the project and contribute work towards the encyclopaedia.

This year, we have three different potential projects for students to work on.

Students will be working on a corpus of Shakespeare’s language, comparing the language of Shakespeare to that of his contemporaries, and looking at this work within the context of current research. Whilst knowledge of corpus methods is not essential, and full training will always be provided, an eagerness to engage with this new form of language analysis is vital.

Application forms can be downloaded from the departmental website, and then emailed to Silke Brandt.

Informal enquiries about the individual projects can be made to Jonathan Culpeper: j.culpeper@lancaster.ac.uk.


Posted in Blog | Tagged , , | Comments Off on SPRINT

Smoothing out spelling variation

Research Associate Jane Demmen highlights some of the issues involved in working with variable spellings that were typical of English in Shakespeare’s time…

These days there are many sophisticated software tools that can find, count, sort and display words in a variety of useful ways to help linguists carry out research into texts which would be impossible using just the naked eye. We’re using some of these tools to produce the Encyclopaedia of Shakespeare’s Language. However, a major obstacle for many linguistic software tools is recognising words that are spelled in more than one way, and counting them as one word form and not as separate word forms. English spelling was not fully standardised until well after the time that Shakespeare’s plays were written, and it was normal for words to be spelled in a variety of ways (sometimes depending on the way in which the writer would pronounce the words in speech).

A human can get over this fairly easily and understand, for example, that would, woud and wud are varying forms of the same word, but many computer software tools for linguistic research will read them as three different words. We want to group these varying word forms together to count them for the purposes of our Encyclopaedia entries, and indeed generally to be more accurate in our claims about Shakespeare’s language. Fortunately for us, we have the clever piece of software VARD 2 (Variant Detector; http://ucrel.lancs.ac.uk/vard/about/) which was developed by Alistair Baron and colleagues in the School of Computing and Communications at Lancaster University a few years ago to help ‘regularise’ spelling variation.
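To see why this matters for counting, consider a minimal Python sketch. It is not part of VARD 2: the lookup table VARIANTS, the function count_forms and the token list are all invented for illustration.

```python
from collections import Counter

# Hypothetical variant table: each variant spelling points to one standard form.
VARIANTS = {"woud": "would", "wud": "would"}

def count_forms(tokens, variants=None):
    """Count tokens, optionally folding variant spellings into a standard form."""
    variants = variants or {}
    return Counter(variants.get(tok.lower(), tok.lower()) for tok in tokens)

tokens = ["would", "woud", "wud", "thou", "would"]

raw = count_forms(tokens)                # variants counted as separate words
grouped = count_forms(tokens, VARIANTS)  # variants folded into "would"

print(raw["would"], grouped["would"])    # prints: 2 4
```

Without the table, the software sees three different words; with it, all four occurrences are counted under would, while thou is left untouched.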

VARD 2 has a built-in dictionary and a set of rules enabling it to recognise a great many variations of common spellings and then suggest an appropriate replacement to a standard form (the standard usually being a modern form, e.g. would in the example above). It has an automatic mode, in which it will find and replace spelling variants on its own when run through a text, including:

  • dropping word-final ‘e’, e.g. in horne → horn, inke → ink
  • converting -ie word-endings to -y, e.g. in hypocrisie → hypocrisy
  • swapping ‘u’ for ‘v’ as in knaue → knave, and ‘v’ for ‘u’ as in vp → up
  • converting word-initial ‘i’ to ‘j’ in, e.g., iest → jest and iustice → justice.
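The combination of a dictionary and rewrite rules can be sketched in a few lines of Python. This is only an illustration of the general idea, not VARD 2’s actual implementation: the word list MODERN_WORDS and the functions candidates and regularise are hypothetical, and the real tool uses a far larger dictionary and more sophisticated matching.

```python
import re

# Tiny stand-in for a modern word list (an assumption for this sketch).
MODERN_WORDS = {"horn", "ink", "hypocrisy", "knave", "up", "jest", "justice"}

def candidates(word):
    """Yield the spellings produced by applying each rule to `word`."""
    if word.endswith("e"):
        yield word[:-1]                        # horne -> horn
    if word.endswith("ie"):
        yield word[:-2] + "y"                  # hypocrisie -> hypocrisy
    # 'u' between two vowels becomes 'v'       # knaue -> knave
    yield re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", word)
    if word.startswith("v"):
        yield "u" + word[1:]                   # vp -> up
    if word.startswith("i"):
        yield "j" + word[1:]                   # iest -> jest

def regularise(word):
    """Return a modern spelling if a rule produces a known word, else `word`."""
    lower = word.lower()
    if lower in MODERN_WORDS:
        return lower
    for candidate in candidates(lower):
        if candidate in MODERN_WORDS:
            return candidate
    return word  # unknown form: leave it for a human to decide

for form in ["horne", "inke", "hypocrisie", "knaue", "vp", "iest", "iustice", "thou"]:
    print(form, "->", regularise(form))
```

Checking each candidate against a word list is what stops the rules from over-applying: thou is left alone because none of the rewritten forms is a known word.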

VARD 2 also has a manual mode, in which it highlights spelling variants for the user to check individually and then choose which replacement to use. In the manual mode, users can also add new words to VARD 2’s dictionary. Shakespeare’s plays have many archaic words which aren’t in VARD 2’s built-in dictionary, and there are also quite a few words for which VARD 2 has difficulty determining the appropriate spelling in a particular context. It can distinguish different parts of speech, but still has problems with, for example, determining whether the word form deere is the noun deer or the adjective/noun dear. Similar difficulties arise with bee and be, doe and do, would and wood and many other cases, and so my colleague Sean Murphy and I have been using VARD 2 manually to make the appropriate choices ourselves.

This also enables us to be sure we are retaining archaic forms which we don’t want to be erased through modernisation, for example, keeping thou as well as you, which have important distinctions in the way they are used and the meanings they convey in this historical period. In so doing, our version of VARD 2 (and we ourselves) have learned a lot of new (i.e. old!) words, and we’ll be able to use our customised version of VARD 2 to standardise the spelling in other plays from the same period which contain similar kinds of spelling variation. However, the process has not been without some interesting challenges, dilemmas, and, occasionally, spirited debate!

Many of the words in Shakespeare’s plays are no longer in regular use, such as affright, bespeak, eyne, holp, holpen, spake and vizard, and others may never have been in regular use at all (such as bragless, misgraffed and questrist). In the process of attempting to standardise the spelling we therefore also have to decide which of the archaic forms we leave in (and in what forms), and how we standardise unfamiliar or archaic words. Do we

  • leave them in the forms they are found,
  • choose one or other of the variations shown in the Oxford English Dictionary for those that are in there (e.g. scurril/scurrile), and/or
  • modernise them to some extent – thereby possibly creating word forms spelled in ways that may never have actually appeared in early versions of the plays?

For example, if we standardise all past-participle -t endings to -ed (blest → blessed, forct → forced, curst → cursed, inricht → enriched and so on), what then do we do with curstest (a superlative adjective meaning ‘most cursed’)? If we follow our modernisation pattern and alter it to cursedest, we create a word form which doesn’t actually exist in our original-spelling set of plays (although it is found in the work of other writers of the period). Modernising spelling arguably makes it easier for the modern reader to understand – which is important – but does it then reduce authenticity?

In practice, we have adopted a range of solutions for different kinds of words (the documentation of which has run into tomes rivalling the size of the Shakespeare canon itself!).



Panel Meeting – 27 July 2016

On 27th July 2016, the Encyclopaedia of Shakespeare’s Language team held its first panel meeting. The panel meeting was essentially an opportunity for the project’s advisors / ambassadors to visit our research centre and learn more about our aims and ambitions. More importantly, it was also an opportunity for the wider panel to critically assess the project team’s progress, and point out any flaws or difficulties that may arise. The panel were described as the project’s critical friends.

The day began at 10:30am, and ended at 4:30pm. The day consisted of a series of mini presentations focussing on results and method. There was also extensive discussion about the project’s engagement and publicity activity, and how to decide on the most effective way to convey the encyclopaedia’s information to its users. The day’s agenda is attached below:

Time Event
10:30 – 11:00 Arrival and coffee
11:00 – 12:20 A series of mini presentations focussing on results
  – Project outline (Jonathan Culpeper)
  – Lexical items: Scottish, Irish and Welsh (Jonathan Culpeper / Alison Findlay)
  – Grammatical items: The affirmatives yes, yea and ay (Jonathan Culpeper)
  – Character profiles: Romeo and Juliet (Jonathan Culpeper)
  – Play genre: Tragedy vs. comedy (Dawn Archer)
  – The language of soliloquy (Sean Murphy)
  – The language of Shakespeare and that of his contemporary playwrights: The weather (Jane Demmen)
12:20 – 1:30 Lunch (preceded by a brief photo opportunity and a toast)
1:30 – 2:30 A series of mini presentations focussing on method
  – Spelling regularisation (VARD) (Dawn Archer / Paul Rayson / Alistair Baron)
  – Part-of-speech tagging (CLAWS) (Paul Rayson)
  – Semantic tagging (USAS) (Paul Rayson)
  – Social tagging (Dawn Archer)
  – Comparative playwrights corpus (Jane Demmen)
  – EEBO and genre (Sean Murphy)
  – Corpus methods and CQPweb (Andrew Hardie)
2:30 – 2:45 Users, engagement, publicity (Jonathan Culpeper / Alison Findlay / Mathew Gillings)
2:45 – 3:15 Tea break
3:15 – 4:30 Panel meeting (Chaired by David Crystal)

The panel meeting itself generated some food for thought, and the valued input from our panel has resulted in a number of tactical changes. The aim remains the same, but the means of achieving it have changed ever so slightly. There was also extensive discussion about how the encyclopaedia could be used in a classroom or theatre context, and our panel members who represent those interests suggested that people may benefit from an app in addition to the printed encyclopaedia.

Overall, the panel meeting was highly successful, and we are grateful to our panel members for joining us and offering their thoughts.


My winning proposal: putting Shakespeare together

Principal Investigator, Jonathan Culpeper, was interviewed about the Encyclopaedia of Shakespeare’s Language by ResearchResearch.com…

“In December 2015, Jonathan Culpeper, a professor of English language and linguistics at Lancaster University, learned that he had been successful in obtaining a grant of £797,997 from the Arts and Humanities Research Council’s open research call.

Grant success
He says his winning proposal, to create an encyclopaedia of Shakespeare’s language, was successful for three reasons. Number one: “There was clearly an academic gap. I’ve always thought it is quite paradoxical that people often talk about the wonderful language in Shakespeare and yet, when you go to the library there are just a handful of books on that topic on the shelves. Shakespeare’s language has definitely been overlooked in terms of academic treatment.”

Number two: “It’s clear we have the tools now to do this. When I started my PhD in the late 1980s, a large corpus might have had something like 1 million words. Now 1 billion is not uncommon.” A particular strength of his proposal, Culpeper says, is his team’s close affiliation with Lancaster’s Centre for Corpus Approaches to Social Science (Cass), which is funded by the Economic and Social Research Council. “Cass has developed a lot of tools for the social sciences. This project allows us to leverage them in the humanities, and specifically for Shakespeare,” he says.

Number three: “It’s topical. I think the celebrations surrounding the 400th anniversary of Shakespeare’s death helped bring focus to the project.””

You can read the full article on ResearchResearch.com.


Technology, Shakespeare, Linguistics…

Bernard Murphy offers an overview of some of the work being undertaken by Research Associate, Sean Murphy, on the project…

“My brother Sean is working on post-doctoral research in linguistics, especially the use of language in Shakespeare’s plays. Which may seem like a domain far removed from the interests of the technologists who read these blogs, but stick with me. This connects in unexpected ways to analytics of interest to us techies, and ultimately to a topic of interest to every reasonable person worldwide.

Let me start with Sean’s research. His goal has been to understand the different use of language, for example pronouns, between soliloquies in the comedies, history plays and tragedies. I won’t tax the patience of SemiWiki readers by going into the details – if you want to know more, there’s a link at the end of this blog. His approach is based on something called Corpus Linguistics – analysis of a body of writing to find trends and correlations.

Since Shakespeare’s works, prolific though he was, fit comfortably into one large, small-print volume, analysis of an electronic version can be performed easily with desktop software. Think of a statistical analysis package applied to language rather than numbers, looking at frequencies of word usage, or words used in close proximity. There are multiple software packages (from small and probably mostly academic vendors) for this type of analysis.”

You can read the full article on SemiWiki.com.


What’s in a soliloquy?

Research Associate, Sean Murphy, offers his thoughts on what makes the soliloquies of Shakespearean comedy different to those in a history play or tragedy…

Is the language of a soliloquy in a Shakespearean comedy different from that of a soliloquy in a history play or a tragedy? You would think so. But how? Intuitively, you might say that soliloquies in comedies are all about love, in histories, they’re probably about the King, and in tragedies, characters are always saying O. And you’d be right. But could you go any further than that? What other words mark out soliloquies in each genre as distinct from the other two genres?

I set out to find the answers by comparing frequency lists of soliloquies in comedies, histories and tragedies, and identifying statistically significant words. My comparisons produced some predictable and some surprising results.
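For the technically curious, the kind of comparison involved can be sketched in Python. The statistic used here, log-likelihood, is a common choice for keyword analysis in corpus linguistics, but it is an assumption on my part for this illustration; the function names are hypothetical, and the token lists are toy data, not real counts from the plays.

```python
import math
from collections import Counter

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood score for one word's frequency in two corpora."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

def key_words(target_tokens, reference_tokens, threshold=3.84):
    """Words relatively overused in the target corpus, highest score first."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    size_t, size_r = len(target_tokens), len(reference_tokens)
    results = []
    for word, freq in target.items():
        ref_freq = reference[word]
        if freq / size_t > ref_freq / size_r:  # keep overused words only
            score = log_likelihood(freq, size_t, ref_freq, size_r)
            if score >= threshold:             # 3.84 corresponds to p < 0.05
                results.append((word, score))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Toy data standing in for comic soliloquies and for the other two genres.
comedy = ["love"] * 30 + ["the"] * 70
others = ["love"] * 10 + ["the"] * 190

for word, score in key_words(comedy, others):
    print(word, round(score, 2))
```

In this toy example only love is flagged, because it is proportionally far more frequent in the first list than in the second, whereas the is proportionally less frequent and is filtered out.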

As expected, comedies contain almost two-thirds of uses of the word love in soliloquies. Apart from conventional uses such as I do love thee, love is often personified implicitly as the goddess Venus or the god Cupid, as in Love, lend me wings. The relative overuse in comic soliloquies of love, and the pronouns I, she and her, contrasts with a relative underuse of thy and thou. This confirms our idea of the typical soliloquist as an introspective lover, and also reminds us that second-person pronouns are more likely to be used in conversation, at least in comedy.

In history soliloquies, the most significant words are Henry and King – hardly surprising since seven of the ten history plays in the First Folio concern a king called Henry, and there are no characters called Henry in comedies or tragedies! Male names and titles (York, Edward, Richard and Clarence) are very significant in histories, whereas the female pronoun ‘her’ is statistically rare. As the critic Juliet Dusinberre says, in Shakespeare’s history plays, women stand for permanence and fidelity against shifting political sands but are essentially impotent.

Sometimes research reveals a word that appears to be significant, but is used repeatedly by one character, so is not necessarily representative of that genre. In the following soliloquy, Richard III, who is a rather self-absorbed character, uses myself nine times in as many lines of soliloquy:

What do I fear? Myself? There’s none else by;
Richard loves Richard, that is I, and I,
Is there a murderer here? No — Yes, I am.
Then fly — what, from myself? Great reason, why?
Lest I revenge. What, myself upon myself?
Alack, I love myself — Wherefore? For any good
That I myself have done unto myself?
O no, alas, I rather hate myself
For hateful deeds committed by myself:
I am a villain — yet I lie, I am not.
Fool, of thyself speak well — fool do not flatter,
– Richard III

Interestingly, the repetitions of myself contrast with the one instance of thyself in the last line, perhaps suggesting that Richard is suffering from what we now call a multiple personality disorder.

Tragic soliloquy accounts for over half the uses of the expression O, as in O Brutus! – just the kind of passionate style you might expect in tragedy. But perhaps the most curious finding is the frequency of ‘t’, the contracted form of ‘it’. Maybe Shakespeare intended it to represent the speech style of a character speaking alone and affected by circumstances such as bereavement or destitution: Fie on’t! O fie! (Hamlet I.ii); Who is’t can say, “I am the worst?” (King Lear IV.i).

Thou, thy and thee are characteristic of tragic soliloquies (unlike in comedy, remember). They usually refer to absent characters, places, nature or objects (such as a candle): If I quench thee, thou flaming Minister, / I can again thy former light restore (Othello, V.ii). Why are thou, thy and thee so common? It may be because tragedy involves forces beyond a character’s control (even though it is their failings which lead to the tragedy). Perhaps Shakespeare is suggesting that characters are trying to communicate with the wider universe in an attempt to justify their feelings and actions.

To Shakespeare lovers, that which we call a soliloquy will always be special. Knowing a little bit more about the kind of language that makes up different kinds of soliloquies can perhaps help us to appreciate the artistry that lies within.


A methodological journey…

Principal Investigator, Jonathan Culpeper, shares his thoughts on why the methods used on the Encyclopaedia of Shakespeare’s Language project are quite so unique…

Just before Christmas 2015, the AHRC announced that it was going to fund the £1 million Encyclopaedia of Shakespeare’s Language project. I actually had the idea for the project 20 years ago. The fact that it took so long has much to do with method.

The approach I envisaged for Shakespeare’s language is analogous to more recent developments in dictionaries of general English, and, specifically, the departure from the philological tradition that resulted in the Collins Cobuild Dictionary of the English Language, the first full corpus-based dictionary. Being corpus-based implies both a particular methodology for revealing meanings, and a particular theoretical approach to meaning. There is less reliance on the vagaries and biases of editors, and a greater focus on the evidence of actual usage. The question ‘what does X mean?’ is pursued through another question: ‘how is X used?’

But I wanted more from the encyclopaedia than this. I wanted it to be comparative, to reveal not just the usage of words and other linguistic units in Shakespeare but also in the general language of the period. This way, we can tap into issues such as what is distinctive about Shakespeare’s language, and, more particularly, how Shakespeare’s language would have been perceived by his contemporary audience.

For example, the play Henry V contains Welsh, Irish and Scottish characters. A pilot examination I conducted with Alison Findlay (English and Creative Writing) of the words Welsh, Irish and Scottish used in over 100 million words written in Shakespeare’s time revealed that: (1) the Welsh barely registered on the Elizabethan consciousness, being considered a harmless in-group, only noteworthy for their curious language, (2) the Irish were wild, savage, rebels, viewed positively only in relation to Irish rugs (an important colonial import), and (3) the Scottish, whilst also rebels, were respected for their political power. (Current Shakespearean dictionaries do not contain entries for any of these three words.)

The problem 20 years ago was the lack of comparative data. Back in the early 1990s, the leading historical corpus of English was without doubt the Helsinki Corpus of English Texts, completed in 1991. This corpus amounted to 1.5 million words – an impressive figure in those days! Moreover, it had been put together with great care; it was reliable. But those 1.5 million words covered the period 730 to 1710. The section contemporaneous with Shakespeare amounted to less than half a million words, and was thus far short of what is required for serious comparative work.

To solve the problem, I set about, with Merja Kytö, creating the Corpus of English Dialogues. The reason for the focus on dialogues is that this would provide an interesting comparison for the dialogues of Shakespeare’s plays. This project soaked up 10 or more years, not just in creating the corpus but also in publishing the various insights it afforded into early modern dialogues along the way.

I was then overtaken – in a positive way! – by other events, notably, the advent of a fully searchable, 1.2 billion word transcribed version of Early English Books Online (EEBO) (i.e. EEBO-TCP). For years, EEBO, which contains pretty much all early modern printed output, had been of limited value to linguists because the texts were only available as images, and language searches relied on OCR, with all its inaccuracies. Now, however, I have a 321 million word fully searchable corpus of texts written by Shakespeare’s contemporaries.

In addition, solutions, or at least partial solutions, had evolved for the various problems associated with the computational analysis of historical language data. Early modern spelling variation had been a major stumbling block (e.g. the word would could be spelt would, wold, wolde, woolde, wuld, vvold, etc.). This problem has been largely solved by the Variant Detector (VARD), devised by scholars at Lancaster, especially Alistair Baron. The Lancaster-developed CLAWS part-of-speech annotation system, which works well for present-day English, has been adapted for Early Modern English (though more work will be necessary). Similarly, semantic annotation has received attention from generations of researchers at Lancaster University, and has been (and is being) adapted for Early Modern English, most recently within the AHRC-funded SAMUELS project, involving a consortium of universities, including Lancaster.

I don’t doubt that there will be many more twists and turns, lumps and bumps in the future methodological journey. But I am cheered by the fact that I will not be facing them alone but in the company of a wonderful group of people who are part of the project: Andrew Hardie and Tony McEnery (both LAEL), Paul Rayson (Computing and Communications), Alison Findlay (English & Creative Writing) and Dawn Archer (Manchester Metropolitan).

For a brief project description, see: AHRC award to create a new Encyclopaedia of Shakespeare’s Language.
