Isolde van Dorst, recent graduate from the University of Groningen and the University of Malta, discusses her study on pronominal address terms in Shakespeare’s texts in collaboration with the Encyclopaedia of Shakespeare’s Language project.
As part of my master’s degree in Language and Communication Technologies, I wrote my thesis on the use of pronominal address terms in Shakespeare. Since my program focuses mainly on computational linguistics rather than literature, I decided to reach out to the team at Lancaster to see whether a collaboration was possible. It was, as you may have guessed, and I was able to spend four months in Lancaster doing my research under the supervision of Jonathan Culpeper and Andrew Hardie. My partnership with the Shakespeare project has been highly successful. Not only was I able to complete my master’s degree and, for the first time, use these particular computational methods to investigate Shakespeare’s pronominal address terms in an objective and extensive way, but I am now also supporting the project directly. Just two months after finishing my thesis, I began working on Shakespeare’s low-frequency items with other members of the project team.
In recent decades, there has been a lot of research on Shakespeare’s use of the second-person singular pronouns you, thou and thee. However, the results so far have been inconclusive as to which features influence the choice of pronoun. Does the speaker’s age have an impact? Or their social status? Or the words surrounding the pronoun, its n-gram? As part of my research, I developed a prediction model to find out which linguistic and extra-linguistic features influence the pronoun choice made by Shakespeare. The 23 features used in this study cover speaker and addressee information (e.g. age and status), play and scene data (e.g. play name and genre), and contextual information (e.g. the words used in close proximity to the pronoun).
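To make the idea of contextual features concrete, here is a minimal sketch of how the words around a pronoun could be turned into features and combined with speaker and play metadata. It is an illustration only, not the feature extraction used in the study: the function names, the feature names (`left_1`, `right_1`, `speaker`, `genre`), the window size and the padding symbols are all assumptions.

```python
# Illustrative sketch only: names, window size and padding symbols are assumptions,
# not the feature set used in the study.
PRONOUNS = {"you", "thou", "thee"}


def context_features(tokens, index, window=2):
    """Collect the words up to `window` positions left and right of tokens[index]."""
    features = {}
    for offset in range(1, window + 1):
        left = index - offset
        right = index + offset
        # Pad with boundary symbols when the window runs past the utterance.
        features[f"left_{offset}"] = tokens[left].lower() if left >= 0 else "<s>"
        features[f"right_{offset}"] = (
            tokens[right].lower() if right < len(tokens) else "</s>"
        )
    return features


def extract_instances(tokens, metadata, window=2):
    """Return one (features, label) pair per pronoun found in the token list."""
    instances = []
    for i, token in enumerate(tokens):
        label = token.lower()
        if label in PRONOUNS:
            features = context_features(tokens, i, window)
            features.update(metadata)  # extra-linguistic features, e.g. speaker, genre
            instances.append((features, label))
    return instances


# Part of Juliet's line "wherefore art thou Romeo?" from Romeo and Juliet.
tokens = ["wherefore", "art", "thou", "Romeo"]
metadata = {"speaker": "Juliet", "genre": "tragedy"}  # hypothetical feature names
instances = extract_instances(tokens, metadata)

for features, label in instances:
    print(label, features)
```

Each instance pairs the pronoun (the label to be predicted) with its neighbouring words and the metadata, which is the kind of input a classifier would be trained on.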
The three algorithms used in this study (Naive Bayes, decision tree and support vector machine) were selected because they differ in their assumptions and learning biases. Additionally, both a binary and a trinary prediction were performed. For the trinary classification, the three pronouns thou, thee and you were kept separate. In the binary classification, thou and thee were merged into one category, THOU. The binary setup is common in YOU/THOU research, while the difference in case between thou and thee supports a trinary approach. Computational linguists may be interested to know that, of the three algorithms, the support vector machine models scored best on the four measures assessed in this study: precision, recall, F-measure and accuracy. With 87.3% accuracy, the binary support vector machine model scored 24% better than the baseline.
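The relationship between the trinary and binary setups, and the four evaluation measures, can be sketched in a few lines. This is a hedged illustration under stated assumptions: the labels below are made-up toy data, not results from the study, the helper names are hypothetical, and a majority-class baseline is assumed here because the post does not say which baseline was used.

```python
from collections import Counter


def collapse_to_binary(label):
    """Map the trinary labels onto the binary YOU/THOU distinction."""
    return "THOU" if label in {"thou", "thee"} else "YOU"


def accuracy(gold, predicted):
    """Share of instances where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f1(gold, predicted, target):
    """Precision, recall and F-measure (F1) for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def majority_baseline(gold):
    """Accuracy obtained by always predicting the most frequent label (assumed baseline)."""
    label, count = Counter(gold).most_common(1)[0]
    return label, count / len(gold)


# Toy data invented for this example; not taken from the study.
gold_trinary = ["you", "thou", "thee", "you", "you", "thou", "you", "you"]
pred_trinary = ["you", "thou", "you", "you", "thou", "thou", "you", "you"]

gold_binary = [collapse_to_binary(label) for label in gold_trinary]
pred_binary = [collapse_to_binary(label) for label in pred_trinary]

binary_accuracy = accuracy(gold_binary, pred_binary)
baseline_label, baseline_accuracy = majority_baseline(gold_binary)
thou_scores = precision_recall_f1(gold_binary, pred_binary, "THOU")

print("binary accuracy:", binary_accuracy)
print("baseline:", baseline_label, baseline_accuracy)
print("THOU precision, recall, F1:", thou_scores)
```

Comparing a model's accuracy with the baseline accuracy shows how much the features actually contribute beyond always guessing the most frequent pronoun.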
For literary researchers and linguists, I found that one group of features shows up as the main predictor of the pronoun, namely the words of the n-gram. In particular, the words directly to the right and directly to the left of the pronoun are important, which shows that the immediate linguistic context matters most when predicting the pronoun. Several other features also have a positive influence on the pronoun prediction, among them the names of the speaker and addressee, the status differential, and positive and negative sentiment. Overall, it is clear to me that there is considerably more scope for carrying out research in this area, and I am immensely grateful to the project team for allowing me to use the dataset and work alongside them.