Statistics & data visualisation

The summer school in Statistics and Data Visualisation for Corpus Linguistics is aimed at students and researchers who wish to learn more about the use of statistics to explore language corpora. No prior knowledge of statistics is required; all necessary concepts and procedures will be explained in the course.

The summer school offers a practical introduction to the statistical procedures used for the analysis of language corpora. The curriculum provides an overview of the main statistical procedures used in corpus linguistics, together with simple examples of how these methods are applied.

This summer school is organised and taught by Dr. Vaclav Brezina with contributions from other staff from Lancaster University. Vaclav Brezina is a Senior Lecturer at the Department of Linguistics and English Language, Lancaster University, and a member of the ESRC Centre for Corpus Approaches to Social Science. He is the author of Statistics in Corpus Linguistics: A Practical Guide, which will be used as the course book for the summer school.

Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

“Mastery of statistics is empowering. However, as statistical tools and analyses become more complex and sophisticated, they can also become rather daunting for the users. […] Every summer, a large number of students from different parts of the world come to Lancaster to learn about corpora and statistics during a week of Lancaster summer schools in corpus linguistics. These students often ask me what the best statistical test is to use with corpora, what the best collocation measure is etc. I usually respond: in many cases, the most powerful statistical technique is common sense” (pp. 283-4).

The topics include, for example:

  • Statistics: basic terms and concepts
  • Efficient data visualisation
  • Null hypothesis significance testing and effect sizes
  • Sampling methods and representativeness
  • Frequency and dispersion; descriptive and inferential statistics
  • Register variation and multi-dimensional analysis

To find out more about the school, you can read a blog post by Stefania Maci, who participated in the summer school last year. In her blog, Stefania describes how research that she started with a group of fellow summer school participants in Lancaster in the summer of 2017 took them to a presentation at an international corpus conference.