Corpus courses for credit

The Department of Linguistics and English Language offers individual postgraduate courses  in corpus linguistics which can be taken for credit as part of our online offer of PG courses and programmes. You can read more about the courses  on this website and apply for a place on these courses on this website. If you have any questions about the short courses and how to apply for them, please get in touch with us at or you can email the Programme Director Dr Dana Gablasova ( with your questions or to book a slot for an online chat.

This year and next academic year, we offer the following courses:

Fundamentals of corpus linguistics (runs October to December 2024, 2025)

This module provides an overview of corpus linguistic methods and their application in a range of areas, including sociolinguistics, discourse analysis and applied linguistics. It will enable students to acquire theoretical knowledge of the underlying principles of the field of corpus linguistics as well as practical skills. The module introduces key corpus linguistic techniques such as concordance analysis, the analysis of wordlists and ngram lists, keyword analysis and collocation analysis. It also provides an overview of practical applications of corpus methods in a wide range of areas of linguistic and social research. The topics taught in this module include, for example:

The story of corpus linguistics: Introduction to the field – Linguistic description and corpus annotation – Concordances and frequency information  – Collocations and n-grams analysis  – Types of corpora, available corpora, and corpus building – Corpus linguistics and society: discourse analysis, sociolinguistics – Corpus linguistics and pragmatics  – Corpora in the classroom – Presenting corpus research in research reports 

Corpus-based vocabulary and grammar of English (runs October to December 2024, 2025)

This module offers a detailed corpus-based description of the grammar and lexicon of the English language and the methodology to arrive at this description. Students will learn about words – for example, their meanings and the relationships between words as observed in a corpus (collocation and colligation) – as well as word classes, phrases and clauses. The module discusses the concept of lexicogrammar, a notion that allow us to see language holistically with the attention to patterns, which do not fit in older grammar book and dictionary descriptions. Special attention is paid to spoken language and the ‘grammar of conversation’. The module has a dual focus: 1) It provides linguistic knowledge and relevant terminology about the English language, guiding students through grammatical, lexical and lexico-grammatical patterns observed in large general corpora. 2) It teaches transferable skills of language description of a variety of lexico-grammatical phenomena based on corpus evidence. The topics taught in this module include, for example:

Words and word classes – Phrases and clauses – Grammar of the noun phrase – Grammar of the verb phrase – The grammar of conversation – Words and their meanings – Producing (pedagogical) wordlists – Special topics in corpus-based analyses of lexicogrammar


Corpus design and data collection (runs January-March 2024, 2025)

This module provides essential information about corpus design and data collection, one of the key areas in corpus linguistics. It will equip students with necessary skills for carrying out their research projects that are not dependent on existing corpora; instead, students will be able to collect data from a variety of sources and compile them into a properly sampled dataset. Building on a long tradition of corpus development at Lancaster University and providing specific examples from recent projects such as the British National Corpus 2014, Guangwai Lancaster Corpus of L2 Chinese or Trinity Lancaster Corpus, the module offers both theoretical knowledge and practical skills for the students to be able to build their own corpus. The topics taught in this module include, for example:

Corpus as a sample: Types of sampling, sampling frame – Corpus design: Necessary steps before data collection – Corpus development: Recording data and meta-data, data cleaning, xml conversion – Corpus annotation: Types of annotation, POS tagging and lemmatization, semantic and error tagging – Written corpus design – Spoken corpus design – Learner corpus design – Multimodal corpus design – Corpus distribution and copyright

Corpus based discourse analysis (runs January-March 2024, 2025)

This optional module offers an in-depth exploration of Corpus based discourse analysis, a prominent area of the application of the corpus method with a very long Lancaster tradition. A range of practical examples of corpus-based discourse studies across a variety of discourse domains (e.g. media discourse, healthcare-related discourse, etc.) will guide students to develop their skills in this area. The module includes an overview of different fields in which corpus-based discourse analysis can be employed, detailed discussion of linguistic and societal implications of these topics, as well as relevant social and linguistic theories. The topics taught in this module include, for example:

Key concepts and the role of corpora in discourse analysis – News discourse – Language use online – Social media discourse – Healthcare discourse – Forensic linguistics – Discourse and gender – Literary analysis – Writing up discourse analysis


Using corpora in language teaching (runs January-March 2024, 2025)

This module covers the topics and skillset related to a major application of corpora and corpus methods – that of language teaching. The module will thus be relevant for teaching of English as a foreign/second language, teaching English for academic purposes, teaching English at secondary schools (for example for English Language A-level) as well as for teaching of other languages of interest to the students. The module will also be of relevance to the writers of teaching materials. This module provides students with a solid grounding in the principles, practices and skills required for a successful integration of corpus resources into different aspects and areas of language teaching. The course combines the best practice in language teaching with the skills needed to use the state-of-the art corpus resources in the classroom to increase the effectiveness of language teaching outcomes. The topics taught in this module include, for example:

Key issues in corpus-based approaches to language teaching – Direct use of corpora in the classroom – Data-driven learning – Developing corpus-based teaching materials – Analysing learner language using corpora – Corpora in teaching vocabulary and grammar – Corpora in teaching speaking and writing skills – Corpora in teaching English for Academic Purposes – Corpora and language assessment

Statistics and data visualization (runs April-June 2024, 2025)

This core module provides an overview of the main statistical procedures used for the analysis of linguistic data and language corpora, together with examples of application of these methods. Since corpus linguistics is an essentially quantitative approach, the module will enable students to acquire theoretical knowledge of the mathematical modelling of linguistic data and of appropriate statistical tests, as well as practical skills to carry out a range of statistical analyses of linguistic (corpus) data. The module is tailor-made for linguistics students and structured according to linguistic topics and the relevant statistical methods for their analysis. The topics taught in this module include, for example:

Measures of frequency, dispersion and diversity – Meta-analysis and effect sizes – Statistics behind collocations, keywords and reliability of manual coding – Contingency tables, the chi-squared test and regression models – Correlation, cluster analysis and factor analysis – T-test, ANOVA and their non-parametric counterparts – Bootstrapping and non-parametric regression – Data visualization – Presenting statistical information in research reports