WHAT IS CORPUS LINGUISTICS?

Corpus linguistics is a scientific method of language analysis that draws on empirical evidence from corpora (i.e. large electronic datasets of language samples) and on automatic (computer-aided) quantitative analysis to describe systematic variation in language use across different domains and language users. The use of advanced quantitative methods, statistics and data visualization makes corpus approaches especially effective at identifying (complex) lexico-grammatical patterns typical of different genres, linguistic settings and user groups. As a result, corpora have been employed extensively to describe language in professional and educational contexts, leading to major theoretical advances in the understanding of linguistic variation in these domains, with practical applications in, for example, language teaching and testing.
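As a rough illustration of the kind of automatic quantitative analysis described above, the following minimal Python sketch compares normalised word frequencies in two tiny sets of texts. The sample sentences, the function names (tokenize, relative_frequencies) and the simple letters-only tokenizer are all assumptions made for this example; real corpora contain millions of words and are usually processed with dedicated corpus tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters as tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(texts, per=1_000_000):
    """Return each word's frequency normalised to `per` tokens (default: per million)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * per / total for word, count in counts.items()}


# Invented toy samples standing in for two domains of language use.
academic = [
    "The analysis of the data suggests that the results are robust.",
    "This study presents an analysis of lexical variation.",
]
conversation = [
    "Yeah I think the thing is that you know it was fine.",
    "Well you know I just think it was a good thing.",
]

academic_freq = relative_frequencies(academic)
conversation_freq = relative_frequencies(conversation)

# Print the normalised frequency of a few words in each sample.
for word in ("analysis", "the", "you"):
    print(word, academic_freq.get(word, 0.0), conversation_freq.get(word, 0.0))
```

Normalising to a common base (here, per million tokens) is what allows frequencies from corpora of different sizes to be compared; with samples this small the numbers are illustrative only.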

Corpus research has played a major role in describing the structure of the English lexicon, identifying core items in general English as well as vocabulary typical of academic domains. Wordlists based on these studies have been used widely in EAP/ESP research and practice. Corpus methods have also contributed to evidence-based description of the typical forms and functions of academic genres such as narrative and argumentative writing. Finally, corpora have significantly influenced the direction of research describing the lexical, grammatical and formulaic complexity of texts, which aims to establish the linguistic knowledge needed to produce or comprehend these texts. Overall, corpus-based research has provided unique data and methods that have significantly advanced our understanding of academic/specialised English, with a direct, transformative impact on the research, teaching and testing of English used in educational contexts.