EMI CORPUS

The EMI (English-Medium Instruction) corpus consists of two components: i) the EMI Corpus of Student Academic Writing and ii) the EMI Corpus of Student Academic Readings. The data for the corpus were collected from students at eight universities: Lancaster University (UK), Vienna University of Business and Economics (Austria), University of Milan (Italy), University of Turin (Italy), Xi’an Jiaotong University (China), Xi’an Jiaotong-Liverpool University (China), Prince of Songkla University (Thailand) and Thammasat University (Thailand).

EMI Corpus of Student Academic Writing

The corpus: The corpus contains over 4.5 million words from 2,000 pieces of student academic writing. It represents writing in three major disciplinary areas: i) Humanities and Social Science (over 700 texts), ii) Science and Technology (over 500 texts) and iii) Business and Management (over 600 texts). These areas include writing from disciplines such as:

Humanities and Social Science: Education, English literature, Creative writing, Law, Philosophy, History, Sociology and others.

Science and Technology: Engineering, Physics, Chemistry, Biology, Medicine, Statistics, Computer science and others.

Business and Management: Management Science, Finance, Accounting, Marketing, Business analytics and others.

Level of writing: Most of the writing in the corpus was produced at postgraduate level (e.g. MA, MSc). For Business and Management, the corpus contains a balanced mix of undergraduate and postgraduate writing.

Additional data: In addition to the writing samples, the corpus contains the following information (metadata) about the writers and their texts:

Writers – social and linguistic characteristics: age, gender, L1 background, English proficiency (if applicable), years spent in an English-speaking country, years spent learning English, programme of study and level of study (e.g. undergraduate, postgraduate, PhD).

Writers – academic habits: how much time they spend reading and writing in English, what types of reading and writing they typically do on a weekly basis, what academic support (if any) they have accessed, and their academic reading and writing in their L1.

Writing samples: the course and programme the text was written for, text length, genre, the assignment instructions, the mark awarded, and the writer's perceived difficulty of the assignment.

More information about the construction of the corpus and the decisions involved in building it can be found in the following article:

Gablasova, D., Harding, L., Bottini, R., Brezina, V., Ren, H. S., Iamartino, G., Li, Y., Liu, T., Poggesi, L., Savski, K., Toomaneejinda, A. & Zottola, A. (2024). Building a corpus of student academic writing in EMI contexts: Challenges in corpus design and data collection across international higher education settings. Research Methods in Applied Linguistics, 3(3), 100140.

EMI Corpus of Student Academic Readings

The corpus currently contains over 10 million words of student readings from different disciplinary areas, collected at the eight universities. The data are currently being processed into a searchable format.