We are lucky to have access to some major digital collections and the software tools to analyse them. Most of these can be made available to students.

Textual sources: 

We have access to newspapers and similar sources covering almost the entire period from the beginning of printing in 1473 until the 20th century. These are usually available through CQPweb (see below) to allow corpus-based analysis, rather than simply through standard web interfaces.

There are also more specialised corpora, such as the Newsbooks at Lancaster corpus of 17th-century material. Taken together, these collections consist of over 50 billion words of text.

In addition, we have the text from the Histpop collection of the printed reports that accompanied the British and Irish censuses from 1800 to 1937. As well as being available in CQPweb for corpus analysis, this text has been geoparsed to allow geospatial analysis.

The Corpus of Lake District Writing is also available in formats that make it suitable for both corpus and geospatial analysis.

Quantitative sources:

We have access to the Great Britain Historical GIS, a major historical GIS that includes most of the recurrent census, vital registration, and other data published in the 19th and 20th centuries, linked to the boundaries used to publish them.

We have similar material for Ireland. This was used in our Troubled Geographies: A spatial history of religion and society in Ireland project.

A database containing the anonymised Integrated Census Microdata (ICeM), which covers all of the individual-level census data from 1851 to 1911, is also available.

We hold migration data drawn from a large sample of genealogical records of migrations within Britain by people born between 1750 and 1930. This data set was used to produce Colin Pooley and Jean Turnbull’s (1998) Migration and Mobility in Britain since the Eighteenth Century (UCL Press). The data are suitable for analysis within GIS.

We also hold Agricultural Returns data (The National Archives series MAF68), which provide detailed information on crops and livestock on a parish-by-parish basis. We have transcriptions of these for a range of counties for 1870, 1881, 1891, 1901 and 1911. These can be geo-referenced for mapping and spatial analysis.


As well as a wide range of commercial and free open-source packages, we have a range of software packages developed or enhanced at Lancaster that are highly relevant to digital humanities.

  • AntConc is a free corpus analysis toolkit for concordancing and text analysis.
  • CQPweb provides a corpus analysis system that enables us to work with very large corpora, in excess of a billion words.
  • #LancsBox provides another corpus analysis program that allows the analysis of large corpora.
  • We have also developed a version of the Edinburgh Geoparser suitable for concordance geoparsing. This enables us to get textual sources into a form suitable for geospatial analysis.
  • Recogito is an online platform for the semantic annotation of place references in images, texts and tables. Exported data can be mapped or connected to geospatially related content elsewhere on the Web.
  • Peripleo is a search service for geospatially annotated online content in the Humanities.
  • WMatrix is a tool for corpus analysis and comparison that provides a web interface to the English USAS and CLAWS corpus annotation tools.
  • We also have a range of tools for conducting Geographical Text Analysis. Please contact Ian Gregory for more details.