Environmental Data Science

What is Data Science?

Data science is the science of extracting meaning from potentially complex data. This is a fast-moving field, drawing principles and techniques from a number of different disciplinary areas including computer science, statistics and complexity science. Data science is having a profound impact on a number of areas including commerce, health and smart cities.

We argue that data science can have an equal if not greater impact in the area of environmental sciences. It can offer a rich tapestry of new techniques to support both a deeper understanding of the natural environment in all its complexities and the development of well-founded mitigation and adaptation strategies in the face of climate change.

In our view, data science should be woven into the very fabric of environmental sciences going forward, as we seek a new kind of science and, through it, intellectual breakthroughs that can transform society.

A Data Science of the Natural Environment

The challenges of environmental data science are quite distinct, making this an exciting field of study:

  • Looking at the 4 V’s of data science (volume, velocity, variety and veracity), in some areas volume and velocity dominate (cf. big data), whereas in the environmental sciences variety and veracity are equally important: the data are highly heterogeneous and complex, and are derived from a wide range of sources with different levels of provenance;
  • Environmental science is driven by the desire to understand the fundamental processes involved in complex Earth systems, and hence it is important for data-driven understanding (from data models) to co-exist and work alongside knowledge from process modelling;
  • Earth systems are fundamentally complex systems and hence there is a need to address a range of characteristics associated with complex systems including feedback loops, the importance of extreme events, strong sensitivity to small changes in input parameters, and emergent/chaotic behaviour more generally;
  • The temporal and spatial characteristics of environmental data are extremely important and there is a need to support sophisticated reasoning across scales.
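
As a small illustration of the "strong sensitivity to small changes" mentioned in the list above, the sketch below iterates the logistic map, a standard toy example of chaotic behaviour. It is not an Earth system model; the function name, the parameter r = 4.0, the starting value 0.2 and the perturbation of 1e-9 are all assumptions chosen purely for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return every visited state.

    The logistic map is a toy system (an assumption made for illustration),
    not a model of any real environmental process.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs whose starting points differ by one part in a billion.
baseline = logistic_trajectory(0.2)
perturbed = logistic_trajectory(0.2 + 1e-9)

# Absolute gap between the two runs at each step.
gaps = [abs(p - q) for p, q in zip(baseline, perturbed)]

print(f"gap at step 0:  {gaps[0]:.3e}")
print(f"largest gap:    {max(gaps):.3e}")
```

The initial gap is tiny, yet within a few dozen iterations the two runs become effectively unrelated. This is one reason why point predictions of complex systems need to be accompanied by some treatment of uncertainty.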