About this book

This book serves as the basis for the series of courses in Applied Geodata Science, taught at the Institute of Geography, University of Bern. It grew out of the tutorials edited by Benjamin Stocker, Loïc Pellissier, and Joshua Payne for the course Environmental Systems Data Science (D-USYS, ETH Zürich). The present book was written as a collaborative effort led by Benjamin Stocker, with contributions by Pepa Arán and Koen Hufkens, and exercises by Pascal Schneider.

This book is aimed at people interested in applying data science methods in research. Methods, example data sets, and prediction challenges are chosen to make the book as relatable as possible to scientists and students in Geography and Environmental Sciences. No prior knowledge of coding is required; the essentials are briefly introduced in primers. The focus of this book is not on the theoretical basis of the methods, which is covered by other, "classical" statistics courses. Instead, this book introduces essential concepts, methods, and tools for applied data science in Geography and Environmental Sciences, with an emphasis on breadth. It takes a hands-on approach using the R programming language and aims to build an intuitive understanding of concepts with minimal reliance on mathematical language. Worked examples are provided for typical steps of data science applications in Geography and Environmental Sciences. The aim of this book is to teach the diverse set of skills needed as a basis for data-intensive research in academia and beyond.

We also use this book as a reference and on-boarding resource for group members of Geocomputation and Earth Observation (GECO), at the Institute of Geography, University of Bern.


Images and other materials used here were made available under non-restrictive licenses. Original sources are attributed. Content without attribution is our own and is shared under the license below. If you find any errors, or any content that concerns you with regard to licensing or for other reasons, please contact us or report an issue. Any feedback, positive or negative, is welcome.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

How to cite this book

Benjamin Stocker, Koen Hufkens, Pepa Arán, & Pascal Schneider. (2023). Applied Geodata Science (v1.0). Zenodo. DOI

About this course

This book contains the lecture notes and exercises for the following courses, offered for Geography and for Climate Sciences students at the University of Bern, Switzerland:

Applied Geodata Science I

This is a course for Bachelor's students in their second or third year of studies in Geography.

  • Chapters 1 to 11

Applied Geodata Science II

This is a course for Master's students in Geography or Climate Sciences. Chapters are in development.

Course goal

The overall goal of this set of courses is that students and other readers learn to tell a story with (environmental and geo-) data.

Learning Objectives

The overall learning objectives are:

  • Design and communicate your research project as a reproducible workflow.
    • Find, access, process, and visualise large environmental and geographic data.
    • Write legible code and manage collaborative code and data-centered projects.
    • Manage analysis code for long-term reproducibility.
  • Identify, quantify, and interpret patterns in large environmental and geographic data.
    • Devise suitable data visualisations.
    • Determine suitable model formulations and implement effective model training.
    • Describe the challenges of model fitting with large data.
  • Implement and make use of Open Science practices and resources to support data science projects in Geography and Environmental Sciences.

Course contents

This course covers all steps along the data science workflow (see Fig. 0.1) and introduces methods and tools to learn the most from data, to effectively communicate insights, and to make your workflow reproducible. By following this course, you will be well equipped for joining the Open Science movement.

Figure 0.1: The data science workflow and keywords of contents covered in Applied Geodata Science I. Figure adapted from Wickham and Grolemund, R for Data Science (https://r4ds.had.co.nz/index.html).

This chapter starts by providing the context for this course: Why Applied Geodata Science? Why now?

Chapters 1 and 2 serve as primers that bring readers with diverse backgrounds and varying data science experience up to speed with the basics of programming in R, which we rely on in later chapters.

Chapter 3 introduces efficient handling and cleaning of large tabular data with the R tidyverse “programming dialect”. The focus is on non-geospatial data. Closely related to transforming data and its multiple axes of variation is data visualisation, covered in Chapter 4.

Chapters 5 to 7 introduce essential tools for the daily work with diverse data, for collaborative code development, and for an Open Science practice.

In Chapters 8 to 11, we turn to modelling and identifying patterns in the data.

Chapters 1-11 serve as lecture notes for Applied Geodata Science I and as learning material for students and scientists in any data-intensive research domain. These chapters do not deal explicitly with geospatial data and modelling. Modelling with geospatial and temporal data is the subject of the course Applied Geodata Science II and will be introduced with a focus on typical applications and modelling tasks in Geography and Environmental Sciences. The corresponding materials are not yet included in this book but will be added later.

All tutorials use the R programming language, and a full list of the packages used in this course is provided in Appendix B.