2 Course plan

For spring semester 2025.

This first block introduces tools and methods necessary for implementing a reproducible data science workflow in the R language, working through chapters 1 to 4 of the accompanying book Applied Geodata Science.

These chapters bring readers with diverse backgrounds and varying data science experience up to speed with the basics of programming in R, which later chapters rely on.

Data wrangling introduces efficient handling and cleaning of large tabular data with the R tidyverse “programming dialect”. The focus is on non-geospatial data. Data visualisation is closely related to transforming data along its multiple axes of variation.

Session 1 - 17.02. - Getting started

Introduction to Applied Geodata Science (Chap. 1)
- The data science workflow and data science basics
- The concept of modelling
- Introduction to R language and setup of individual laptops (Chap. 2)

Session 2 - 24.02. - Programming primers

Getting started
- RMarkdown
- Workspace management and reproducible workflows
Introductory programming exercises (Chap. 3)

Session 3 - 03.03. - Data wrangling

Handling tabular data in R (Chap. 4)
[Report Exercise: Tidy data]

Session 4 - 10.03. - Data visualization

Data visualization in R (Chap. 5)
[Report Exercise: Data visualization]

Session 5 - 17.03. - Data variety

Data variety (Chap. 6)

Session 6 - 24.03. - Open science

Open and reproducible science (Chap. 7)

Session 7 - 31.03. - Code management

Code/project management with git (Chap. 8)
[Report Exercise: Collaborating with git]

Session 8 - 07.04. - CARE SESSION I

Work on R. Ex. 7 as a team exercise
Self-study of tutorial and exercises

Session 9 - 14.04. - Regression (Report: stepwise regr.)

Regression and classification
[Report Exercise: Stepwise regression]

Session 10 - 28.04. - Supervised ML I (Report: KNN)

Supervised machine learning I
[Report Exercise: KNN]

Session 11 - 05.05. - Supervised ML II (Report: flux modeling)

Applications of machine learning in Geography and Earth system sciences (lecture)
Supervised machine learning II
[Report Exercise: Flux modeling]

Session 12 - 12.05. - Random Forest

Random Forest

Session 13 - 19.05. - Interpretable machine learning

Interpretable machine learning

Session 14 - 26.05. - CARE SESSION II

Catch-up and support on Report Exercises