2 Course plan
For spring semester 2025.
This first block introduces tools and methods necessary for implementing a reproducible data science workflow in the R language, working through chapters 1 to 4 of the accompanying book Applied Geodata Science.
These chapters bring readers with diverse backgrounds and varying data science experience up to speed with the basics of programming in R, which we rely on in later chapters.
Data wrangling introduces efficient handling and cleaning of large tabular data with the R tidyverse “programming dialect”. The focus is on non-geospatial data. Closely related to transforming data and its multiple axes of variation is data visualisation.
Session 1 - 17.02. - Getting started
- Introduction to Applied Geodata Science (Chap. 1)
- The data science workflow and data science basics
- The concept of modelling
- Introduction to R language and setup of individual laptops (Chap. 2)
Session 2 - 24.02. - Programming primers
- Getting started
- RMarkdown
- Workspace management and reproducible workflows
- Introductory programming exercises (Chap. 3)
Session 3 - 03.03. - Data wrangling
- Handling tabular data in R (Chap. 4)
- [Report Exercise: Tidy data]
Session 4 - 10.03. - Data visualization
- Data visualization in R (Chap. 5)
- [Report Exercise: Data visualization]
Session 5 - 17.03. - Data variety
Session 6 - 24.03. - Open science
Session 7 - 31.03. - Code management
- Code/project management with git (Chap. 8)
- [Report Exercise: Collaborating with git]
Session 8 - 07.04. - CARE SESSION I
- Work on R. Ex. 7 as team exercise
- Self-study of tutorial and exercises
Session 9 - 14.04. - Regression (Report: stepwise regr.)
- Regression and classification
- [Report Exercise: Stepwise regression]
Session 10 - 28.04. - Supervised ML I (Report: KNN)
- Supervised machine learning I
- [Report Exercise: KNN]
Session 11 - 05.05. - Supervised ML II (Report: flux modeling)
- Applications of machine learning in Geography and Earth system sciences (lecture)
- Supervised machine learning II
- [Report Exercise: Flux modeling]
Session 12 - 12.05. - Random Forest
Session 13 - 19.05. - Interpretable machine learning
Session 14 - 26.05. - CARE SESSION II
- Catch-up and support on Report Exercises