Setup and data provenance

Setup

Install all critical packages required in gathering and processing the data by installing the FluxDataKit package.

devtools::install_github(
  "geco-bern/FluxDataKit"
)

In addition clone the project as we’ll need some additional scripts which are only available in the source code of the project (see data provenance below).

  git clone https://github.com/geco-bern/FluxDataKit.git

With all tools installed we need to download the required input data for aggregation.

Data provenance

A list of numbered scripts is provided in the data-raw directory of the cloned project, and which govern the downloading and compiling of intermediate data. The numbered scripts should be run in sequence. All code is written in R and should be executed in the project environment of the FluxDataKit package (i.e. open the FluxDataKit.proj file in RStudio or make sure you have set the topmost directory to the project folder).

Due to login-walls, or missing API infrastructure, most data downloads are still manual instead of automated. The below section therefore links to the locations where you can manually get the data. No fixes will be provided to facilitate this process further.

By default the data will be downloaded to a directory with the data-raw/ base path. Before running any scripts, check if data is written to this directory. When you want to alter the storage location you may use soft links.

Site meta data

Meta-data is compiled on a site by site basis using the 01_collect_meta-data.R script. This compiles all easily available meta-data through either API calls or scraping the data downloaded in the previous step (setup / data collection). In the current script, paths are set for data stored in data-raw. However, you are free to move the data anywhere you like as long as you adjust the paths in the meta-data script.

source("data-raw/01_collect_meta-data.R")

The flux data source (PLUMBER-2, Ameriflux, ICOS WarmWinter2020, or ICOS Drought2018) is determined for each site based on which source provides the longest data time series. Site meta information is sourced from multiple sources to maximise available information. This is done in scripts data-raw/01_collect_meta-data.R and data-raw/02_compile_final_site_list.R.

source("data-raw/02_compile_final_site_list.R")

Additional site meta info sources

Additional data sources are used for compiling site meta information in script data-raw/02_compile_final_site_list.R.

Falge et al.: https://doi.org/10.3334/ORNLDAAC/1530
ICOS site list, downloaded from http://www.europe-fluxdata.eu/home/sites-list. Contained in this repo (data-raw/meta_data/sites_list_icos.csv)
Koeppen-Geiger climate zone classification. File (22 MB) contained in this repo (data-raw/ancillary_data/koeppen_geiger/).
Root zone water storage capacity based on Stocker et al., 2023. File not contained in this repo, but available on Zenodo.
ETOPO1 digital elevation model (doi:10.7289/V5C8276M). File not contained in this repo. Obtainable from https://www.ngdc.noaa.gov/mgg/global/.

Flux data

All ecosystem flux sources are downloaded at half-hourly (HH) rates. Data sources and final paths are listed below. Top-level paths for flux data are considered to be sufficient for further processing.

Estimated data sizes are provided to indicate network load, although for most these downloads should be easily manageable from a download and storage perspective.

We sourced data from openly available ecosystem flux data products:

Plumber2

Reference: https://dx.doi.org/10.25914/5fdb0902607e1.

“PLUMBER2 is a model inter-comparison project for land surface models. Multiple leading land surface and ecosystem models are evaluated for water and carbon fluxes at 170 flux tower sites, spanning multiple biomes and climate zones globally.” the full description of the dataset can be found in the publication by Ukkola et al. 2021 and combines FLUXNET2015, La Thuile and OzFlux collections.

The downloading and conversion is facilitated using a script 00_download_plumber_data.R included in the data-raw directory.

Ameriflux OneFlux

This is the latest Ameriflux release, downloaded data on 14 Oct 2023 from https://ameriflux.lbl.gov/. The data should be downloaded manually from the website data portal. A login is required.

ICOS Drought2018

Reference: https://doi.org/10.18160/YVR0-4898.

ICOS WarmWinter2020

Reference: https://doi.org/10.18160/2G60-ZHAK.

Remote sensing data

MODIS LAI/FPAR data (MCD15A2H Collection 6.1, doi:10.5067/MODIS/MCD15A2H.061) is downloaded by an included script, relying on the MODISTools package. Additional data processing steps to match the temporal resolution and coverage of the flux data include:

Bad quality data removal (R/fdk_match_modis.R)
Aggregation across nine pixels (encompassing a circle with radius of ~750 m around tower location) (R/fdk_match_modis.R)
Outlier removal (fdk_smooth_ts.R and R/fdk_detect_outliers.R)
Fill missing values by mean seasonal cycle (fdk_smooth_ts.R)
Smooth and interpolate to time step of flux data using LOESS (fdk_smooth_ts.R)

WARNING: The MODIS product MCD15A2H v061 is available only from 2002-07-04 to 2023-02-17. In FluxDataKit, MODIS FPAR/LAI data is extended by a mean seasonal cycle to match the flux data coverage. To retain only original MODIS data from the period covered by the data product, remove data based on the dates on your own.

MODISTools

MODIS data is gathered for FluxDataKit using the fdk_download_modis() function. This function takes a list of sites and meta-data to download the required data using the MODISTools package.

With the site list generated or manually populated you can download the data stored in the data-raw/modis/ directory as:

fdk_download_modis(
  df = fdk_site_info,
  path = "data-raw/modis/"
)

Consider that this will take a while and is preferably wrapped in a script to be executed in the background. For one off sites the above method is valid. For batch processing we refer tot the 04_download_modis_data.R script in the data-raw directory.

FluxnetEO

To supplement the land surface model driver data, and the derived rsofun input, we used the FluxnetEO product (Walther et al. 2022) and similarly named package.

Data is provided as zipped archives and total considerable amounts of data, which requires ample storage space. The MODIS dataset has a compressed size of ~32 GB, which results in an uncompressed size of ~68 GB. The LandSat dataset exceeds 150 GB compressed (this data will currently not be considered).

References

Walther, S., Besnard, S., Nelson, J. A., El-Madany, T. S., Migliavacca, M., Weber, U., Carvalhais, N., Ermida, S. L., Brümmer, C., Schrader, F., Prokushkin, A. S., Panov, A. V., and Jung, M.: Technical note: A view from space on global flux towers by MODIS and Landsat: the FluxnetEO data set, Biogeosciences, 19, 2805–2840, https://doi.org/10.5194/bg-19-2805-2022, 2022.

Koen Hufkens and Benjamin Stocker