01_setup.Rmd
Install all critical packages required in gathering and processing
the data by installing the FluxDataKit
package.
devtools::install_github(
"geco-bern/FluxDataKit"
)
In addition clone the project as we’ll need some additional scripts which are only available in the source code of the project (see data provenance below).
With all tools installed we need to download the required input data for aggregation.
A list of numbered scripts is provided in the data-raw
directory of the cloned project, and which govern the downloading and
compiling of intermediate data. The numbered scripts should be run in
sequence. All code is written in R
and should be executed
in the project environment of the FluxDataKit
package
(i.e. open the FluxDataKit.proj file in RStudio or make sure you have
set the topmost directory to the project folder).
Due to login-walls, or missing API infrastructure, most data downloads are still manual instead of automated. The below section therefore links to the locations where you can manually get the data. No fixes will be provided to facilitate this process further.
By default the data will be downloaded to a directory with the
data-raw/
base path. Before running any scripts, check if
data is written to this directory. When you want to alter the storage
location you may use soft links.
Meta-data is compiled on a site by site basis using the
01_collect_meta-data.R
script. This compiles all easily
available meta-data through either API calls or scraping the data
downloaded in the previous step (setup / data collection). In the
current script, paths are set for data stored in data-raw
.
However, you are free to move the data anywhere you like as long as you
adjust the paths in the meta-data script.
source("data-raw/01_collect_meta-data.R")
The flux data source (PLUMBER-2, Ameriflux, ICOS WarmWinter2020, or
ICOS Drought2018) is determined for each site based on which source
provides the longest data time series. Site meta information is sourced
from multiple sources to maximise available information. This is done in
scripts data-raw/01_collect_meta-data.R
and
data-raw/02_compile_final_site_list.R
.
source("data-raw/02_compile_final_site_list.R")
Additional data sources are used for compiling site meta information
in script data-raw/02_compile_final_site_list.R
.
data-raw/meta_data/sites_list_icos.csv
)data-raw/ancillary_data/koeppen_geiger/
).All ecosystem flux sources are downloaded at half-hourly (HH) rates. Data sources and final paths are listed below. Top-level paths for flux data are considered to be sufficient for further processing.
Estimated data sizes are provided to indicate network load, although for most these downloads should be easily manageable from a download and storage perspective.
We sourced data from openly available ecosystem flux data products:
Reference: https://dx.doi.org/10.25914/5fdb0902607e1.
“PLUMBER2 is a model inter-comparison project for land surface models. Multiple leading land surface and ecosystem models are evaluated for water and carbon fluxes at 170 flux tower sites, spanning multiple biomes and climate zones globally.” the full description of the dataset can be found in the publication by Ukkola et al. 2021 and combines FLUXNET2015, La Thuile and OzFlux collections.
The downloading and conversion is facilitated using a script
00_download_plumber_data.R
included in the
data-raw
directory.
This is the latest Ameriflux release, downloaded data on 14 Oct 2023 from https://ameriflux.lbl.gov/. The data should be downloaded manually from the website data portal. A login is required.
Reference: https://doi.org/10.18160/YVR0-4898.
Reference: https://doi.org/10.18160/2G60-ZHAK.
MODIS LAI/FPAR data (MCD15A2H
Collection 6.1, doi:10.5067/MODIS/MCD15A2H.061) is downloaded by an
included script, relying on the MODISTools
package.
Additional data processing steps to match the temporal resolution and
coverage of the flux data include:
R/fdk_match_modis.R
)R/fdk_match_modis.R
)fdk_smooth_ts.R
and
R/fdk_detect_outliers.R
)fdk_smooth_ts.R
)fdk_smooth_ts.R
)WARNING: The MODIS product MCD15A2H v061 is available only from 2002-07-04 to 2023-02-17. In FluxDataKit, MODIS FPAR/LAI data is extended by a mean seasonal cycle to match the flux data coverage. To retain only original MODIS data from the period covered by the data product, remove data based on the dates on your own.
MODIS data is gathered for FluxDataKit using the
fdk_download_modis()
function. This function takes a list
of sites and meta-data to download the required data using the
MODISTools
package.
With the site list generated or manually populated you can download
the data stored in the data-raw/modis/
directory as:
fdk_download_modis(
df = fdk_site_info,
path = "data-raw/modis/"
)
Consider that this will take a while and is preferably wrapped in a
script to be executed in the background. For one off sites the above
method is valid. For batch processing we refer tot the
04_download_modis_data.R
script in the
data-raw
directory.
To supplement the land surface model driver data, and the derived
rsofun input, we used the FluxnetEO
product (Walther et
al. 2022) and similarly named package.
Data is provided as zipped archives and total considerable amounts of data, which requires ample storage space. The MODIS dataset has a compressed size of ~32 GB, which results in an uncompressed size of ~68 GB. The LandSat dataset exceeds 150 GB compressed (this data will currently not be considered).
Walther, S., Besnard, S., Nelson, J. A., El-Madany, T. S., Migliavacca, M., Weber, U., Carvalhais, N., Ermida, S. L., Brümmer, C., Schrader, F., Prokushkin, A. S., Panov, A. V., and Jung, M.: Technical note: A view from space on global flux towers by MODIS and Landsat: the FluxnetEO data set, Biogeosciences, 19, 2805–2840, https://doi.org/10.5194/bg-19-2805-2022, 2022.