03_data_generation.Rmd
Note that this routine will only run with the appropriate files in the correct place. Given the file sizes involved, no demo data can be integrated into the package.
As mentioned in the introduction (once all required data are downloaded), the FluxDataKit package ensures the proper compilation of rsofun driver data. Although we will distribute a finished dataset, the instructions below allow you to recreate these data for a particular site (or sites).
To generate consistent data from FLUXNET formatted sources we need a site list with some additional meta-data. The site list is generated with the script described in the ‘data coverage’ vignette; refer to that vignette to compile the list of sites that can be processed.
Once a site list has been compiled you can use it (and all the other input data) to generate either land surface model or rsofun compatible datasets. Here, the former is used as a precursor to the latter.
By default, land surface model compatible data are generated using the FluxnetLSM package. To retain only these data, set the format parameter to “lsm”. The routine will then save only the netcdf intermediates that are otherwise used for formatting p-model compatible data, and will not include any other ancillary data.
Meta-data requirements: Note that the sites file can be generated by other means than the included script. It only has to contain the following values: a site name (sitename), latitude and longitude (lat, lon), elevation (elv), the start and end date of the dataset (date_start / date_end), the original product and data path (product and data_path respectively, which are combined into the formal data directory), a start and end year (year_start, year_end), the Koeppen Geiger code for a site (koeppen_code_beck), the water holding capacity (whc), and the IGBP land cover class (igbp_land_use). The routines in the processing scripts make it easy to gather these data, but users are free to compile additional data for their own use. The above fields are, however, required.
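Because the required fields are fixed, a site list compiled by other means can be checked before processing. The following is a minimal base R sketch under stated assumptions: the helper missing_site_cols() is hypothetical (it is not part of FluxDataKit), and the demo site and all of its values are placeholders rather than real meta-data.

```r
# Required meta-data columns, as listed in the text above
required_cols <- c(
  "sitename", "lat", "lon", "elv",
  "date_start", "date_end",
  "product", "data_path",
  "year_start", "year_end",
  "koeppen_code_beck", "whc", "igbp_land_use"
)

# Hypothetical helper (not part of FluxDataKit): returns the names
# of the required columns that are missing from a site list
missing_site_cols <- function(sites) {
  setdiff(required_cols, names(sites))
}

# Placeholder site with made-up values, for illustration only
sites_demo <- data.frame(
  sitename = "XX-Dem",
  lat = 0, lon = 0, elv = 0,
  date_start = "2000-01-01", date_end = "2000-12-31",
  product = "placeholder_product",
  data_path = "path/to/flux_data",
  year_start = 2000, year_end = 2000,
  koeppen_code_beck = NA_character_,
  whc = NA_real_,
  igbp_land_use = NA_character_,
  stringsAsFactors = FALSE
)

# An empty result means all required columns are present
missing_site_cols(sites_demo)

# A reduced site list reports the columns that still need compiling
missing_site_cols(sites_demo[, c("sitename", "lat", "lon")])
```

The check only looks at column names; it does not validate the values or their types.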
# load the packages used below
library(dplyr)
library(FluxDataKit)

# load the sites to process
# as generated from scripts in `data-raw` (see github repo)
sites <- FluxDataKit::fdk_site_info |>
  filter(
    sitename == "FR-Fon"
  ) |>
  mutate(
    data_path = "/data/scratch/FDK_inputs/flux_data"
  )
# output LSM formatted data
fdk_process_lsm(
  sites,
  out_path = tempdir(),
  modis_path = "/data/scratch/FDK_inputs/modis",
  overwrite = TRUE
)

# list generated files
list.files(tempdir(), glob2rx("*FR-Fon*.nc"), recursive = TRUE)
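The file listing relies on glob2rx() to turn a wildcard pattern into the regular expression that list.files() expects. The sketch below shows this on its own, without any flux data; the scratch directory and the file names are hypothetical and are not the names FluxDataKit writes.

```r
# Create a few empty files with hypothetical names in a scratch directory
demo_dir <- file.path(tempdir(), "fdk_glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
created <- file.create(file.path(
  demo_dir,
  c(
    "FR-Fon_demo_Flux.nc",
    "FR-Fon_demo_Met.nc",
    "FR-Fon_notes.txt",
    "XX-Dem_demo_Flux.nc"
  )
))

# glob2rx() translates the wildcard pattern into a regular expression
pattern <- utils::glob2rx("*FR-Fon*.nc")

# Only files that contain the site name and end in .nc are returned
nc_files <- list.files(demo_dir, pattern = pattern, recursive = TRUE)
print(nc_files)
```

Passing the wildcard string directly as pattern would not work as intended, since list.files() interprets it as a regular expression rather than a glob.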
By default the format parameter is set to “lsm”, providing land surface model netcdf files as output. You can specify “fluxnet” to convert the data to rsofun compatible FLUXNET output.
# read in demo data
# for FR-Fon site (as LSM data)
# convert to the HH fluxnet format
fluxnet <- fdk_convert_lsm(
  site = "FR-Fon",
  path = tempdir(),
  fluxnet_format = TRUE,
  out_path = tempdir()
)
You can plot the conversion results for a quick visual inspection. Here we retain the data in its original FLUXNET formatting and output the gapfilled and amended data as a data frame. This data frame is the input to the plotting routine fdk_plot, which writes an overview plot to the specified out_path directory.
# Read in and convert the data
df <- fdk_convert_lsm(
  site = "FR-Fon",
  path = tempdir(),
  fluxnet_format = TRUE
)

# plot the returned data frame
# as a file
fdk_plot(
  df,
  site = "FR-Fon", # for writing things to file
  out_path = tempdir(),
  overwrite = TRUE
)
FLUXNET data processed to netcdf files can be converted back to FLUXNET CSV based files, with the same column naming conventions as the original files. The data are, however, downsampled to a daily time step, and the additional variables and gap filling from the above LSM based product are retained.
Note that these daily products, although adhering to the FLUXNET naming conventions (both in file name and column names), are not equivalent to the data generated by the OneFlux processing pipeline.
# Downsample data
fdk_downsample_fluxnet(
  df,
  site = "FR-Fon", # a site name
  out_path = tempdir(),
  overwrite = TRUE
)
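To illustrate what downsampling from a half-hourly to a daily time step involves, here is a minimal base R sketch. It is not the implementation used by fdk_downsample_fluxnet, which may aggregate individual variables differently; the column names time and temp and all values are made up for illustration.

```r
# Synthetic half-hourly series covering two full days (made-up values)
hh <- data.frame(
  time = seq(
    as.POSIXct("2000-01-01 00:00", tz = "UTC"),
    by = "30 min",
    length.out = 96
  )
)
hh$temp <- rep(c(5, 10), each = 48)

# Derive the calendar date, then average all values within each day
hh$date <- as.Date(hh$time, tz = "UTC")
daily <- aggregate(temp ~ date, data = hh, FUN = mean)

print(daily)
```

A plain mean suits state variables such as temperature; quantities that accumulate over time, such as precipitation, would typically be summed instead.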
In addition, MODIS data can be merged from the FluxnetEO dataset using the R package of the same name. This ensures that rsofun driver (and target) data are amended with MODIS data for use in, among other things, machine learning projects.
library(rsofun)

# processing of the half hourly data to
# p-model input drivers for rsofun
rsofun_data <- fdk_format_drivers(
  site_info = FluxDataKit::fdk_site_info |>
    filter(sitename == "FR-Fon"),
  path = paste0(tempdir(), "/"),
  verbose = TRUE
)
# optimized parameters from previous work
params_modl <- list(
  kphio = 0.09423773,
  soilm_par_a = 0.33349283,
  soilm_par_b = 1.45602286,
  tau_acclim_tempstress = 10,
  par_shape_tempstress = 0.0
)

# run the model for these parameters
output <- rsofun::runread_pmodel_f(
  rsofun_data,
  par = params_modl
)
# we only have one site so we'll unnest
# the main model output
model_data <- output$data[[1]][[1]]
print(head(model_data))