1 The data

We will use a dataset of leaf nitrogen (N) content measured in the field. Leaf N content is central to understanding photosynthesis rates and the biogeochemical cycling of N and C in terrestrial ecosystems. A rich body of literature has investigated global patterns of leaf N across the Earth’s biomes and the relationships between leaf N and environmental factors. In recent years, leaf N data collected in the field by a large number of individual campaigns have been collated into homogenised, analysis-ready data compilations. “Small data” has been made “big”. Because these data are geolocated, covariate data from files with global coverage can be extracted at the measurement locations, used to complement the observational leaf N data, and used to model leaf N on the basis of environmental covariates.

Research in our group (GECO, Institute of Geography, University of Bern) has generated such analysis-ready leaf N data, complemented them with environmental covariates, and made them openly accessible on GitHub.

Load the data directly from its online source on GitHub.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

df <- readr::read_csv("https://raw.githubusercontent.com/stineb/leafnp_data/main/data/leafnp_tian_et_al.csv")

Rows: 36414 Columns: 66
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): FunGroups, Dc_Db_Ec_Eb_Hf_Hg, tree_shrub_Herb, Family_New, Family,...
dbl (56): lon, lat, leafN, leafP, LeafNP, Lat_Di_check_final, Lon_Di_check_f...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
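The message above names two ways to quiet the column specification output. A minimal sketch of both follows. It assumes readr 2.x and uses a small inline CSV with made-up values (not taken from the dataset) so that it runs without network access. The column names mirror four of the columns reported above; `csv_text`, `df_quiet` and `df_typed` are names chosen for illustration only.

```r
library(readr)

# Tiny inline CSV standing in for the real file (values are made up).
csv_text <- "lon,lat,leafN,Species\n7.45,46.95,15.2,Fagus sylvatica\n8.55,47.37,22.1,Picea abies\n"

# Option 1: keep automatic type guessing, but suppress the message.
df_quiet <- read_csv(I(csv_text), show_col_types = FALSE)

# Option 2: declare the column types explicitly.
df_typed <- read_csv(
  I(csv_text),
  col_types = cols(
    lon = col_double(),
    lat = col_double(),
    leafN = col_double(),
    Species = col_character()
  )
)
```

Declaring the types explicitly is more verbose, but it can make a script fail early if a column in the file changes type.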

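We will work with a limited subset of the variables available in the file, restricted to observations of the 50 most frequently sampled species. An optional aggregation by site (identified by longitude and latitude) is included in the code below but left commented out. The variables are: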

leafN: leaf nitrogen content, in mass-based concentration units (mg N g DM\(^{-1}\))
lon: longitude in decimal degrees east
lat: latitude in decimal degrees north
elv: elevation above sea level (m)
mat: mean annual temperature (degrees Celsius)
map: mean annual precipitation (mm yr\(^{-1}\))
ndep: atmospheric nitrogen deposition (g m\(^{-2}\) yr\(^{-1}\))
mai: mean annual daily irradiance (\(\mu\)mol m\(^{-2}\) s\(^{-1}\))
Species: species name of the plant on which leaf N was measured

# Identify the 50 species with the most observations
common_species <- df |>
  group_by(Species) |>
  summarise(count = n()) |>
  arrange(desc(count)) |>
  slice(1:50) |>
  pull(Species)
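The selection of the most common species can also be written more compactly with `count()` and `slice_max()`. The sketch below is an alternative formulation, not the code used in this chapter. It runs on a made-up toy table (`toy`) and keeps the top 2 species instead of 50. Setting `with_ties = FALSE` returns exactly the requested number of species even when counts are tied.

```r
library(dplyr)

# Toy data: species "A" occurs three times, "B" twice, "C" once.
toy <- tibble(Species = c("A", "A", "A", "B", "B", "C"))

top2 <- toy |>
  count(Species, name = "count") |>               # observations per species
  slice_max(count, n = 2, with_ties = FALSE) |>   # keep the 2 largest counts
  pull(Species)
```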

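# Keep only the variables of interest and the observations of the common species
dfs <- df |>
  dplyr::select(leafN, lon, lat, elv, mat, map, ndep, mai, Species) |>
  filter(Species %in% common_species)
  # Optional aggregation to site means (left disabled here):
  # group_by(lon, lat) |>
  # summarise(across(where(is.numeric), mean))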
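The two commented-out lines hint at how the data could be aggregated to site means. The sketch below shows what that aggregation does, using a made-up toy table (`toy_sites`) rather than the real data. The `.groups = "drop"` argument is an addition not present in the original code; it returns an ungrouped data frame.

```r
library(dplyr)

# Toy data: two observations at one site, one observation at another.
toy_sites <- tibble(
  lon     = c(7.5, 7.5, 8.5),
  lat     = c(47.0, 47.0, 46.0),
  leafN   = c(10, 20, 30),
  mat     = c(8, 8, 5),
  Species = c("A", "B", "A")
)

# Average all numeric variables within each (lon, lat) site.
site_means <- toy_sites |>
  group_by(lon, lat) |>
  summarise(across(where(is.numeric), mean), .groups = "drop")
```

Note that non-numeric columns such as `Species` are not part of the result, so a site-level table of this form no longer carries species information.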

# quick overview of data
skimr::skim(dfs)

Data summary
Name	dfs
Number of rows	22472
Number of columns	9
_______________________
Column type frequency:
character	1
numeric	8
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Species	0	1	10	23	0	50	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
leafN	1	15.58	5.11	1.02	12.22	14.57	17.50	54.04	▂▇▂▁▁
lon	1	18.08	27.33	-157.79	5.24	13.62	19.95	140.59	▁▁▇▂▁
lat	1	48.35	9.55	-37.49	43.00	48.99	52.44	69.75	▁▁▁▆▇
elv	1	494.28	469.48	-5.00	135.00	357.00	716.00	4847.90	▇▂▁▁▁
mat	1	8.80	3.76	-4.88	6.97	8.64	10.30	29.96	▁▇▅▁▁
map	1	818.59	314.40	105.19	607.29	721.15	955.24	3641.73	▇▅▁▁▁
ndep	1	1.22	0.51	0.07	0.82	1.22	1.55	2.68	▂▅▇▃▁
mai	1	0.00	0.00	0.00	0.00	0.00	0.00	0.00	▅▇▃▁▁

# show missing data
visdat::vis_miss(dfs)
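As a numeric complement to the plot, the number of missing values per column can be counted directly. This is a minimal sketch on a made-up toy table (`toy_na`), not on the real data. In the summary above, `complete_rate` is 1 for every column of `dfs`, so the same call on `dfs` would be expected to return zeros throughout.

```r
library(dplyr)

# Toy data with one missing value in each numeric column.
toy_na <- tibble(
  leafN   = c(12.1, NA, 18.4),
  mat     = c(8.2, 9.1, NA),
  Species = c("A", "B", "C")
)

# Count the missing values in every column.
n_missing <- toy_na |>
  summarise(across(everything(), ~ sum(is.na(.x))))
```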