Configuration
Every ADELM run is controlled by a single YAML configuration file. The project
root contains a commented template config.yaml that you can copy and adapt.
See Workflow for complete workflow examples and launch patterns.
This page follows the structure of the top-level YAML template and explains each block in the same order you will see it in config.yaml.
Use it as a companion to the template: start with the shared sections, then move to the workflow-specific block that matches the script you want to run.
Warning
ADELM rejects duplicate YAML keys during config loading. If the same block
appears twice, for example two separate site_learning: mappings, ADELM
raises an error.
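The duplicate-key check can be illustrated with a minimal sketch. This is not ADELM's loader: it uses the standard-library json module as a stand-in for YAML so that it runs without extra dependencies, and the names reject_duplicates and DuplicateKeyError are hypothetical.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a mapping defines the same key more than once."""


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, failing on the first repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# The hook is called for every mapping in the document, nested ones included.
good = json.loads(
    '{"model": {}, "site_learning": {}}', object_pairs_hook=reject_duplicates
)

try:
    json.loads(
        '{"site_learning": {}, "site_learning": {}}',
        object_pairs_hook=reject_duplicates,
    )
    duplicate_detected = False
except DuplicateKeyError:
    duplicate_detected = True
```

Without such a check, most parsers silently keep the last occurrence of a repeated key, which is why a second site_learning: block would otherwise mask the first.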
| Section | Description |
|---|---|
| Model | Model structure (soil geometry) |
| Data | Shared input file paths and variable mapping |
| Parameterization | Per-parameter source declarations and NN architecture |
| Site Simulation | Site-scale forward simulation settings |
| Site Learning | Site-scale learning, targets, training, cross-validation |
| Grid Simulation | Grid-scale forward simulation settings |
Model
The model block defines shared land-model structure and optional scheme hooks.
model.device_id
model:
  device_id: 0
| Field | Description |
|---|---|
| device_id | GPU device index (0-based), or cpu |
model.structure
model:
  structure:
    num_soil_layers: 6
    num_runoff_generation_layers: 3
    soil_layer_thicknesses: [0.10, 0.20, 0.30, 0.50, 0.80, 1.10] # metres
| Field | Description |
|---|---|
| num_soil_layers | Total number of soil layers |
| num_runoff_generation_layers | How many top layers participate in runoff generation |
| soil_layer_thicknesses | Thickness of each layer in metres; must have exactly num_soil_layers values |
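The length constraint on soil_layer_thicknesses can be checked before launching a run. The sketch below is illustrative only: check_structure is a hypothetical helper, and the bound on num_runoff_generation_layers is an assumption inferred from "top layers", not a documented ADELM rule.

```python
def check_structure(structure):
    """Validate a model.structure mapping and return total soil depth in metres."""
    n_layers = structure["num_soil_layers"]
    thicknesses = structure["soil_layer_thicknesses"]
    if len(thicknesses) != n_layers:
        raise ValueError(
            f"expected {n_layers} layer thicknesses, got {len(thicknesses)}"
        )
    # Assumption: runoff-generating layers are a subset of the soil column.
    if not 1 <= structure["num_runoff_generation_layers"] <= n_layers:
        raise ValueError("num_runoff_generation_layers must lie in 1..num_soil_layers")
    return sum(thicknesses)


structure = {
    "num_soil_layers": 6,
    "num_runoff_generation_layers": 3,
    "soil_layer_thicknesses": [0.10, 0.20, 0.30, 0.50, 0.80, 1.10],
}
total_depth = check_structure(structure)  # about 3.0 m for the values above
```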
model.schemes
model:
  schemes:
    stomatal_conductance:
    river_routing:
Optional process-scheme selectors reserved for future or experimental implementations. Leave empty (null) for the default scheme. These are placeholders and do not need to be set for standard runs.
Data
The data block collects shared input paths, static resources, and name
mappings. Multiple sections can point to the same combined NetCDF file.
data.site
data:
  site:
    drivers_path: path/to/data.nc
    attris_path: path/to/data.nc
    params_path: path/to/data.nc # optional
    fcover_path: path/to/data.nc # optional; required for PFT-based parameters
    print_driver_diagnostics: true
    warn_on_suspicious_drivers: true
All files must share the same ordered site coordinate.
| Field | Description |
|---|---|
| drivers_path | Time-varying meteorological forcing |
| attris_path | Static location attributes |
| params_path | Fixed site-varying parameters |
| fcover_path | Plant functional type cover fractions |
| print_driver_diagnostics | Print driver summary statistics after loading (default true) |
| warn_on_suspicious_drivers | Warn when driver values look physically implausible (default true) |
data.grid
Used by the grid_simulation workflow.
data:
  grid:
    drivers_path:
      ta_degC: "/path/to/t2m/{year:04d}{month:02d}.nc"
      lai: ["/path/to/LAI_{year}_{month:02d}.nc", "LAI"]
    attris_path: path/to/attris.nc
    params_path: path/to/params.nc
    fcover_path: path/to/fcover.nc
| Field | Description |
|---|---|
| drivers_path | Per-driver monthly file templates. Each ADELM driver can point to its own file pattern. Values may be a string path template or [path_template, variable_name] |
| attris_path | Optional grid-scale attribute file |
| params_path | Optional grid-scale fixed-parameter file |
| fcover_path | Optional grid-scale PFT cover file |
Important
For grid drivers, ADELM still takes the default NetCDF variable name from
data.mapping.drivers. A drivers_path string only changes the file path.
If you use the two-element form
[path_template, variable_name], that explicit variable name overrides the
default for grid loading only. scale and offset still come from
data.mapping.drivers.
Site and grid workflows share the same default driver-name mapping from data.mapping.drivers.
Use a string drivers_path entry when only the file location changes.
Use [path_template, variable_name] when the grid file stores that driver under a different variable name.
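The two drivers_path forms can be sketched as a small resolver. This is an illustration under stated assumptions, not ADELM code: resolve_grid_driver is a hypothetical helper, and it assumes the templates are expanded with Python's str.format using year and month.

```python
def resolve_grid_driver(name, drivers_path, mapping_drivers, year, month):
    """Return (file_path, nc_variable_name) for one driver and one month."""
    entry = drivers_path[name]
    mapped = mapping_drivers[name]
    # data.mapping.drivers entries are either a name or [name, scale, ...].
    default_variable = mapped if isinstance(mapped, str) else mapped[0]
    if isinstance(entry, str):
        # String form: only the file location changes.
        template, variable = entry, default_variable
    else:
        # Two-element form: the explicit variable name overrides the default.
        template, variable = entry
    return template.format(year=year, month=month), variable


drivers_path = {
    "ta_degC": "/path/to/t2m/{year:04d}{month:02d}.nc",
    "lai": ["/path/to/LAI_{year}_{month:02d}.nc", "LAI"],
}
mapping_drivers = {
    "ta_degC": ["air_temperature_kelvin", 1.0, -273.15],
    "lai": "lai",
}

ta_file, ta_var = resolve_grid_driver("ta_degC", drivers_path, mapping_drivers, 2001, 3)
lai_file, lai_var = resolve_grid_driver("lai", drivers_path, mapping_drivers, 2001, 3)
```

In both cases the scale and offset would still be read from data.mapping.drivers, which is why the resolver only returns a path and a variable name.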
data.resources
data:
  resources:
    pft_lut_path: path/to/PFT_LOOKUP.txt
| Field | Description |
|---|---|
| pft_lut_path | Optional custom lookup table overriding the built-in PFT LUT |
data.mapping
Maps ADELM’s internal variable names to the names used in your NetCDF files, with an optional linear transformation.
Transformation format:
adelm_name: nc_name # no transform
adelm_name: [nc_name, scale] # value × scale
adelm_name: [nc_name, scale, offset] # value × scale + offset
The data.mapping block may contain:
- drivers
- params
- attris
- fcover
For params, ADELM also supports per-layer configuration through params.layers.
Layer indices are 1-based, and every layer from 1 to num_soil_layers must
appear exactly once.
Site-level parameters in data.mapping.params can include fields such as
latitude_deg, longitude_deg, and mean_air_temperature.
Hint
data.mapping is the primary place to make a config dataset-specific; adapting ADELM to a new file layout is mostly a mapping exercise.
For drivers, the same mapping is shared by both site and grid workflows unless
grid loading explicitly overrides the NetCDF variable name in
data.grid.drivers_path.
This means a definition such as
ta_degC: [air_temperature_kelvin, 1.0, -273.15]
can remain the shared default mapping for both site and grid. Grid loading only
needs an explicit variable-name override when the gridded files expose a
different variable name such as t2m.
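The three mapping forms reduce to one linear transform. The sketch below shows that reading; parse_mapping and apply_mapping are hypothetical helper names, not part of ADELM.

```python
def parse_mapping(entry):
    """Normalise a data.mapping entry to (nc_name, scale, offset)."""
    if isinstance(entry, str):
        return entry, 1.0, 0.0
    if len(entry) == 2:
        nc_name, scale = entry
        return nc_name, float(scale), 0.0
    if len(entry) == 3:
        nc_name, scale, offset = entry
        return nc_name, float(scale), float(offset)
    raise ValueError(f"unsupported mapping entry: {entry!r}")


def apply_mapping(entry, raw_value):
    """Apply value * scale + offset to a raw NetCDF value."""
    _, scale, offset = parse_mapping(entry)
    return raw_value * scale + offset


# Kelvin to degrees Celsius, as in the ta_degC example above.
ta_degC = apply_mapping(["air_temperature_kelvin", 1.0, -273.15], 300.0)
# Percent to fraction.
sand = apply_mapping(["soil_sand_fraction_layer_1", 0.01], 45.0)
# Plain name: the value passes through unchanged.
precip = apply_mapping("precipitation", 4.2)
```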
Parameterization
The parameterization block answers two questions:
- how each physical parameter is sourced
- what shared NN architecture should be used for feature-based parameterization
parameterization.parameters
Each entry declares how a parameter value is obtained at runtime. Four sources are available:
- fixed: fixed value applied to all locations. Requires value. Two forms are accepted:
  - scalar: a single number broadcast to every site and every layer.
  - list: a list of exactly num_soil_layers values (one per layer, top to bottom), broadcast across all sites. Useful for overriding pedotransfer-derived parameters (e.g. soil_saturated_moisture) with a known depth profile.
- pft_based: value computed as a PFT-fraction-weighted average from the fcover data and PFT_LOOKUP.txt. Requires fcover_path to be set.
- nn_global: a single global scalar optimised during training. Requires bounds: [lower, upper].
- nn_feature_based: a trainable parameter predicted by the MLP. Requires bounds: [lower, upper]. Runtime behaviour depends on the target parameter shape:
  - site-level parameters ([n_entities]) use parameterization.nn.attri_features as inputs
  - layer-wise parameters ([n_entities, n_layers]) use parameterization.nn.attri_features plus soil_sand_fraction, soil_clay_fraction, and soil_organic_matter_fraction
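As an illustration of the pft_based source, a PFT-fraction-weighted average can be written as below. The lookup values are made up for the example, pft_weighted_average is a hypothetical helper, and whether ADELM renormalises fractions that do not sum to one is an assumption.

```python
def pft_weighted_average(fractions, lookup):
    """Average per-PFT lookup values, weighted by cover fraction."""
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("cover fractions must sum to a positive value")
    weighted = sum(frac * lookup[pft] for pft, frac in fractions.items())
    # Assumption: divide by the total so partial cover still yields an average.
    return weighted / total


# Hypothetical lookup values for canopy height in metres.
canopy_height_lut = {"TREES-BD": 20.0, "GRASS-NAT": 0.5, "BARE": 0.0}
# Cover fractions for one site, already converted from percent to fractions.
site_fcover = {"TREES-BD": 0.5, "GRASS-NAT": 0.25, "BARE": 0.25}

canopy_height = pft_weighted_average(site_fcover, canopy_height_lut)
```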
Static parameters loaded from data are configured under data.mapping.params,
not as a source value inside parameterization.parameters.
If a parameter is supplied through data.mapping.params, it must not also appear under parameterization.parameters.
Important
Parameter sources are exclusive. A parameter should come from exactly one place:
the registry default, an explicit parameterization.parameters entry, or
data.mapping.params.
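The exclusivity rule amounts to a set-intersection check between the two blocks. A minimal sketch follows, with hypothetical helper names; it assumes that names under params.layers count as data-supplied parameters too.

```python
def data_supplied_names(mapping_params):
    """Collect parameter names supplied through data.mapping.params."""
    names = {key for key in mapping_params if key != "layers"}
    # Layer-wise inputs are grouped by layer index under params.layers.
    for group in mapping_params.get("layers", {}).values():
        names.update(group)
    return names


def check_exclusive_sources(mapping_params, declared_parameters):
    """Raise if a parameter appears in both configuration blocks."""
    overlap = sorted(data_supplied_names(mapping_params) & set(declared_parameters))
    if overlap:
        raise ValueError(
            "declared in both data.mapping.params and "
            "parameterization.parameters: " + ", ".join(overlap)
        )


mapping_params = {
    "latitude_deg": "latitude_deg",
    "layers": {"1": {"soil_sand_fraction": ["soil_sand_fraction_layer_1", 0.01]}},
}
supplied = data_supplied_names(mapping_params)
check_exclusive_sources(mapping_params, {"surface_emissivity": {"source": "fixed"}})
```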
parameterization:
  parameters:
    # Scalar fixed: one value broadcast to all sites and layers.
    surface_emissivity:
      source: fixed
      value: 0.96
    jarvis_temperature_optimum_bias:
      source: fixed
      value: 10.0
    # Per-layer fixed: list of length num_soil_layers (top to bottom),
    # broadcast across all sites. Overrides the pedotransfer-derived value.
    soil_saturated_moisture:
      source: fixed
      value: [0.46, 0.44, 0.42, 0.40, 0.38, 0.37] # m3 m-3
    jarvis_radiation_half_saturation:
      source: nn_global
      bounds: [50.0, 800.0]
    photosynthesis_capacity_coefficient:
      source: nn_feature_based
      bounds: [5.0, 35.0]
    soil_brooks_corey_b:
      source: nn_feature_based
      bounds: [0.1, 0.3]
Note
Layer-wise fixed parameters override the corresponding pedotransfer-derived
values. The full set of overridable derived parameters is:
soil_saturated_moisture, soil_field_capacity, soil_wilting_point,
soil_saturated_hydraulic_conductivity, soil_brooks_corey_a,
soil_brooks_corey_b, and soil_brooks_corey_bubbling_head.
If the list length does not match num_soil_layers, ADELM raises an error at
startup.
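The scalar and per-layer forms of a fixed value, together with the length check, can be sketched as follows. expand_fixed is a hypothetical helper that uses nested lists in place of tensors.

```python
def expand_fixed(value, n_sites, n_layers):
    """Broadcast a fixed scalar or per-layer list to shape [n_sites][n_layers]."""
    if isinstance(value, (int, float)):
        profile = [float(value)] * n_layers
    else:
        if len(value) != n_layers:
            raise ValueError(
                f"fixed list has {len(value)} values, expected {n_layers}"
            )
        profile = [float(v) for v in value]
    # Every site receives its own copy of the same depth profile.
    return [list(profile) for _ in range(n_sites)]


scalar_grid = expand_fixed(0.96, n_sites=2, n_layers=3)
profile_grid = expand_fixed([0.46, 0.44, 0.42, 0.40, 0.38, 0.37], n_sites=2, n_layers=6)
```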
Parameters not listed here use their registered default values. Parameters with source: pft_based in the registry (e.g. canopy_height) are automatically resolved from the built-in lookup table when fcover_path is provided; no explicit entry is needed unless you want to override the source.
Each parameter uses one active source. If a parameter is not declared in
parameterization.parameters, ADELM uses its registry default source. If a
parameter is provided through data.mapping.params, it should not also be
declared in parameterization.parameters.
See also
See Model variables for the full list of registered parameters and their default values.
nn_global and nn_feature_based parameters are recomputed during each forward
pass using the current weights, so gradients flow back through the parameterization
network into the learned weights during training.
Note
nn_feature_based stays a single source name in the config for both site-wise
and layer-wise parameters.
parameterization.nn
Required when at least one parameter has source: nn_feature_based. This applies to both site-level and layer-wise nn_feature_based parameters.
parameterization:
  nn:
    attri_features:
      - aridity_index
      - elevation
    hidden_dims: [64, 64]
    dropout_rate: 0.2
Field |
Description |
|---|---|
|
List of attribute names (matching keys in |
|
Hidden layer widths, e.g. |
|
Dropout probability applied inside the MLP (default |
Site Learning
The site_learning block contains site selection, time window, spin-up,
outputs, training targets, optimiser settings, cross-validation, and
training-specific NN initialisation options.
site_learning.domain
site_learning:
  domain:
    selection: all
    drop_invalid_sites: true
| Field | Description |
|---|---|
| selection | Selected sites: all, or an explicit list of site IDs such as [SITE_A, SITE_B] |
| drop_invalid_sites | Whether to drop sites with invalid values in required site inputs |
site_learning.time
site_learning:
  time:
    start: 2001-01-01
    end: 2005-12-31
| Field | Description |
|---|---|
| start | Start of the main learning period (date, for example 2001-01-01) |
| end | End of the main learning period (date, for example 2005-12-31) |
site_learning.spinup
site_learning:
  spinup:
    start: 2000-01-01
    end: 2000-12-31
    cycles: 3
| Field | Description |
|---|---|
| start | Start of the spin-up window (date, for example 2000-01-01) |
| end | End of the spin-up window (date, for example 2000-12-31) |
| cycles | Number of spin-up repetitions |
site_learning.output_dir
site_learning:
  output_dir: path/to/site_learning_outputs
| Field | Description |
|---|---|
| output_dir | Output directory for this site-learning run |
site_learning.targets
site_learning:
  targets_path: path/to/data.nc
  targets:
    gpp_gCm2day:
      mapping: observed_gpp
      sample_loss_weight: 1
      site_loss_weight: 1
    total_et_mmday:
      mapping: [observed_latent_heat_flux, 0.035274]
      sample_loss_weight: 1
      site_loss_weight: 1
    soil_moisture:
      mapping: [observed_soil_moisture_layer_1, 0.01]
      layer: 1
      sample_loss_weight: 1
      site_loss_weight: 1
Field |
Description |
|---|---|
|
Observed learning targets. Target tensors may be |
|
Observed variable name, or |
|
Optional 1-based layer selector for layered ADELM outputs such as |
|
Unnormalised weight for this target’s sample-level loss in the total loss. |
|
Weight applied to this target’s optional site-level loss term before combining it with the sample-level loss. |
Important
site_learning.targets_path and site_learning.targets must be provided
together.
site_learning.training
site_learning:
  training:
    num_epochs: 100
    lr: 0.001
    seed: 42
    train_chunk_size: 120
    max_grad_norm: 1.0
    weight_decay: 1.0e-4
    debug: false
    val_within_train_enabled: true
    val_fraction: 0.3
    early_stopping_enabled: true
    early_stopping_patience: 8
    early_stopping_min_delta: 0.0
    reduce_lr_enabled: true
    reduce_lr_patience: 3
    reduce_lr_factor: 0.5
    reduce_lr_min_delta: 0.0
    reduce_lr_min_lr: 1.0e-6
    min_sites_for_site_loss: 10
    min_samples_per_site_for_site_loss: 365
Field |
Description |
|---|---|
|
Maximum number of optimisation epochs |
|
Optimiser learning rate |
|
Training seed used for NN initialisation and any in-training random splitting |
|
Number of timesteps per training chunk |
|
Gradient clipping threshold applied during training |
|
Weight decay strength controlling L2-style regularization |
|
When |
|
When |
|
Fraction of the current training targets reserved for validation when |
|
Enable early stopping based on the epoch monitor loss |
|
Patience in epochs for early stopping |
|
Minimum improvement required to reset early-stopping patience |
|
Enable learning-rate reduction on plateau |
|
Patience in epochs before reducing learning rate |
|
Multiplicative factor applied when reducing learning rate |
|
Minimum absolute monitor-loss improvement required to count as progress for the learning-rate scheduler |
|
Lower bound for the learning rate scheduler |
|
Minimum number of valid sites required before the optional site-level loss is applied |
|
Minimum number of valid samples a site must contribute before it participates in the site-level loss |
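How the early-stopping and learning-rate settings interact can be sketched with a small simulation over a list of epoch monitor losses. This is a generic plateau scheme written for illustration: simulate_schedule is a hypothetical function, and the exact patience semantics (reduce or stop once the number of non-improving epochs reaches the patience value) are an assumption rather than ADELM's documented behaviour.

```python
def simulate_schedule(monitor_losses, lr, cfg):
    """Return [(epoch, lr)] for each epoch run before early stopping triggers."""
    best = float("inf")
    stale_for_stop = 0  # epochs since the last early-stopping improvement
    stale_for_lr = 0    # epochs since the last scheduler improvement
    history = []
    for epoch, loss in enumerate(monitor_losses, start=1):
        improved_stop = loss < best - cfg["early_stopping_min_delta"]
        improved_lr = loss < best - cfg["reduce_lr_min_delta"]
        stale_for_stop = 0 if improved_stop else stale_for_stop + 1
        stale_for_lr = 0 if improved_lr else stale_for_lr + 1
        best = min(best, loss)
        if cfg["reduce_lr_enabled"] and stale_for_lr >= cfg["reduce_lr_patience"]:
            # Shrink the learning rate, but never below the configured floor.
            lr = max(lr * cfg["reduce_lr_factor"], cfg["reduce_lr_min_lr"])
            stale_for_lr = 0
        history.append((epoch, lr))
        if cfg["early_stopping_enabled"] and stale_for_stop >= cfg["early_stopping_patience"]:
            break
    return history


cfg = {
    "early_stopping_enabled": True,
    "early_stopping_patience": 8,
    "early_stopping_min_delta": 0.0,
    "reduce_lr_enabled": True,
    "reduce_lr_patience": 3,
    "reduce_lr_factor": 0.5,
    "reduce_lr_min_delta": 0.0,
    "reduce_lr_min_lr": 1.0e-6,
}
# The loss improves once, then plateaus.
losses = [1.0, 0.8] + [0.8] * 10
history = simulate_schedule(losses, lr=0.001, cfg=cfg)
```

Under these assumptions the plateau triggers two learning-rate reductions before the early-stopping patience runs out, so the run ends well before num_epochs.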
Important
When val_within_train_enabled: true, ADELM performs a fixed random split
inside the current training targets using site_learning.training.seed:
- the remaining training targets are used for optimisation
- the reserved validation subset is used only for epoch-level monitoring, learning-rate scheduling, and early stopping
This validation split comes from the training set. The outer cross-validation held-out fold remains reserved for final evaluation.
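A seeded split of this kind can be sketched as follows. The unit being split (here, integer sample indices) and the helper name split_train_val are assumptions; the point is only that a fixed seed gives the same partition on every run.

```python
import random


def split_train_val(sample_ids, val_fraction, seed):
    """Reserve a fixed, seed-determined fraction of samples for validation."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # local generator, so global state is untouched
    rng.shuffle(ids)
    n_val = int(round(len(ids) * val_fraction))
    val = sorted(ids[:n_val])
    train = sorted(ids[n_val:])
    return train, val


train_ids, val_ids = split_train_val(range(10), val_fraction=0.3, seed=42)
repeat_train, repeat_val = split_train_val(range(10), val_fraction=0.3, seed=42)
```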
site_learning.cross_validation
site_learning:
  cross_validation:
    enabled: false
    scheme: null # null | spatial | temporal
    n_folds: 5
    cv_seed: 42
    spatial_mode: random # random | predefined
    spatial_fold_path: null
    shuffle: true
Field |
Description |
|---|---|
|
Whether to run cross-validation |
|
|
|
Number of folds |
|
Random seed used to construct CV folds. This is separate from |
|
|
|
Path to a plain-text file for predefined folds (one fold per line, format: |
|
Randomly shuffle locations before splitting (spatial CV only) |
Temporal CV divides the post-spin-up period (site_learning.time.start to site_learning.time.end) into n_folds contiguous blocks.
Each fold is run as a separate job. Select the fold to run with --fold N on the command line.
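Splitting a period into contiguous blocks can be sketched with the standard datetime module. temporal_folds is a hypothetical helper, and the way leftover days are distributed (one extra day to the earliest folds) is an assumption, since the text does not say how ADELM handles periods that do not divide evenly.

```python
from datetime import date, timedelta


def temporal_folds(start, end, n_folds):
    """Split the inclusive range [start, end] into n_folds contiguous blocks."""
    n_days = (end - start).days + 1
    if not 1 <= n_folds <= n_days:
        raise ValueError("n_folds must be between 1 and the number of days")
    base, extra = divmod(n_days, n_folds)
    folds = []
    cursor = start
    for index in range(n_folds):
        # Assumption: the first `extra` folds each take one additional day.
        length = base + (1 if index < extra else 0)
        block_end = cursor + timedelta(days=length - 1)
        folds.append((cursor, block_end))
        cursor = block_end + timedelta(days=1)
    return folds


folds = temporal_folds(date(2001, 1, 1), date(2005, 12, 31), n_folds=5)
```

Each (start, end) pair would then serve as the held-out block for the fold chosen with --fold N.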
site_learning.save_final_inference
site_learning:
  save_final_inference: true
Field |
Description |
|---|---|
|
If |
Note
This uses the final evaluation outputs already produced at the end of training.
ADELM does not launch a second site-simulation pass just to write the NetCDF.
The written file follows the same site_simulation.nc format used by the
site_simulation workflow.
site_learning.initialization
site_learning:
  initialization:
    init_nn_weights_path: path/to/nn_weights.pt
    frozen_parameters:
      - jarvis_vpd_sensitivity
| Field | Description |
|---|---|
| init_nn_weights_path | Optional direct checkpoint file or glob pattern used to preload NN weights before training |
| frozen_parameters | Optional list of active NN parameter names to freeze after loading |
site_learning.initialization holds training-specific NN preload and freeze
controls, separate from the shared parameterization.nn block.
init_nn_weights_path and frozen_parameters are needed only when continuing
from a previous run.
init_nn_weights_path supports two forms:
- a direct file path, for example path/to/no_cv_seed_100/nn_weights.pt
- a glob pattern, for example path/to/learning_outputs/**/nn_weights.pt
When a pattern is used, ADELM selects the checkpoint that matches the current workflow, fold, and seed:
- no_cv_seed_{seed}
- spatial_cv_fold{fold}_seed_{seed}
- temporal_cv_fold{fold}_seed_{seed}
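The selection step can be sketched as matching a run tag against candidate paths. The helper names are hypothetical, and the sketch assumes that the tag appears as the name of the checkpoint's parent directory, as in the direct-path example above.

```python
from pathlib import PurePosixPath


def run_tag(seed, scheme=None, fold=None):
    """Build the directory tag for the current workflow, fold, and seed."""
    if scheme is None:
        return f"no_cv_seed_{seed}"
    return f"{scheme}_cv_fold{fold}_seed_{seed}"


def select_checkpoint(candidates, seed, scheme=None, fold=None):
    """Pick the single candidate whose parent directory matches the run tag."""
    tag = run_tag(seed, scheme, fold)
    matches = [path for path in candidates if PurePosixPath(path).parent.name == tag]
    if len(matches) != 1:
        raise LookupError(f"expected one checkpoint for {tag}, found {len(matches)}")
    return matches[0]


# Paths a glob such as path/to/learning_outputs/**/nn_weights.pt might return.
candidates = [
    "path/to/learning_outputs/no_cv_seed_100/nn_weights.pt",
    "path/to/learning_outputs/spatial_cv_fold1_seed_100/nn_weights.pt",
    "path/to/learning_outputs/spatial_cv_fold2_seed_100/nn_weights.pt",
]
no_cv_choice = select_checkpoint(candidates, seed=100)
fold2_choice = select_checkpoint(candidates, seed=100, scheme="spatial", fold=2)
```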
Hint
A glob pattern pointing at a family of learning outputs supports staged experiments without modifying config paths between runs.
When preloading and freezing:
- If init_nn_weights_path is not set, all active NN parameters are randomly initialised.
- If init_nn_weights_path is set, ADELM tries to load matching checkpoint entries.
  - Parameters found in the checkpoint are loaded when their tensor shapes are compatible with the current runtime model.
  - Parameters not found in the checkpoint are left randomly initialised.
  - Parameters found in the checkpoint but with incompatible shapes are skipped and left randomly initialised.
- If frozen_parameters is set, every frozen parameter must:
  - be active in the current config
  - be present in the checkpoint
  - have a compatible checkpoint shape
If any frozen parameter fails those checks, ADELM raises an error and stops.
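These rules can be expressed as a small planning function over parameter shapes. This is a sketch rather than ADELM's loader: plan_preload is a hypothetical name, shapes are plain tuples, and "compatible" is assumed to mean "equal".

```python
def plan_preload(model_shapes, checkpoint_shapes, frozen=()):
    """Classify active NN parameters by how the checkpoint applies to them."""
    loaded, skipped, random_init = [], [], []
    for name, shape in model_shapes.items():
        if name not in checkpoint_shapes:
            random_init.append(name)      # absent from the checkpoint
        elif checkpoint_shapes[name] == shape:
            loaded.append(name)           # present and shape-compatible
        else:
            skipped.append(name)          # present but incompatible
            random_init.append(name)      # so it stays randomly initialised
    for name in frozen:
        # A frozen parameter must be active, present, and compatible.
        if name not in loaded:
            raise ValueError(f"cannot freeze {name!r}: not loaded from checkpoint")
    return {
        "loaded": loaded,
        "frozen": list(frozen),
        "skipped": skipped,
        "random": random_init,
    }


model_shapes = {"a": (6, 16), "b": (16, 1), "c": (1,)}
checkpoint_shapes = {"a": (6, 16), "b": (8, 1)}
plan = plan_preload(model_shapes, checkpoint_shapes, frozen=["a"])
```

The four lists in the returned plan correspond to the four categories that the runtime report distinguishes.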
At runtime, ADELM reports:
which parameters were loaded from the checkpoint
which parameters were frozen
which parameters were skipped because of incompatible shapes
which active parameters remain randomly initialised
This information is printed to screen and also saved in the runtime summary in
the active workflow output directory, typically site_learning.output_dir.
Important
If a frozen parameter is missing from the checkpoint, or the checkpoint no longer matches the current model, ADELM stops and reports the mismatch instead of continuing with a partial freeze.
Example:
site_learning:
  initialization:
    init_nn_weights_path: path/to/nn_weights.pt
    frozen_parameters:
      - jarvis_max_stomatal_conductance
      - jarvis_radiation_half_saturation
      - jarvis_vpd_sensitivity
      - jarvis_water_potential_midpoint
      - jarvis_water_potential_steepness
Site Simulation
The site_simulation block contains site selection, time window, spin-up,
site-simulation output directory, and optional pretrained NN weights for forward runs.
site_simulation.domain
site_simulation:
  domain:
    selection: all
    drop_invalid_sites: true
| Field | Description |
|---|---|
| selection | Selected sites: all, or an explicit list of site IDs such as [SITE_A, SITE_B] |
| drop_invalid_sites | Whether to drop sites with invalid values in required site inputs |
site_simulation.time
site_simulation:
  time:
    start: 2001-01-01
    end: 2005-12-31
| Field | Description |
|---|---|
| start | Start of the main forward-simulation period (date, for example 2001-01-01) |
| end | End of the main forward-simulation period (date, for example 2005-12-31) |
site_simulation.spinup
site_simulation:
  spinup:
    start: 2000-01-01
    end: 2000-12-31
    cycles: 3
| Field | Description |
|---|---|
| start | Start of the spin-up window (date, for example 2000-01-01) |
| end | End of the spin-up window (date, for example 2000-12-31) |
| cycles | Number of spin-up repetitions |
site_simulation.output_dir
site_simulation:
  output_dir: path/to/site_sim_outputs
| Field | Description |
|---|---|
| output_dir | Output directory for this site-simulation run |
Note
Site simulation writes both a human-readable runtime_summary.txt and a
NetCDF export, typically site_simulation.nc, containing time-varying outputs
plus static parameters and final states.
site_simulation.nn_weights_path
site_simulation:
  nn_weights_path: path/to/nn_weights.pt
| Field | Description |
|---|---|
| nn_weights_path | Optional direct checkpoint file or glob pattern for pretrained NN weights used in forward simulation |
site_simulation.nn_weights_path supports two forms:
- a direct file path, for example path/to/no_cv_seed_100/nn_weights.pt
- a glob pattern, for example path/to/learning_outputs/**/nn_weights.pt
When a glob pattern matches multiple checkpoints, scripts/site_simulation.py
runs one simulation per checkpoint and writes each run into a subdirectory under
site_simulation.output_dir, named after the matched checkpoint’s parent
directory.
Hint
A single site-simulation config with a glob pattern fans out over multiple spatial_cv_fold*_seed* checkpoints, keeping outputs in separate subdirectories.
Example:
- checkpoint match: path/to/learning_outputs/spatial_cv_fold1_seed100/nn_weights.pt
- site-simulation output root: path/to/site_simulation_outputs
- resulting run directory: path/to/site_simulation_outputs/spatial_cv_fold1_seed100/
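The naming rule in this example can be reproduced with pathlib. run_output_dir is a hypothetical helper; it only mirrors the convention described here and does not touch the filesystem.

```python
from pathlib import PurePosixPath


def run_output_dir(output_root, checkpoint_path):
    """Name the per-run subdirectory after the checkpoint's parent directory."""
    parent_name = PurePosixPath(checkpoint_path).parent.name
    return PurePosixPath(output_root) / parent_name


run_dir = run_output_dir(
    "path/to/site_simulation_outputs",
    "path/to/learning_outputs/spatial_cv_fold1_seed100/nn_weights.pt",
)
```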
These weights are intended for production-style forward simulation, so the learned NN parameterization structure should stay consistent with the training run that produced them.
- changing fixed parameters is fine
- changing a learned parameter from nn_global to nn_feature_based, or the reverse, will usually make the checkpoint incompatible
- changing parameterization.nn.hidden_dims or the feature layout used by nn_feature_based usually makes the checkpoint incompatible
If the learned parameterization no longer matches the checkpoint structure, ADELM raises an error.
Warning
Use learned weights with the same learned parameterization structure that produced them. Changing fixed parameters is usually fine. Changing learned parameter sources, hidden dimensions, or feature layout usually is not.
Grid Simulation
Forward-simulation settings for grid workflows.
grid_simulation.domain
grid_simulation:
  domain:
    selection: all
Field |
Description |
|---|---|
|
Grid domain selection. Supported values are |
grid_simulation.time
grid_simulation:
  time:
    year_start: 2001
    month_start: 1
    year_end: 2020
    month_end: 12
| Field | Description |
|---|---|
| year_start, month_start | Start of the simulation period for grid workflows |
| year_end, month_end | End of the simulation period for grid workflows |
grid_simulation.spinup
grid_simulation:
  spinup:
    year_start: 2000
    month_start: 1
    year_end: 2000
    month_end: 12
    cycles: 3
| Field | Description |
|---|---|
| year_start, month_start | Start of the spin-up period for grid workflows |
| year_end, month_end | End of the spin-up period for grid workflows |
| cycles | Number of spin-up repetitions for grid workflows |
Grid-simulation output fields
grid_simulation:
  grid_output_dir: path/to/grid_outputs
  output_daily_vars: all
  output_monthly_vars: [gpp_gCm2day, total_et_mmday]
Field |
Description |
|---|---|
|
Output directory for grid-scale forward runs |
|
|
|
|
Grid restart checkpoints are written automatically under:
grid_output_dir/checkpoints
Use the command-line --checkpoint-dir override only when you need a
different restart location for a particular run.
Note
Grid restart checkpoints are the warm-start files used by
scripts/grid_simulation.py --resume-from .... They store model states and
restart metadata so the next run can continue from the month after the saved
checkpoint. Daily and monthly NetCDF exports are for selected time-varying
variables only; restart states are not duplicated there.
If spin-up is enabled, ADELM also writes spin-up checkpoints in the same directory at the end of each spin-up month. Their names reflect the saved month and how many cycles remain after that save, for example:
- spinup_200001_minus2.npz
- spinup_200002_minus2.npz
- spinup_200012_minus2.npz
- spinup_200012_minus1.npz
- spinup_200012.npz
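The naming pattern can be reproduced with a short generator. This is an inference from the example names, not ADELM code: spinup_checkpoint_names is hypothetical, and dropping the suffix when no cycles remain is an assumption based on the last name in the list.

```python
def spinup_checkpoint_names(year_start, month_start, year_end, month_end, cycles):
    """List spin-up checkpoint file names in the order they would be written."""
    names = []
    for cycle in range(cycles):
        remaining = cycles - cycle - 1
        # Assumption: the final cycle carries no _minusN suffix.
        suffix = f"_minus{remaining}" if remaining > 0 else ""
        year, month = year_start, month_start
        while (year, month) <= (year_end, month_end):
            names.append(f"spinup_{year:04d}{month:02d}{suffix}.npz")
            month += 1
            if month > 12:
                year, month = year + 1, 1
    return names


# Spin-up window 2000-01 to 2000-12 with cycles: 3, as in the config above.
names = spinup_checkpoint_names(2000, 1, 2000, 12, cycles=3)
```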
grid_simulation.nn_weights_path
grid_simulation:
  nn_weights_path: path/to/nn_weights.pt
| Field | Description |
|---|---|
| nn_weights_path | Optional path to pretrained NN weights for forward simulation |
grid_simulation.nn_weights_path may be a direct file path or a glob pattern.
When a pattern matches multiple learned checkpoints, ADELM runs one grid
simulation per checkpoint and writes the results into subdirectories under
grid_output_dir, named after the matched checkpoint parent directory.
Full template example
The example below is included directly from the project-root config.yaml template so it stays
aligned with the codebase.
# =============================================================================
# ADELM Runtime Configuration
# =============================================================================
#
# This template reflects the proposed top-level redesign:
# - shared blocks: model, data, parameterization
# - workflow-specific blocks:
# site_simulation
# site_learning
# grid_simulation
#
# Workflow selection is intended to be decided by the script / entry point,
# not by a top-level `mode` field in the config.
#
# Copy this file and edit the paths and settings for your experiment.
# See the documentation site for a full description of every field.
# See examples/ for complete experiment configurations.
# -----------------------------------------------------------------------------
# model
# Structural settings of the land model.
#
# num_soil_layers total number of soil layers
# num_runoff_generation_layers number of (top) layers that participate in
# runoff generation
# soil_layer_thicknesses thickness of each layer in metres; must have
# exactly num_soil_layers values
# NOTE: soil evaporation is removed from the first
# layer only. Recommended first-layer thickness:
# 5-10 cm. Thinner layers deplete too quickly;
# thicker layers act as a bulk reservoir rather
# than an evaporative skin.
#
# schemes
# Optional process-scheme selectors reserved for future or experimental
# implementations.
# At present these are placeholders and should normally be left empty.
# The template only reserves the two scheme hooks most likely to be used
# first; other scheme entries may still exist in code.
# -----------------------------------------------------------------------------
model:
  device_id: 0 # 0 | 1 | cpu
  structure:
    num_soil_layers: 6
    num_runoff_generation_layers: 3
    soil_layer_thicknesses: [0.10, 0.20, 0.30, 0.50, 0.80, 1.10] # metres
  schemes:
    stomatal_conductance:
    river_routing:
# -----------------------------------------------------------------------------
# data
# Shared input datasets, static resources, and variable mappings.
#
# site
# Site-scale input datasets. In many workflows these can all point to the
# same combined NetCDF file.
#
# print_driver_diagnostics default: true
# warn_on_suspicious_drivers default: true
#
# drivers [site, time] time-varying meteorological forcings
# attris [site] static site attributes used by feature-based NN
# params [site] static site parameters supplied directly by data
# fcover [site, pft] PFT cover fractions for PFT-based parameters
#
# params_path optional
# fcover_path optional; required for PFT-based parameters
#
# grid
# Grid-scale input datasets. Used mainly in grid workflows.
#
# Preferred:
# drivers_path per-driver path templates. Each ADELM driver can point to
# its own monthly file pattern, for example:
# ta_degC: /path/to/drivers/ta_degC_{year:04d}{month:02d}.nc
# lai: [/path/to/drivers/lai_{year:04d}{month:02d}.nc, lai]
# String form means "path template only".
# Two-element form means "[path template, variable name]".
# Python-style placeholders such as {year:04d} and
# {month:02d} are preserved and used as-is.
# By default, the NetCDF variable name still comes from
# `data.mapping.drivers`. If a grid driver is written as
# [path_template, variable_name], that variable name
# overrides the default for grid loading only. Scale and
# offset still come from `data.mapping.drivers`.
#
# attris_path gridded static attributes
# params_path gridded static parameters
# fcover_path gridded PFT cover fractions
#
# resources
# Additional static resources not stored in the main data files.
# Currently only contains pft_lut_path, the optional custom PFT lookup table.
#
# mapping
# Maps ADELM variable names to dataset variable names with optional linear
# transformation:
#
# adelm_name: nc_name # no transform
# adelm_name: [nc_name, scale] # nc_value * scale
# adelm_name: [nc_name, scale, offset] # nc_value * scale + offset
#
# mapping.params
# Static parameters provided directly by data, e.g. latitude/longitude or
# layer-wise soil texture.
#
# IMPORTANT
# If a parameter is supplied via `mapping.params`, it must NOT also appear
# under `parameterization.parameters`. The two sources are treated as
# mutually exclusive and overlapping names should raise a config error.
#
# mapping.params.layers
# Layer-wise soil inputs. Layer indices are 1-based strings.
# Each model layer 1..num_soil_layers must appear in exactly one group.
#
# mapping.attris
# Free-form site attributes used by feature-based NN parameterization.
# Keys must match the names listed in `parameterization.nn.attri_features`.
#
# mapping.fcover
# PFT cover fractions used when parameters are sourced as `pft_based`.
# -----------------------------------------------------------------------------
data:
  site:
    drivers_path: path/to/data.nc
    attris_path: path/to/data.nc
    params_path: path/to/data.nc
    fcover_path: path/to/data.nc # optional; required for pft_based parameters
    print_driver_diagnostics: true
    warn_on_suspicious_drivers: true
  grid:
    drivers_path:
      ta_degC: "/path/to/drivers/ta_degC_{year:04d}{month:02d}.nc"
      pr_mmday: "/path/to/drivers/pr_mmday_{year:04d}{month:02d}.nc"
      swdown_Wm2: "/path/to/drivers/swdown_Wm2_{year:04d}{month:02d}.nc"
      lwdown_Wm2: "/path/to/drivers/lwdown_Wm2_{year:04d}{month:02d}.nc"
      lai: ["/path/to/drivers/lai_{year:04d}{month:02d}.nc", "lai"]
    attris_path:
    params_path:
    fcover_path:
  resources:
    pft_lut_path: path/to/PFT_LOOKUP.txt
  mapping:
    drivers:
      ta_degC: [air_temperature, 1.0, -273.15]
      ta_min_degC: [air_temperature_min, 1.0, -273.15]
      ta_max_degC: [air_temperature_max, 1.0, -273.15]
      pr_mmday: precipitation
      swdown_Wm2: [shortwave_radiation, 1.0]
      lwdown_Wm2: longwave_radiation
      wind_ms: wind_speed
      vpd_kPa: vapour_pressure_deficit
      lai: lai
      co2_ppm: atmospheric_co2
    params:
      latitude_deg: latitude_deg
      longitude_deg: longitude_deg
      layers:
        "1": {soil_sand_fraction: [soil_sand_fraction_layer_1, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_1, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_1, 0.01],
              soil_bulk_density: soil_bulk_density_layer_1}
        "2": {soil_sand_fraction: [soil_sand_fraction_layer_2, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_2, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_2, 0.01],
              soil_bulk_density: soil_bulk_density_layer_2}
        "3": {soil_sand_fraction: [soil_sand_fraction_layer_3, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_3, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_3, 0.01],
              soil_bulk_density: soil_bulk_density_layer_3}
        "4": {soil_sand_fraction: [soil_sand_fraction_layer_4, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_4, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_4, 0.01],
              soil_bulk_density: soil_bulk_density_layer_4}
        "5": {soil_sand_fraction: [soil_sand_fraction_layer_5, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_5, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_5, 0.01],
              soil_bulk_density: soil_bulk_density_layer_5}
        "6": {soil_sand_fraction: [soil_sand_fraction_layer_6, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_6, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_6, 0.01],
              soil_bulk_density: soil_bulk_density_layer_6}
    attris:
      attr_feature_1: attr_feature_1
      attr_feature_2: attr_feature_2
      attr_feature_3: attr_feature_3
      attr_feature_4: attr_feature_4
      attr_feature_5: attr_feature_5
      attr_feature_6: attr_feature_6
    fcover:
      BARE: [bare_fraction, 0.01]
      BUILT: [built_fraction, 0.01]
      GRASS-MAN: [managed_grass_fraction, 0.01]
      GRASS-NAT: [natural_grass_fraction, 0.01]
      SHRUBS-BD: [broadleaf_deciduous_shrub_fraction, 0.01]
      SHRUBS-BE: [broadleaf_evergreen_shrub_fraction, 0.01]
      SHRUBS-ND: [needleleaf_deciduous_shrub_fraction, 0.01]
      SHRUBS-NE: [needleleaf_evergreen_shrub_fraction, 0.01]
      SNOW-ICE: [snow_ice_fraction, 0.01]
      TREES-BD: [broadleaf_deciduous_tree_fraction, 0.01]
      TREES-BE: [broadleaf_evergreen_tree_fraction, 0.01]
      TREES-ND: [needleleaf_deciduous_tree_fraction, 0.01]
      TREES-NE: [needleleaf_evergreen_tree_fraction, 0.01]
      WATER: [water_fraction, 0.01]
# -----------------------------------------------------------------------------
# parameterization
# Shared parameter-source declarations.
#
# Supported sources
# -----------------
# source: fixed
# A fixed value shared across all sites. Requires `value`.
# `value` may be:
# - a scalar: broadcast to every site and every layer
# - a list of length num_soil_layers: one value per layer, broadcast
# across all sites. Useful for overriding pedotransfer-derived
# parameters (e.g. soil_saturated_moisture) with known layer
# profiles. The list length must match num_soil_layers exactly.
#
# source: pft_based
# Resolved from `mapping.fcover` and the built-in PFT lookup table.
#
# source: nn_global
# One globally shared trainable scalar. Requires `bounds`.
#
# source: nn_feature_based
# One trainable parameter predicted by an MLP. Requires `bounds`.
# Internal behaviour depends on the target parameter shape:
# - site-wise parameters [n_site] use `parameterization.nn.attri_features`
# - site×layer parameters [n_site, n_layer] use
# `parameterization.nn.attri_features` + soil_sand_fraction +
# soil_clay_fraction + soil_organic_matter_fraction
#
#
#   Parameters not listed here fall back to their registered ADELM defaults,
#   unless they are supplied through `data.mapping.params`.
#
# nn
#   Shared NN architecture settings.
#   These are not training-specific.
#
#   attri_features   names of static attributes used by feature-based NNs
#   hidden_dims      hidden layer widths, e.g. [16, 16]
#                    default: [64, 64] in code, though this template uses [16, 16]
#
#   Training-specific NN initialization settings such as init_nn_weights_path and
#   frozen_parameters are configured under `site_learning.initialization`.
# -----------------------------------------------------------------------------
parameterization:
  parameters:
    # -- Fixed scalar (broadcast to all sites and layers) ---------------------
    # surface_emissivity:
    #   source: fixed
    #   value: 0.96
    # -- Fixed per-layer (overrides pedotransfer-derived parameters) ----------
    # Provide exactly num_soil_layers values; the profile is shared across
    # all sites. Only valid for layer-wise derived parameters such as
    # soil_saturated_moisture, soil_field_capacity, soil_wilting_point,
    # soil_saturated_hydraulic_conductivity, soil_brooks_corey_a/b/bubbling_head.
    # Layer order matches soil_layer_thicknesses (top to bottom).
    # soil_saturated_moisture:
    #   source: fixed
    #   value: [0.46, 0.44, 0.42, 0.40, 0.38, 0.37]  # m3 m-3
    # -- PFT-based parameter example ------------------------------------------
    # canopy_height:
    #   source: pft_based
    # -- Globally learned scalar example --------------------------------------
    # jarvis_vpd_sensitivity:
    #   source: nn_global
    #   bounds: [0.05, 2.0]
    # -- Feature-based parameter examples -------------------------------------
    # jarvis_max_stomatal_conductance:
    #   source: nn_feature_based
    #   bounds: [50.0, 300.0]
    # soil_brooks_corey_lambda:
    #   source: nn_feature_based
    #   bounds: [0.1, 0.3]
  nn:
    attri_features:
      - attr_feature_1
      - attr_feature_2
      - attr_feature_3
      - attr_feature_4
      - attr_feature_5
      - attr_feature_6
    hidden_dims: [16, 16]
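# Illustrative sketch (not part of the shipped template): how the MLP input
# width would follow from the source rules above. It assumes that no inputs
# other than those named in the header comment are appended and that each
# network has a single output; neither is confirmed here.
#   site-wise parameter   [n_site]            6 inputs  (the six attri_features)
#   site×layer parameter  [n_site, n_layer]   6 + 3 = 9 inputs
#                                             (attri_features + sand, clay and
#                                             organic-matter fractions)
# With hidden_dims: [16, 16], a site-wise network would then have layer widths
# 6 -> 16 -> 16 -> 1 under these assumptions.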
# -----------------------------------------------------------------------------
# site_simulation
#   Site-scale forward simulation using fixed or pretrained parameterization.
#
#   domain.selection
#     default: all
#     all                  every site in the data files
#     [SITE_A, SITE_B]     explicit list of site IDs
#     path/to/sites.txt    plain-text file, one site ID per line
#
#   domain.drop_invalid_sites
#     default: true
#     if true, drop any site that contains NaN/Inf in drivers, attris, params,
#     or fcover after loading and preprocessing
#
#   time.start / time.end
#     default: null
#     Forward-simulation window (YYYY-MM-DD).
#
#   spinup.start / spinup.end / spinup.cycles
#     default: null / null / 0
#     The spin-up window is repeated `cycles` times to equilibrate soil states
#     before the main simulation begins.
#
#   output_dir
#     site-simulation-specific output directory
#
#   nn_weights_path
#     pretrained NN weights used for forward simulation
#     can be:
#       - a direct file path to nn_weights.pt
#       - a glob pattern such as path/**/nn_weights.pt
#     when a pattern is given, `scripts/site_simulation.py` runs one forward
#     simulation per matched checkpoint and writes each run into
#     output_dir/<matched_parent_dir>/
#     keep the learned NN parameterization structure consistent with the
#     training run that produced the checkpoint
#       changing fixed parameters is fine
#       changing learned parameter sources or NN architecture usually makes the
#       checkpoint incompatible and should raise a strict loading error
# -----------------------------------------------------------------------------
site_simulation:
  domain:
    selection:  # e.g., all | [SITE_A, SITE_B] | path/to/sites.txt
    drop_invalid_sites: true
  time:
    start:  # e.g., "2001-01-01"
    end:  # e.g., "2020-12-31"
  spinup:
    start:  # e.g., "2000-01-01"
    end:  # e.g., "2000-12-31"
    cycles: 0
  output_dir:  # e.g., path/to/site_sim_outputs
  nn_weights_path:  # e.g., path/to/nn_weights.pt | path/**/nn_weights.pt
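# Example (a sketch with hypothetical site IDs, dates, and paths; not a tested
# configuration): evaluate every checkpoint from an earlier learning run on two
# sites. The one-year spin-up window repeated 3 times gives 3 spin-up years
# before the main simulation. It is kept commented out because ADELM rejects
# duplicate YAML keys.
#
# site_simulation:
#   domain:
#     selection: [SITE_A, SITE_B]
#     drop_invalid_sites: true
#   time:
#     start: "2001-01-01"
#     end: "2020-12-31"
#   spinup:
#     start: "2000-01-01"
#     end: "2000-12-31"
#     cycles: 3
#   output_dir: path/to/site_sim_outputs
#   nn_weights_path: path/**/nn_weights.pt  # one run per matched checkpoint,
#                                           # written to output_dir/<matched_parent_dir>/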
# -----------------------------------------------------------------------------
# site_learning
#   Site-scale calibration / parameter learning.
#
#   output_dir
#     calibration-specific output directory
#
#   targets_path
#     observation file used for calibration
#
#   targets
#     maps ADELM model outputs to observed variables in the target file
#     `mapping` accepts a variable name, [variable_name, scale], or
#     [variable_name, scale, offset]
#     layered ADELM outputs can add `layer: N` (1-based)
#     `sample_loss_weight` controls the relative weight of each target's
#     sample-level loss across targets
#     `site_loss_weight` adds an optional per-target site-mean loss term
#
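#   Worked example (a sketch; the unit interpretation and the scale-then-offset
#   order are assumptions, so check them against the target loader):
#     total_et_mmday in this template uses scale 0.035274. If
#     observed_latent_heat_flux is in W m-2, that value is consistent with a
#     W m-2 -> mm day-1 conversion, since
#     86400 s day-1 / 2.45e6 J kg-1 ~= 0.0353 kg m-2 day-1 per W m-2.
#     A three-element mapping with an explicit zero offset would look like:
#       soil_moisture:
#         mapping: [SWC_1, 0.01, 0.0]  # assumed to mean SWC_1 * 0.01 + 0.0
#         layer: 1                     # 1-based layer index
#         sample_loss_weight: 1
#         site_loss_weight: 1
#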
#   training
#     num_epochs                 maximum training epochs
#     lr                         initial learning rate
#     seed                       training random seed
#                                also controls the internal train/val split
#                                when enabled
#     train_chunk_size           chunk length for truncated backpropagation
#                                through time (TBPTT), in timesteps (days)
#     max_grad_norm              gradient clipping threshold
#     weight_decay               weight-decay strength (L2-style regularization)
#     debug                      write debug.txt on non-finite failures
#     val_within_train_enabled
#                                if true, reserve part of the current training
#                                targets for epoch-level monitoring only
#                                outer-CV held-out data remain the final
#                                evaluation set
#     val_fraction               fraction of current training targets reserved for
#                                monitor_loss / monitor_r2
#     early_stopping_enabled     enable early stopping based on val loss
#     early_stopping_patience    epochs to wait before stopping
#     early_stopping_min_delta   absolute val-loss improvement required to
#                                count as progress
#     reduce_lr_enabled          enable ReduceLROnPlateau based on val loss
#     reduce_lr_patience         epochs to wait before reducing learning rate
#     reduce_lr_factor           multiplicative learning-rate decay factor
#     reduce_lr_min_delta        absolute val-loss improvement required to
#                                count as progress
#     reduce_lr_min_lr           lower bound for the learning rate
#     min_sites_for_site_loss    minimum number of valid sites required before the
#                                optional site-level loss is applied
#     min_samples_per_site_for_site_loss
#                                minimum valid sample count required for one site
#                                to participate in the optional site-level loss
#
#   cross_validation
#     enabled             whether to split into folds; default: false
#     scheme              null | spatial | temporal
#     n_folds             number of folds; default: 5
#     cv_seed             fold-construction seed; default: 42
#     spatial_mode        random | predefined
#     spatial_fold_path   path/to/folds.txt (predefined only)
#     shuffle             randomly shuffle sites before splitting; default: true
#
#   initialization
#     training-specific NN controls
#     default init_nn_weights_path: null
#     default frozen_parameters   : []
#     init_nn_weights_path   initialise and continue training from a checkpoint
#                            can be:
#                              - a direct file path to nn_weights.pt
#                              - a glob pattern such as path/**/nn_weights.pt
#                            when a pattern is given, ADELM
#                            resolves the checkpoint using the current
#                            workflow / fold / seed context
#     frozen_parameters      freeze selected NN parameter groups after loading
#
#   time.start / time.end
#     default: null
#     Training window (YYYY-MM-DD). Gradients are only accumulated over this
#     period; the spin-up runs forward-only before it.
#
#   spinup.start / spinup.end / spinup.cycles
#     default: null / null / 0
#     The spin-up window is repeated `cycles` times to equilibrate soil states
#     before the main simulation begins.
# -----------------------------------------------------------------------------
site_learning:
  domain:
    selection:  # e.g., all | [SITE_A, SITE_B] | path/to/sites.txt
    drop_invalid_sites: true
  time:
    start:  # e.g., "2001-01-01"
    end:  # e.g., "2020-12-31"
  spinup:
    start:  # e.g., "2000-01-01"
    end:  # e.g., "2000-12-31"
    cycles: 3
  output_dir:  # e.g., path/to/outputs
  targets_path:  # e.g., path/to/data.nc
  targets:
    gpp_gCm2day:
      mapping: observed_gpp
      sample_loss_weight: 1
      site_loss_weight: 1
    total_et_mmday:
      mapping: [observed_latent_heat_flux, 0.035274]
      sample_loss_weight: 1
      site_loss_weight: 1
    # soil_moisture:
    #   mapping: [SWC_1, 0.01]
    #   layer: 1
    #   sample_loss_weight: 1
    #   site_loss_weight: 1
  training:
    num_epochs: 100
    lr: 0.001
    seed: 42
    train_chunk_size: 120
    max_grad_norm: 1.0
    weight_decay: 1.0e-4
    debug: false
    val_within_train_enabled: true
    val_fraction: 0.3
    early_stopping_enabled: true
    early_stopping_patience: 8
    early_stopping_min_delta: 0.0
    reduce_lr_enabled: true
    reduce_lr_patience: 3
    reduce_lr_factor: 0.5
    reduce_lr_min_delta: 0.0
    reduce_lr_min_lr: 1.0e-6
    min_sites_for_site_loss: 10
    min_samples_per_site_for_site_loss: 365
  cross_validation:
    enabled: false
    scheme:  # e.g., null | spatial | temporal
    n_folds: 5
    cv_seed: 42
    spatial_mode:  # e.g., random | predefined
    spatial_fold_path:  # e.g., path/to/folds.txt (predefined only)
    shuffle: true
  save_final_inference: true  # save final evaluation outputs as site_simulation.nc
  initialization:
    init_nn_weights_path:  # e.g., path/to/**/nn_weights.pt
    frozen_parameters:
      - jarvis_vpd_sensitivity
      - light_use_efficiency
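# Example (a sketch with hypothetical values; not a tested configuration):
# 5-fold random spatial cross-validation, warm-started from earlier checkpoints.
# Only the keys that differ from the template above are shown, and the block is
# kept commented out because ADELM rejects duplicate YAML keys. Freezing
# jarvis_vpd_sensitivity presumably only makes sense if it is declared as a
# learned parameter under `parameterization.parameters`.
#
# site_learning:
#   cross_validation:
#     enabled: true
#     scheme: spatial
#     n_folds: 5
#     cv_seed: 42
#     spatial_mode: random
#     shuffle: true
#   initialization:
#     init_nn_weights_path: path/to/**/nn_weights.pt  # resolved per workflow / fold / seed
#     frozen_parameters:
#       - jarvis_vpd_sensitivity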
# -----------------------------------------------------------------------------
# grid_simulation
#   Grid-scale forward simulation.
#
#   domain.selection
#     default: all
#     all                                    full grid
#     [lat_min, lon_min, lat_max, lon_max]   bounding box for grid workflows
#
#   grid_output_dir
#     required output directory for grid forward runs
#
#   checkpoints
#     restart checkpoints are written automatically to
#     grid_output_dir/checkpoints
#
#   output_daily_vars
#     []     = skip
#     all    = save all available outputs
#     [a,b]  = save named variables only
#
#   output_monthly_vars
#     []     = skip
#     all    = save all available outputs as monthly means
#     [a,b]  = save named variables as monthly means only
#
#   nn_weights_path
#     pretrained NN weights used for forward simulation
#
#   time
#     Main simulation period for grid workflows, specified by year/month.
#     year_* should be YYYY, month_* should be 1-12.
#
#   spinup
#     Optional spin-up period for grid workflows, specified by year/month.
#     spinup month fields also use 1-12.
# -----------------------------------------------------------------------------
grid_simulation:
  domain:
    selection:  # e.g., all | [lat_min, lon_min, lat_max, lon_max]
  time:
    year_start:  # e.g., 2001
    month_start:  # e.g., 1
    year_end:  # e.g., 2020
    month_end:  # e.g., 12
  spinup:
    year_start:  # e.g., 2000
    month_start:  # e.g., 1
    year_end:  # e.g., 2000
    month_end:  # e.g., 12
    cycles: 0
  grid_output_dir:  # e.g., path/to/grid_outputs
  output_daily_vars: []  # e.g., [] | all | [var_a, var_b]
  output_monthly_vars: []  # e.g., [] | all | [var_a, var_b]
  nn_weights_path:  # e.g., path/to/nn_weights.pt
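# Example (a sketch; the bounding box, paths, and output variable names are
# hypothetical, and the variable names are only assumed to be valid grid
# outputs): a regional run with one spin-up year repeated twice, no daily
# output, and monthly means for two variables. It is kept commented out because
# ADELM rejects duplicate YAML keys.
#
# grid_simulation:
#   domain:
#     selection: [35.0, -10.0, 60.0, 30.0]  # [lat_min, lon_min, lat_max, lon_max]
#   time:
#     year_start: 2001
#     month_start: 1
#     year_end: 2020
#     month_end: 12
#   spinup:
#     year_start: 2000
#     month_start: 1
#     year_end: 2000
#     month_end: 12
#     cycles: 2
#   grid_output_dir: path/to/grid_outputs  # checkpoints go to grid_output_dir/checkpoints
#   output_daily_vars: []
#   output_monthly_vars: [gpp_gCm2day, total_et_mmday]
#   nn_weights_path: path/to/nn_weights.pt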