Configuration¶

Every ADELM run is controlled by a single YAML configuration file. The project root contains a commented template config.yaml that you can copy and adapt. See Workflow for complete workflow examples and launch patterns.

This page follows the structure of the top-level YAML template and explains each block in the same order you will see it in config.yaml.

Use it as a companion to the template: start with the shared sections, then move to the workflow-specific block that matches the script you want to run.

Warning

ADELM rejects duplicate YAML keys during config loading. If the same block appears twice, for example two separate site_learning: mappings, ADELM raises an error.

Section	Description
`model`	Compute device, model structure (soil geometry), and optional scheme hooks
`data`	Shared input file paths and variable mapping
`parameterization`	Per-parameter source declarations and NN architecture
`site_simulation`	Site-scale forward simulation settings
`site_learning`	Site-scale learning, targets, training, cross-validation
`grid_simulation`	Grid-scale forward simulation settings

Model¶

The model block defines the compute device, the shared land-model structure, and optional scheme hooks.

`model.device_id`¶

model:
  device_id: 0

Field	Description
`device_id`	GPU device index (0-based), or `cpu` to force CPU execution for all workflows

`model.structure`¶

model:
  structure:
    num_soil_layers: 6
    num_runoff_generation_layers: 3
    soil_layer_thicknesses: [0.10, 0.20, 0.30, 0.50, 0.80, 1.10]  # metres

Field	Description
`num_soil_layers`	Total number of soil layers
`num_runoff_generation_layers`	How many top layers participate in runoff generation
`soil_layer_thicknesses`	Thickness of each layer in metres; must have exactly `num_soil_layers` values
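
The length constraint in the table can be checked before launching a run. The sketch below is illustrative only: `check_structure` is a hypothetical helper, not an ADELM function, and the bound on `num_runoff_generation_layers` is an assumption inferred from the "top layers" wording rather than a documented rule.

```python
def check_structure(structure):
    """Validate a model.structure mapping and return the lower-boundary
    depth (in metres) of each soil layer, top to bottom."""
    n_layers = structure["num_soil_layers"]
    thicknesses = structure["soil_layer_thicknesses"]

    # Documented rule: exactly num_soil_layers thickness values.
    if len(thicknesses) != n_layers:
        raise ValueError(
            f"expected {n_layers} layer thicknesses, got {len(thicknesses)}"
        )

    # Assumption: runoff-generating layers are a subset of the soil column.
    n_runoff = structure["num_runoff_generation_layers"]
    if not 0 <= n_runoff <= n_layers:
        raise ValueError("num_runoff_generation_layers exceeds num_soil_layers")

    depths, total = [], 0.0
    for thickness in thicknesses:
        total += thickness
        depths.append(total)
    return depths
```

With the six thicknesses from the example above, the returned depths end at roughly 3.0 m.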

`model.schemes`¶

model:
  schemes:
    stomatal_conductance:
    river_routing:

Optional process-scheme selectors reserved for future or experimental implementations. Leave empty (null) for the default scheme. These are placeholders and do not need to be set for standard runs.

Data¶

The data block collects shared input paths, static resources, and name mappings. Several path fields, for example drivers_path and attris_path under data.site, can point to the same combined NetCDF file.

`data.site`¶

data:
  site:
    drivers_path: path/to/data.nc
    attris_path:  path/to/data.nc
    params_path:  path/to/data.nc    # optional
    fcover_path:  path/to/data.nc    # optional; required for PFT-based parameters
    print_driver_diagnostics: true
    warn_on_suspicious_drivers: true

All files must share the same ordered site coordinate.

Field	Description
`drivers_path`	Time-varying meteorological forcing `[site, time]`
`attris_path`	Static location attributes `[site]`
`params_path`	Fixed site-varying parameters `[site]` (optional)
`fcover_path`	Plant functional type cover fractions `[site, pft]` (optional)
`print_driver_diagnostics`	Print driver summary statistics after loading (default `true`)
`warn_on_suspicious_drivers`	Warn when driver values look physically implausible (default `true`)

`data.grid`¶

Used by the grid_simulation workflow.

data:
  grid:
    drivers_path:
      ta_degC: "/path/to/t2m/{year:04d}{month:02d}.nc"
      lai: ["/path/to/LAI_{year}_{month:02d}.nc", "LAI"]
    attris_path: path/to/attris.nc
    params_path: path/to/params.nc
    fcover_path: path/to/fcover.nc

Field	Description
`drivers_path`	Per-driver monthly file templates. Each ADELM driver can point to its own file pattern. Values may be a string path template or `[path_template, variable_name]`. Python-style placeholders such as `{year:04d}` and `{month:02d}` are preserved and used as-is.
`attris_path`	Optional grid-scale attribute file
`params_path`	Optional grid-scale fixed-parameter file
`fcover_path`	Optional grid-scale PFT cover file

Important

For grid drivers, ADELM still takes the default NetCDF variable name from data.mapping.drivers. A drivers_path string only changes the file path.

If you use the two-element form [path_template, variable_name], that explicit variable name overrides the default for grid loading only. scale and offset still come from data.mapping.drivers.

Site and grid workflows share the same default driver-name mapping from data.mapping.drivers. Use a string drivers_path entry when only the file location changes. Use [path_template, variable_name] when the grid file stores that driver under a different variable name.
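
The override rule above can be sketched as a small lookup. This is a hedged illustration, not ADELM's loader: `resolve_grid_driver` is a hypothetical name, and filling the placeholders with `str.format` is an assumption based on the "Python-style placeholders" wording.

```python
def resolve_grid_driver(name, drivers_path, drivers_mapping, year, month):
    """Return (file_path, nc_variable_name) for one grid driver and month."""
    # Default NetCDF variable name: first element of the shared mapping entry.
    mapping_entry = drivers_mapping[name]
    nc_name = mapping_entry if isinstance(mapping_entry, str) else mapping_entry[0]

    path_entry = drivers_path[name]
    if isinstance(path_entry, str):
        template = path_entry            # path only: keep the default name
    else:
        template, nc_name = path_entry   # explicit override for grid loading

    # Assumption: placeholders such as {year:04d} are filled via str.format.
    return template.format(year=year, month=month), nc_name


drivers_path = {
    "ta_degC": "/path/to/t2m/{year:04d}{month:02d}.nc",
    "lai": ["/path/to/LAI_{year}_{month:02d}.nc", "LAI"],
}
drivers_mapping = {
    "ta_degC": ["air_temperature_kelvin", 1.0, -273.15],
    "lai": "lai",
}
print(resolve_grid_driver("ta_degC", drivers_path, drivers_mapping, 2001, 3))
print(resolve_grid_driver("lai", drivers_path, drivers_mapping, 2001, 3))
```

In this sketch the scale and offset are deliberately not returned, since they always come from data.mapping.drivers regardless of which path form is used.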

`data.resources`¶

data:
  resources:
    pft_lut_path: path/to/PFT_LOOKUP.txt

Field	Description
`pft_lut_path`	Optional custom lookup table overriding the built-in PFT LUT

`data.mapping`¶

Maps ADELM’s internal variable names to the names used in your NetCDF files, with an optional linear transformation.

Transformation format:

adelm_name: nc_name                   # no transform
adelm_name: [nc_name, scale]          # value × scale
adelm_name: [nc_name, scale, offset]  # value × scale + offset

The data.mapping block may contain:

- drivers
- params
- attris
- fcover

For params, ADELM also supports per-layer configuration through params.layers. Layer indices are 1-based, and every layer from 1 to num_soil_layers must appear exactly once.

Site-level parameters in data.mapping.params can include fields such as latitude_deg, longitude_deg, and mean_air_temperature.

Hint

data.mapping is the primary place to make a config dataset-specific; adapting ADELM to a new file layout is mostly a mapping exercise.

For drivers, the same mapping is shared by both site and grid workflows unless grid loading explicitly overrides the NetCDF variable name in data.grid.drivers_path.

This means a definition such as ta_degC: [air_temperature_kelvin, 1.0, -273.15] can remain the shared default mapping for both site and grid. Grid loading only needs an explicit variable-name override when the gridded files expose a different variable name such as t2m.
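
To make the three mapping forms concrete, here is a minimal sketch of how an entry could be parsed and applied. `parse_mapping` and `apply_mapping` are hypothetical helpers written for this page, not ADELM functions; the defaults of `1.0` for scale and `0.0` for offset follow from the "no transform" form.

```python
def parse_mapping(entry):
    """Return (nc_name, scale, offset) for any of the three accepted forms."""
    if isinstance(entry, str):
        return entry, 1.0, 0.0                      # adelm_name: nc_name
    nc_name, *rest = entry
    scale = float(rest[0]) if len(rest) >= 1 else 1.0
    offset = float(rest[1]) if len(rest) >= 2 else 0.0
    return nc_name, scale, offset


def apply_mapping(entry, values):
    """Apply value * scale + offset to every raw NetCDF value."""
    _, scale, offset = parse_mapping(entry)
    return [value * scale + offset for value in values]


# Kelvin to degrees Celsius, as in the ta_degC example above.
celsius = apply_mapping(["air_temperature_kelvin", 1.0, -273.15], [273.15, 293.15])
```

Here `celsius` holds values close to 0 and 20 degrees Celsius (up to floating-point rounding).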

Parameterization¶

The parameterization block answers two questions:

- how each physical parameter is sourced
- what shared NN architecture should be used for feature-based parameterization

`parameterization.parameters`¶

Each entry declares how a parameter value is obtained at runtime. Four sources are available:

- fixed: a fixed value applied to all locations. Requires value. Two forms are accepted:
  - scalar: a single number broadcast to every site and every layer.
  - list: a list of exactly num_soil_layers values (one per layer, top to bottom), broadcast across all sites. Useful for overriding pedotransfer-derived parameters (e.g. soil_saturated_moisture) with a known depth profile.
- pft_based: a value computed as a PFT-fraction-weighted average from the fcover data and PFT_LOOKUP.txt. Requires fcover_path to be set.
- nn_global: a single global scalar optimised during training. Requires bounds: [lower, upper].
- nn_feature_based: a trainable parameter predicted by the MLP. Requires bounds: [lower, upper]. Runtime behaviour depends on the target parameter shape:
  - site-level parameters ([n_entities]) use parameterization.nn.attri_features as inputs
  - layer-wise parameters ([n_entities, n_layers]) use parameterization.nn.attri_features plus soil_sand_fraction, soil_clay_fraction, and soil_organic_matter_fraction
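
The two non-trainable sources can be illustrated with plain Python lists. This is a sketch under stated assumptions: `broadcast_fixed` and `pft_weighted_average` are hypothetical names, the layer-wise output shape is chosen for illustration, and normalising by the sum of the cover fractions is an assumption about how the weighted average is formed.

```python
def broadcast_fixed(value, n_sites, n_layers):
    """Expand a `fixed` value into a [n_sites][n_layers] nested list."""
    if isinstance(value, (int, float)):
        profile = [float(value)] * n_layers        # scalar form
    else:
        if len(value) != n_layers:                 # list form: one per layer
            raise ValueError(
                f"expected {n_layers} values, got {len(value)}"
            )
        profile = [float(v) for v in value]
    return [list(profile) for _ in range(n_sites)]


def pft_weighted_average(fractions, lut_values):
    """PFT-fraction-weighted average of per-PFT lookup-table values."""
    total = sum(fractions)
    if total <= 0:
        raise ValueError("cover fractions must sum to a positive number")
    weighted = sum(f * v for f, v in zip(fractions, lut_values))
    return weighted / total                        # assumption: normalised


# One site covered 25 % by a PFT with value 10 and 75 % by one with value 20.
site_value = pft_weighted_average([0.25, 0.75], [10.0, 20.0])  # 17.5
```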

Static parameters loaded from data are configured under data.mapping.params, not as a source value inside parameterization.parameters.

If a parameter is supplied through data.mapping.params, it must not also appear under parameterization.parameters.

Important

Parameter sources are exclusive. A parameter must come from exactly one place: the registry default, an explicit parameterization.parameters entry, or data.mapping.params.

parameterization:
  parameters:
    # Scalar fixed: one value broadcast to all sites and layers.
    surface_emissivity:
      source: fixed
      value: 0.96

    jarvis_temperature_optimum_bias:
      source: fixed
      value: 10.0

    # Per-layer fixed: list of length num_soil_layers (top to bottom),
    # broadcast across all sites. Overrides the pedotransfer-derived value.
    soil_saturated_moisture:
      source: fixed
      value: [0.46, 0.44, 0.42, 0.40, 0.38, 0.37]   # m3 m-3

    jarvis_radiation_half_saturation:
      source: nn_global
      bounds: [50.0, 800.0]

    photosynthesis_capacity_coefficient:
      source: nn_feature_based
      bounds: [5.0, 35.0]

    soil_brooks_corey_b:
      source: nn_feature_based
      bounds: [0.1, 0.3]

Note

Layer-wise fixed parameters override the corresponding pedotransfer-derived values. The full set of overridable derived parameters is: soil_saturated_moisture, soil_field_capacity, soil_wilting_point, soil_saturated_hydraulic_conductivity, soil_brooks_corey_a, soil_brooks_corey_b, and soil_brooks_corey_bubbling_head. If the list length does not match num_soil_layers, ADELM raises an error at startup.

Parameters not listed here use their registered default values. Parameters with source: pft_based in the registry (e.g. canopy_height) are automatically resolved from the built-in lookup table when fcover_path is provided — no explicit entry is needed unless you want to override the source.

Each parameter uses one active source. If a parameter is not declared in parameterization.parameters, ADELM uses its registry default source. If a parameter is provided through data.mapping.params, it must not also be declared in parameterization.parameters.

`parameterization.nn`¶

Required when at least one parameter has source: nn_feature_based. This applies to both site-level and layer-wise nn_feature_based parameters.

parameterization:
  nn:
    attri_features:
      - aridity_index
      - elevation
    hidden_dims: [64, 64]
    dropout_rate: 0.2

Field	Description
`attri_features`	List of attribute names (matching keys in `data.mapping.attris`) used as MLP inputs. For layer-wise `nn_feature_based` parameters, ADELM automatically augments these with the soil texture inputs `soil_sand_fraction`, `soil_clay_fraction`, and `soil_organic_matter_fraction`.
`hidden_dims`	Hidden layer widths, e.g. `[32, 32]` for two layers of 32 units each
`dropout_rate`	Dropout probability applied inside the MLP (default `0.2`; must be in `[0, 1)`)
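
The difference between site-level and layer-wise inputs can be sketched as follows. `mlp_input_names` is a hypothetical helper, and the ordering of the appended soil features is an assumption; only the set of names comes from this page.

```python
SOIL_TEXTURE_FEATURES = (
    "soil_sand_fraction",
    "soil_clay_fraction",
    "soil_organic_matter_fraction",
)


def mlp_input_names(attri_features, layer_wise):
    """List the MLP input names for one nn_feature_based parameter."""
    names = list(attri_features)
    if layer_wise:
        # Layer-wise parameters also see the per-layer soil inputs.
        names.extend(SOIL_TEXTURE_FEATURES)
    return names


site_inputs = mlp_input_names(["aridity_index", "elevation"], layer_wise=False)
layer_inputs = mlp_input_names(["aridity_index", "elevation"], layer_wise=True)
print(len(site_inputs), len(layer_inputs))  # 2 5
```

A practical consequence is that the MLP input width depends on the parameter shape, so changing attri_features alters the feature layout for both kinds of parameter.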

Site Learning¶

The site_learning block contains site selection, time window, spin-up, outputs, training targets, optimiser settings, cross-validation, and training-specific NN initialisation options.

`site_learning.domain`¶

site_learning:
  domain:
    selection: all
    drop_invalid_sites: true

Field	Description
`selection`	Selected sites: `all`, explicit site list, or a text file path
`drop_invalid_sites`	Whether to drop sites with invalid values in required site inputs

`site_learning.time`¶

site_learning:
  time:
    start: 2001-01-01
    end: 2005-12-31

Field	Description
`start`	Start of the main learning period (`YYYY-MM-DD`)
`end`	End of the main learning period (`YYYY-MM-DD`)

`site_learning.spinup`¶

site_learning:
  spinup:
    start: 2000-01-01
    end: 2000-12-31
    cycles: 3

Field	Description
`start`	Start of the spin-up window (`YYYY-MM-DD`)
`end`	End of the spin-up window (`YYYY-MM-DD`)
`cycles`	Number of spin-up repetitions

`site_learning.output_dir`¶

site_learning:
  output_dir: path/to/site_learning_outputs

Field	Description
`output_dir`	Output directory for this site-learning run

`site_learning.targets`¶

site_learning:
  targets_path: path/to/data.nc
  targets:
    gpp_gCm2day:
      mapping: observed_gpp
      sample_loss_weight: 1
      site_loss_weight: 1
    total_et_mmday:
      mapping: [observed_latent_heat_flux, 0.035274]
      sample_loss_weight: 1
      site_loss_weight: 1
    soil_moisture:
      mapping: [observed_soil_moisture_layer_1, 0.01]
      layer: 1
      sample_loss_weight: 1
      site_loss_weight: 1

Field	Description
`targets_path`	Observed learning targets. Target tensors may be `[site, time]` or `[site, time, layer]`.
`targets.<name>.mapping`	Observed variable name, `[variable_name, scale]`, or `[variable_name, scale, offset]`.
`targets.<name>.layer`	Optional 1-based layer selector for layered ADELM outputs such as `soil_moisture`.
`targets.<name>.sample_loss_weight`	Unnormalised weight for this target’s sample-level loss in the total loss.
`targets.<name>.site_loss_weight`	Weight applied to this target’s optional site-level loss term before combining it with the sample-level loss.

Important

site_learning.targets_path and site_learning.targets must be provided together.

`site_learning.training`¶

site_learning:
  training:
    num_epochs: 100
    lr: 0.001
    seed: 42
    train_chunk_size: 120
    max_grad_norm: 1.0
    weight_decay: 1.0e-4
    debug: false
    val_within_train_enabled: true
    val_fraction: 0.3
    early_stopping_enabled: true
    early_stopping_patience: 8
    early_stopping_min_delta: 0.0
    reduce_lr_enabled: true
    reduce_lr_patience: 3
    reduce_lr_factor: 0.5
    reduce_lr_min_delta: 0.0
    reduce_lr_min_lr: 1.0e-6
    min_sites_for_site_loss: 10
    min_samples_per_site_for_site_loss: 365

Field	Description
`num_epochs`	Maximum number of optimisation epochs
`lr`	Optimiser learning rate
`seed`	Training seed used for NN initialisation and any in-training random splitting
`train_chunk_size`	Number of timesteps per training chunk
`max_grad_norm`	Gradient clipping threshold applied during training
`weight_decay`	Weight decay strength controlling L2-style regularization
`debug`	When `true`, write `debug.txt` for non-finite failures
`val_within_train_enabled`	When `true`, ADELM reserves part of the training targets for validation during training
`val_fraction`	Fraction of the current training targets reserved for validation when `val_within_train_enabled: true`
`early_stopping_enabled`	Enable early stopping based on the epoch monitor loss
`early_stopping_patience`	Patience in epochs for early stopping
`early_stopping_min_delta`	Minimum improvement required to reset early-stopping patience
`reduce_lr_enabled`	Enable learning-rate reduction on plateau
`reduce_lr_patience`	Patience in epochs before reducing learning rate
`reduce_lr_factor`	Multiplicative factor applied when reducing learning rate
`reduce_lr_min_delta`	Minimum absolute monitor-loss improvement required to count as progress for the learning-rate scheduler
`reduce_lr_min_lr`	Lower bound for the learning rate scheduler
`min_sites_for_site_loss`	Minimum number of valid sites required before the optional site-level loss is applied
`min_samples_per_site_for_site_loss`	Minimum number of valid samples a site must contribute before it participates in the site-level loss

Important

When val_within_train_enabled: true, ADELM performs a fixed random split inside the current training targets using site_learning.training.seed:

- the remaining training targets are used for optimisation
- the reserved validation subset is used only for epoch-level monitoring, learning-rate scheduling, and early stopping

This validation split comes from the training set. The outer cross-validation held-out fold remains reserved for final evaluation.
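
A seeded, reproducible split of this kind can be sketched with the standard library. This is not ADELM's implementation: `split_train_val` is a hypothetical helper, and splitting over flat sample indices is an assumption, since this page does not say which unit (samples, sites, or time steps) is split.

```python
import random


def split_train_val(n_samples, val_fraction, seed):
    """Return (train_indices, val_indices) from a fixed seeded shuffle."""
    rng = random.Random(seed)          # local generator: same seed, same split
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = int(round(n_samples * val_fraction))
    val = sorted(indices[:n_val])      # monitoring only, never optimised on
    train = sorted(indices[n_val:])    # used for optimisation
    return train, val


train_idx, val_idx = split_train_val(n_samples=10, val_fraction=0.3, seed=42)
```

Because the generator is seeded, calling the function again with the same arguments returns the same two index lists, which is what makes the split "fixed".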

`site_learning.cross_validation`¶

site_learning:
  cross_validation:
    enabled: false
    scheme: null           # null | spatial | temporal
    n_folds: 5
    cv_seed: 42
    spatial_mode: random   # random | predefined
    spatial_fold_path: null
    shuffle: true

Field	Description
`enabled`	Whether to run cross-validation
`scheme`	`spatial` for site-based folds; `temporal` for contiguous time blocks
`n_folds`	Number of folds
`cv_seed`	Random seed used to construct CV folds. This is separate from `site_learning.training.seed`.
`spatial_mode`	`random` to split locations automatically; `predefined` to read folds from a file
`spatial_fold_path`	Path to a plain-text file for predefined folds (one fold per line, format: `fold_1: DE-Hai, FI-Hyy`)
`shuffle`	Randomly shuffle locations before splitting (spatial CV only)

Temporal CV divides the post-spin-up period (site_learning.time.start to site_learning.time.end) into n_folds contiguous blocks.

Each fold is run as a separate job. Select the fold to run with --fold N on the command line.
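
For temporal CV, the division into contiguous blocks might look like the sketch below. `temporal_folds` is a hypothetical helper; splitting by whole days and giving any leftover days to the earliest folds are assumptions, because the exact block boundaries are not specified on this page.

```python
from datetime import date, timedelta


def temporal_folds(start, end, n_folds):
    """Split [start, end] (inclusive) into n_folds contiguous date blocks."""
    n_days = (end - start).days + 1
    if not 1 <= n_folds <= n_days:
        raise ValueError("n_folds must be between 1 and the number of days")
    base, extra = divmod(n_days, n_folds)

    folds, cursor = [], start
    for k in range(n_folds):
        length = base + (1 if k < extra else 0)   # assumption: early folds
        fold_end = cursor + timedelta(days=length - 1)
        folds.append((cursor, fold_end))
        cursor = fold_end + timedelta(days=1)
    return folds


folds = temporal_folds(date(2001, 1, 1), date(2005, 12, 31), n_folds=5)
```

Each tuple is the first and last day of one fold; with `--fold N`, one of these blocks would be held out while the others are used for training.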

`site_learning.save_final_inference`¶

site_learning:
  save_final_inference: true

Field	Description
`save_final_inference`	If `true`, reuse the final evaluation outputs and save a `site_simulation.nc` file at the end of site learning.

Note

This uses the final evaluation outputs already produced at the end of training. ADELM does not launch a second site-simulation pass just to write the NetCDF. The written file follows the same site_simulation.nc format used by the site_simulation workflow.

`site_learning.initialization`¶

site_learning:
  initialization:
    init_nn_weights_path: path/to/nn_weights.pt
    frozen_parameters:
      - jarvis_vpd_sensitivity

Field	Description
`initialization.init_nn_weights_path`	Optional direct checkpoint file or glob pattern used to preload NN weights before training
`initialization.frozen_parameters`	Optional list of active NN parameter names to freeze after loading

site_learning.initialization holds training-specific NN preload and freeze controls, separate from the shared parameterization.nn block. init_nn_weights_path and frozen_parameters are needed only when continuing from a previous run.

init_nn_weights_path supports two forms:

- a direct file path, for example path/to/no_cv_seed_100/nn_weights.pt
- a glob pattern, for example path/to/learning_outputs/**/nn_weights.pt

When a pattern is used, ADELM selects the checkpoint that matches the current workflow, fold, and seed:

- no_cv_seed_{seed}
- spatial_cv_fold{fold}_seed_{seed}
- temporal_cv_fold{fold}_seed_{seed}
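
The selection step can be sketched as a filter over the glob matches. `run_dir_name` and `select_checkpoint` are hypothetical helpers, and matching on the checkpoint's parent directory name is an assumption drawn from the naming patterns listed above.

```python
from pathlib import PurePosixPath


def run_dir_name(scheme, seed, fold=None):
    """Build the run directory name for a workflow, fold, and seed."""
    if scheme is None:
        return f"no_cv_seed_{seed}"
    if scheme in ("spatial", "temporal"):
        return f"{scheme}_cv_fold{fold}_seed_{seed}"
    raise ValueError(f"unknown CV scheme: {scheme!r}")


def select_checkpoint(matches, scheme, seed, fold=None):
    """Pick the single glob match whose parent directory fits the run."""
    wanted = run_dir_name(scheme, seed, fold)
    hits = [m for m in matches if PurePosixPath(m).parent.name == wanted]
    if len(hits) != 1:
        raise LookupError(f"expected one checkpoint for {wanted}, found {len(hits)}")
    return hits[0]


matches = [
    "path/to/learning_outputs/no_cv_seed_100/nn_weights.pt",
    "path/to/learning_outputs/spatial_cv_fold1_seed_100/nn_weights.pt",
    "path/to/learning_outputs/spatial_cv_fold2_seed_100/nn_weights.pt",
]
chosen = select_checkpoint(matches, scheme="spatial", seed=100, fold=2)
```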

Hint

A glob pattern pointing at a family of learning outputs supports staged experiments without modifying config paths between runs.

When preloading and freezing:

- If init_nn_weights_path is not set, all active NN parameters are randomly initialised.
- If init_nn_weights_path is set, ADELM tries to load matching checkpoint entries.
- Parameters found in the checkpoint are loaded when their tensor shapes are compatible with the current runtime model.
- Parameters not found in the checkpoint are left randomly initialised.
- Parameters found in the checkpoint but with incompatible shapes are skipped and left randomly initialised.
- If frozen_parameters is set, every frozen parameter must:
  - be active in the current config
  - be present in the checkpoint
  - have a compatible tensor shape in the checkpoint
- If any frozen parameter fails those checks, ADELM raises an error and stops.
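
The rules above amount to a small decision procedure, sketched here over parameter shapes only. `plan_preload` is a hypothetical helper, and treating "compatible" as "identical shape" is an assumption; the real check may be more permissive.

```python
def plan_preload(model_shapes, checkpoint_shapes, frozen=()):
    """Classify active NN parameters given {name: shape} dictionaries."""
    loaded, skipped, random_init = [], [], []
    for name, shape in model_shapes.items():
        if name not in checkpoint_shapes:
            random_init.append(name)            # absent from the checkpoint
        elif checkpoint_shapes[name] == shape:
            loaded.append(name)                 # present and compatible
        else:
            skipped.append(name)                # present but incompatible
            random_init.append(name)

    # A frozen parameter must be active, present, and compatible,
    # which in this sketch is the same as having been loaded.
    bad = [name for name in frozen if name not in loaded]
    if bad:
        raise ValueError(f"cannot freeze parameters: {bad}")

    return {
        "loaded": loaded,
        "frozen": list(frozen),
        "skipped": skipped,
        "random_init": random_init,
    }
```

The four keys of the returned dictionary correspond to the four categories ADELM reports at runtime.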

At runtime, ADELM reports:

- which parameters were loaded from the checkpoint
- which parameters were frozen
- which parameters were skipped because of incompatible shapes
- which active parameters remain randomly initialised

This information is printed to screen and also saved in the runtime summary in the active workflow output directory, typically site_learning.output_dir.

Important

If a frozen parameter is missing from the checkpoint, or the checkpoint no longer matches the current model, ADELM stops and reports the mismatch instead of continuing with a partial freeze.

Example:

site_learning:
  initialization:
    init_nn_weights_path: path/to/nn_weights.pt
    frozen_parameters:
      - jarvis_max_stomatal_conductance
      - jarvis_radiation_half_saturation
      - jarvis_vpd_sensitivity
      - jarvis_water_potential_midpoint
      - jarvis_water_potential_steepness

Site Simulation¶

The site_simulation block contains site selection, time window, spin-up, the site-simulation output directory, and optional pretrained NN weights for forward runs.

`site_simulation.domain`¶

site_simulation:
  domain:
    selection: all
    drop_invalid_sites: true

Field	Description
`selection`	Selected sites: `all`, explicit site list, or a text file path
`drop_invalid_sites`	Whether to drop sites with invalid values in required site inputs

`site_simulation.time`¶

site_simulation:
  time:
    start: 2001-01-01
    end: 2005-12-31

Field	Description
`start`	Start of the main forward-simulation period (`YYYY-MM-DD`)
`end`	End of the main forward-simulation period (`YYYY-MM-DD`)

`site_simulation.spinup`¶

site_simulation:
  spinup:
    start: 2000-01-01
    end: 2000-12-31
    cycles: 3

Field	Description
`start`	Start of the spin-up window (`YYYY-MM-DD`)
`end`	End of the spin-up window (`YYYY-MM-DD`)
`cycles`	Number of spin-up repetitions

`site_simulation.output_dir`¶

site_simulation:
  output_dir: path/to/site_sim_outputs

Field	Description
`output_dir`	Output directory for this site-simulation run

Note

Site simulation writes both a human-readable runtime_summary.txt and a NetCDF export, typically site_simulation.nc, containing time-varying outputs plus static parameters and final states.

`site_simulation.nn_weights_path`¶

site_simulation:
  nn_weights_path: path/to/nn_weights.pt

Field	Description
`nn_weights_path`	Optional direct checkpoint file or glob pattern for pretrained NN weights used in forward simulation

site_simulation.nn_weights_path supports two forms:

- a direct file path, for example path/to/no_cv_seed_100/nn_weights.pt
- a glob pattern, for example path/to/learning_outputs/**/nn_weights.pt

When a glob pattern matches multiple checkpoints, scripts/site_simulation.py runs one simulation per checkpoint and writes each run into a subdirectory under site_simulation.output_dir, named after the matched checkpoint’s parent directory.

Hint

A single site-simulation config with a glob pattern fans out over multiple spatial_cv_fold*_seed* checkpoints, keeping outputs in separate subdirectories.

Example:

- checkpoint match: path/to/learning_outputs/spatial_cv_fold1_seed_100/nn_weights.pt
- site-simulation output root: path/to/site_simulation_outputs
- resulting run directory: path/to/site_simulation_outputs/spatial_cv_fold1_seed_100/
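
The fan-out naming can be sketched with `pathlib`. `run_directories` is a hypothetical helper that covers only the multi-match case described above; how a single direct path is laid out is not addressed here.

```python
from pathlib import PurePosixPath


def run_directories(checkpoints, output_dir):
    """Map each matched checkpoint to its own run directory."""
    root = PurePosixPath(output_dir)
    return {
        # Subdirectory is named after the checkpoint's parent directory.
        ckpt: str(root / PurePosixPath(ckpt).parent.name)
        for ckpt in checkpoints
    }


runs = run_directories(
    [
        "path/to/learning_outputs/spatial_cv_fold1_seed_100/nn_weights.pt",
        "path/to/learning_outputs/spatial_cv_fold2_seed_100/nn_weights.pt",
    ],
    "path/to/site_simulation_outputs",
)
```

Since run directories are keyed by the parent directory name alone, two checkpoints whose parent directories share a name would map to the same output location in this sketch.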

These weights are intended for production-style forward simulation, so the learned NN parameterization structure should stay consistent with the training run that produced them.

- changing fixed parameters is fine
- changing a learned parameter from nn_global to nn_feature_based, or the reverse, will usually make the checkpoint incompatible
- changing parameterization.nn.hidden_dims or the feature layout used by nn_feature_based usually makes the checkpoint incompatible

If the learned parameterization no longer matches the checkpoint structure, the weights are incompatible and ADELM raises an error.

Warning

Use learned weights with the same learned parameterization structure that produced them. Changing fixed parameters is usually fine. Changing learned parameter sources, hidden dimensions, or feature layout usually is not.

Grid Simulation¶

Forward-simulation settings for grid workflows.

`grid_simulation.domain`¶

grid_simulation:
  domain:
    selection: all

Field	Description
`selection`	Grid domain selection. Supported values are `all` or a bounding box `[lat_min, lon_min, lat_max, lon_max]`.

`grid_simulation.time`¶

grid_simulation:
  time:
    year_start: 2001
    month_start: 1
    year_end: 2020
    month_end: 12

Field	Description
`year_start` / `month_start`	Start of the simulation period for grid workflows
`year_end` / `month_end`	End of the simulation period for grid workflows

`grid_simulation.spinup`¶

grid_simulation:
  spinup:
    year_start: 2000
    month_start: 1
    year_end: 2000
    month_end: 12
    cycles: 3

Field	Description
`year_start` / `month_start`	Start of the spin-up period for grid workflows
`year_end` / `month_end`	End of the spin-up period for grid workflows
`cycles`	Number of spin-up repetitions for grid workflows

Grid-simulation output fields¶

grid_simulation:
  grid_output_dir: path/to/grid_outputs
  output_daily_vars: all
  output_monthly_vars: [gpp_gCm2day, total_et_mmday]

Field	Description
`grid_output_dir`	Output directory for grid-scale forward runs
`output_daily_vars`	`[]` to skip, `all` to save all available time-varying outputs, or `[a, b]` to save named variables only
`output_monthly_vars`	`[]` to skip, `all` to save monthly means for all available time-varying outputs, or `[a, b]` to save named variables as monthly means
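
The three accepted forms of `output_daily_vars` and `output_monthly_vars` can be sketched as one resolver. `select_output_vars` is a hypothetical helper, and rejecting unknown names with an error is an assumption; this page does not say how ADELM treats a name that is not an available output.

```python
def select_output_vars(setting, available):
    """Resolve `all`, an empty list, or a list of names to output variables."""
    if setting == "all":
        return list(available)          # every available time-varying output
    if not setting:
        return []                       # [] means skip this export
    unknown = [name for name in setting if name not in available]
    if unknown:                         # assumption: unknown names are errors
        raise ValueError(f"unknown output variables: {unknown}")
    return list(setting)


available = ["gpp_gCm2day", "total_et_mmday", "soil_moisture"]
daily = select_output_vars("all", available)
monthly = select_output_vars(["gpp_gCm2day", "total_et_mmday"], available)
```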

Grid restart checkpoints are written automatically under:

grid_output_dir/checkpoints

Use the command-line --checkpoint-dir override only when you need a different restart location for a particular run.

Note

Grid restart checkpoints are the warm-start files used by scripts/grid_simulation.py --resume-from .... They store model states and restart metadata so the next run can continue from the month after the saved checkpoint. Daily and monthly NetCDF exports are for selected time-varying variables only; restart states are not duplicated there.

If spin-up is enabled, ADELM also writes spin-up checkpoints in the same directory at the end of each spin-up month. Their names reflect the saved month and how many cycles remain after that save. For a three-cycle spin-up over the year 2000, the files include, for example:

spinup_200001_minus2.npz
spinup_200002_minus2.npz
spinup_200012_minus2.npz
spinup_200012_minus1.npz
spinup_200012.npz
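
The naming pattern in this listing can be reproduced with a short generator. This is a sketch inferred from the example names only: `spinup_checkpoint_names` is a hypothetical helper, and the rule that the final cycle carries no `_minusN` suffix is an assumption based on `spinup_200012.npz`.

```python
def spinup_checkpoint_names(year_start, month_start, year_end, month_end, cycles):
    """List spin-up checkpoint file names in the order they would be written."""
    months = []
    year, month = year_start, month_start
    while (year, month) <= (year_end, month_end):
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    names = []
    for cycle in range(1, cycles + 1):
        remaining = cycles - cycle      # cycles still to run after this one
        suffix = f"_minus{remaining}" if remaining else ""
        names.extend(f"spinup_{y:04d}{m:02d}{suffix}.npz" for y, m in months)
    return names


names = spinup_checkpoint_names(2000, 1, 2000, 12, cycles=3)
print(names[0], names[11], names[23], names[-1])
```

For the spin-up settings shown earlier (January to December 2000, three cycles), this produces 36 names, starting with `spinup_200001_minus2.npz` and ending with `spinup_200012.npz`.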

`grid_simulation.nn_weights_path`¶

grid_simulation:
  nn_weights_path: path/to/nn_weights.pt

Field	Description
`nn_weights_path`	Optional direct checkpoint file or glob pattern for pretrained NN weights used in forward simulation

grid_simulation.nn_weights_path may be a direct file path or a glob pattern. When a pattern matches multiple learned checkpoints, ADELM runs one grid simulation per checkpoint and writes the results into subdirectories under grid_output_dir, each named after the matched checkpoint's parent directory.

Full template example¶

The example below is included directly from the project-root config.yaml template so it stays aligned with the codebase.

# =============================================================================
# ADELM Runtime Configuration
# =============================================================================
#
# This template reflects the proposed top-level redesign:
#   - shared blocks: model, data, parameterization
#   - workflow-specific blocks:
#       site_simulation
#       site_learning
#       grid_simulation
#
# Workflow selection is intended to be decided by the script / entry point,
# not by a top-level `mode` field in the config.
#
# Copy this file and edit the paths and settings for your experiment.
# See the documentation site for a full description of every field.
# See examples/ for complete experiment configurations.


# -----------------------------------------------------------------------------
# model
# Structural settings of the land model.
#
# num_soil_layers              total number of soil layers
# num_runoff_generation_layers number of (top) layers that participate in
#                              runoff generation
# soil_layer_thicknesses       thickness of each layer in metres; must have
#                              exactly num_soil_layers values
#                              NOTE: soil evaporation is removed from the first
#                              layer only. Recommended first-layer thickness:
#                              5-10 cm. Thinner layers deplete too quickly;
#                              thicker layers act as a bulk reservoir rather
#                              than an evaporative skin.
#
# schemes
#   Optional process-scheme selectors reserved for future or experimental
#   implementations.
#   At present these are placeholders and should normally be left empty.
#   The template only reserves the two scheme hooks most likely to be used
#   first; other scheme entries may still exist in code.
# -----------------------------------------------------------------------------

model:
  device_id: 0               # 0 | 1 | cpu
  structure:
    num_soil_layers: 6
    num_runoff_generation_layers: 3
    soil_layer_thicknesses: [0.10, 0.20, 0.30, 0.50, 0.80, 1.10] # metres
  schemes:
    stomatal_conductance:
    river_routing:


# -----------------------------------------------------------------------------
# data
# Shared input datasets, static resources, and variable mappings.
#
# site
#   Site-scale input datasets. In many workflows these can all point to the
#   same combined NetCDF file.
#
#   print_driver_diagnostics   default: true
#   warn_on_suspicious_drivers default: true
#
#   drivers  [site, time]  time-varying meteorological forcings
#   attris   [site]        static site attributes used by feature-based NN
#   params   [site]        static site parameters supplied directly by data
#   fcover   [site, pft]   PFT cover fractions for PFT-based parameters
#
#   params_path  optional
#   fcover_path  optional; required for PFT-based parameters
#
# grid
#   Grid-scale input datasets. Used mainly in grid workflows.
#
#   Preferred:
#     drivers_path   per-driver path templates. Each ADELM driver can point to
#                    its own monthly file pattern, for example:
#                    ta_degC: /path/to/drivers/ta_degC_{year:04d}{month:02d}.nc
#                    lai: [/path/to/drivers/lai_{year:04d}{month:02d}.nc, lai]
#                    String form means "path template only".
#                    Two-element form means "[path template, variable name]".
#                    Python-style placeholders such as {year:04d} and
#                    {month:02d} are preserved and used as-is.
#                    By default, the NetCDF variable name still comes from
#                    `data.mapping.drivers`. If a grid driver is written as
#                    [path_template, variable_name], that variable name
#                    overrides the default for grid loading only. Scale and
#                    offset still come from `data.mapping.drivers`.
#
#   attris_path      gridded static attributes
#   params_path      gridded static parameters
#   fcover_path      gridded PFT cover fractions
#
# resources
#   Additional static resources not stored in the main data files.
#   Currently only contains pft_lut_path, the optional custom PFT lookup table.
#
# mapping
#   Maps ADELM variable names to dataset variable names with optional linear
#   transformation:
#
#   adelm_name: nc_name                  # no transform
#   adelm_name: [nc_name, scale]         # nc_value * scale
#   adelm_name: [nc_name, scale, offset] # nc_value * scale + offset
#
# mapping.params
#   Static parameters provided directly by data, e.g. latitude/longitude or
#   layer-wise soil texture.
#
# IMPORTANT
#   If a parameter is supplied via `mapping.params`, it must NOT also appear
#   under `parameterization.parameters`. The two sources are treated as
#   mutually exclusive and overlapping names should raise a config error.
#
# mapping.params.layers
#   Layer-wise soil inputs. Layer indices are 1-based strings.
#   Each model layer 1..num_soil_layers must appear in exactly one group.
#
# mapping.attris
#   Free-form site attributes used by feature-based NN parameterization.
#   Keys must match the names listed in `parameterization.nn.attri_features`.
#
# mapping.fcover
#   PFT cover fractions used when parameters are sourced as `pft_based`.
# -----------------------------------------------------------------------------

data:
  site:
    drivers_path: path/to/data.nc
    attris_path:  path/to/data.nc
    params_path:  path/to/data.nc
    fcover_path:  path/to/data.nc    # optional; required for pft_based parameters
    print_driver_diagnostics: true
    warn_on_suspicious_drivers: true

  grid:
    drivers_path:
      ta_degC: "/path/to/drivers/ta_degC_{year:04d}{month:02d}.nc"
      pr_mmday: "/path/to/drivers/pr_mmday_{year:04d}{month:02d}.nc"
      swdown_Wm2: "/path/to/drivers/swdown_Wm2_{year:04d}{month:02d}.nc"
      lwdown_Wm2: "/path/to/drivers/lwdown_Wm2_{year:04d}{month:02d}.nc"
      lai: ["/path/to/drivers/lai_{year:04d}{month:02d}.nc", "lai"]
    attris_path:
    params_path:
    fcover_path:

  resources:
    pft_lut_path: path/to/PFT_LOOKUP.txt

  mapping:
    drivers:
      ta_degC:     [air_temperature, 1.0, -273.15]
      ta_min_degC: [air_temperature_min, 1.0, -273.15]
      ta_max_degC: [air_temperature_max, 1.0, -273.15]
      pr_mmday:    precipitation
      swdown_Wm2:  [shortwave_radiation, 1.0]
      lwdown_Wm2:  longwave_radiation
      wind_ms:     wind_speed
      vpd_kPa:     vapour_pressure_deficit
      lai:         lai
      co2_ppm:     atmospheric_co2

    params:
      latitude_deg: latitude_deg
      longitude_deg: longitude_deg
      layers:
        "1": {soil_sand_fraction: [soil_sand_fraction_layer_1, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_1, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_1, 0.01],
              soil_bulk_density: soil_bulk_density_layer_1}
        "2": {soil_sand_fraction: [soil_sand_fraction_layer_2, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_2, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_2, 0.01],
              soil_bulk_density: soil_bulk_density_layer_2}
        "3": {soil_sand_fraction: [soil_sand_fraction_layer_3, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_3, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_3, 0.01],
              soil_bulk_density: soil_bulk_density_layer_3}
        "4": {soil_sand_fraction: [soil_sand_fraction_layer_4, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_4, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_4, 0.01],
              soil_bulk_density: soil_bulk_density_layer_4}
        "5": {soil_sand_fraction: [soil_sand_fraction_layer_5, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_5, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_5, 0.01],
              soil_bulk_density: soil_bulk_density_layer_5}
        "6": {soil_sand_fraction: [soil_sand_fraction_layer_6, 0.01],
              soil_clay_fraction: [soil_clay_fraction_layer_6, 0.01],
              soil_organic_matter_fraction: [soil_organic_matter_layer_6, 0.01],
              soil_bulk_density: soil_bulk_density_layer_6}

    attris:
      attr_feature_1: attr_feature_1
      attr_feature_2: attr_feature_2
      attr_feature_3: attr_feature_3
      attr_feature_4: attr_feature_4
      attr_feature_5: attr_feature_5
      attr_feature_6: attr_feature_6

    fcover:
      BARE:      [bare_fraction, 0.01]
      BUILT:     [built_fraction, 0.01]
      GRASS-MAN: [managed_grass_fraction, 0.01]
      GRASS-NAT: [natural_grass_fraction, 0.01]
      SHRUBS-BD: [broadleaf_deciduous_shrub_fraction, 0.01]
      SHRUBS-BE: [broadleaf_evergreen_shrub_fraction, 0.01]
      SHRUBS-ND: [needleleaf_deciduous_shrub_fraction, 0.01]
      SHRUBS-NE: [needleleaf_evergreen_shrub_fraction, 0.01]
      SNOW-ICE:  [snow_ice_fraction, 0.01]
      TREES-BD:  [broadleaf_deciduous_tree_fraction, 0.01]
      TREES-BE:  [broadleaf_evergreen_tree_fraction, 0.01]
      TREES-ND:  [needleleaf_deciduous_tree_fraction, 0.01]
      TREES-NE:  [needleleaf_evergreen_tree_fraction, 0.01]
      WATER:     [water_fraction, 0.01]


# -----------------------------------------------------------------------------
# parameterization
# Shared parameter-source declarations.
#
# Supported sources
# -----------------
#   source: fixed
#     A fixed value shared across all sites. Requires `value`.
#     `value` may be:
#       - a scalar: broadcast to every site and every layer
#       - a list of length num_soil_layers: one value per layer, broadcast
#         across all sites. Useful for overriding pedotransfer-derived
#         parameters (e.g. soil_saturated_moisture) with known layer
#         profiles. The list length must match num_soil_layers exactly.
#
#   source: pft_based
#     Resolved from `mapping.fcover` and the built-in PFT lookup table.
#
#   source: nn_global
#     One globally shared trainable scalar. Requires `bounds`.
#
#   source: nn_feature_based
#     One trainable parameter predicted by an MLP. Requires `bounds`.
#     Internal behaviour depends on the target parameter shape:
#       - site-wise parameters [n_site] use `parameterization.nn.attri_features`
#       - site×layer parameters [n_site, n_layer] use
#         `parameterization.nn.attri_features` + soil_sand_fraction +
#         soil_clay_fraction + soil_organic_matter_fraction
#
# Parameters not listed here fall back to their registered ADELM defaults,
# unless they are supplied through `data.mapping.params`.
#
# nn
#   Shared NN architecture settings.
#   These are not training-specific.
#
#   attri_features  names of static attributes used by feature-based NNs
#   hidden_dims     hidden layer widths; the code default is [64, 64],
#                   while this template uses [16, 16]
#
# Training-specific NN initialization settings such as init_nn_weights_path and
# frozen_parameters are configured under `site_learning.initialization`.
# -----------------------------------------------------------------------------

parameterization:
  parameters:

    # -- Fixed scalar (broadcast to all sites and layers) ---------------------
    # surface_emissivity:
    #   source: fixed
    #   value: 0.96

    # -- Fixed per-layer (overrides pedotransfer-derived parameters) ----------
    # Provide exactly num_soil_layers values; the profile is shared across
    # all sites. Only valid for layer-wise derived parameters such as
    # soil_saturated_moisture, soil_field_capacity, soil_wilting_point,
    # soil_saturated_hydraulic_conductivity, soil_brooks_corey_a/b/bubbling_head.
    # Layer order matches soil_layer_thicknesses (top to bottom).
    # soil_saturated_moisture:
    #   source: fixed
    #   value: [0.46, 0.44, 0.42, 0.40, 0.38, 0.37]   # m3 m-3

    # -- PFT-based parameter example ------------------------------------------
    # canopy_height:
    #   source: pft_based

    # -- Globally learned scalar example --------------------------------------
    # jarvis_vpd_sensitivity:
    #   source: nn_global
    #   bounds: [0.05, 2.0]

    # -- Feature-based parameter examples -------------------------------------
    # jarvis_max_stomatal_conductance:
    #   source: nn_feature_based
    #   bounds: [50.0, 300.0]
    # soil_brooks_corey_lambda:
    #   source: nn_feature_based
    #   bounds: [0.1, 0.3]

  nn:
    attri_features:
      - attr_feature_1
      - attr_feature_2
      - attr_feature_3
      - attr_feature_4
      - attr_feature_5
      - attr_feature_6
    hidden_dims: [16, 16]


# -----------------------------------------------------------------------------
# site_simulation
# Site-scale forward simulation using fixed or pretrained parameterization.
#
# domain.selection
#   default: all
#   all                        every site in the data files
#   [SITE_A, SITE_B]           explicit list of site IDs
#   path/to/sites.txt          plain-text file, one site ID per line
#
# domain.drop_invalid_sites
#   default: true
#   if true, drop any site that contains NaN/Inf in drivers, attris, params,
#   or fcover after loading and preprocessing
#
# time.start / time.end
#   default: null
#   Forward-simulation window (YYYY-MM-DD).
#
# spinup.start / spinup.end / spinup.cycles
#   default: null / null / 0
#   The spin-up window is repeated `cycles` times to equilibrate soil states
#   before the main simulation begins.
#
# output_dir
#   site-simulation-specific output directory
#
# nn_weights_path
#   pretrained NN weights used for forward simulation
#   can be:
#     - a direct file path to nn_weights.pt
#     - a glob pattern such as path/**/nn_weights.pt
#   when a pattern is given, `scripts/site_simulation.py` runs one forward
#   simulation per matched checkpoint and writes each run into
#   output_dir/<matched_parent_dir>/
#   keep the learned NN parameterization structure consistent with the
#   training run that produced the checkpoint
#   changing fixed parameters is fine
#   changing learned parameter sources or NN architecture usually makes the
#   checkpoint incompatible and should raise a strict loading error
# -----------------------------------------------------------------------------

site_simulation:
  domain:
    selection:                 # e.g., all | [SITE_A, SITE_B] | path/to/sites.txt
    drop_invalid_sites: true

  time:
    start:                     # e.g., "2001-01-01"
    end:                       # e.g., "2020-12-31"

  spinup:
    start:                     # e.g., "2000-01-01"
    end:                       # e.g., "2000-12-31"
    cycles: 0

  output_dir:                 # e.g., path/to/site_sim_outputs

  nn_weights_path:            # e.g., path/to/nn_weights.pt | path/**/nn_weights.pt


# -----------------------------------------------------------------------------
# site_learning
# Site-scale calibration / parameter learning.
#
# output_dir
#   calibration-specific output directory
#
# targets_path
#   observation file used for calibration
#
# targets
#   maps ADELM model outputs to observed variables in the target file
#   `mapping` accepts a variable name or [variable_name, scale, offset]
#   layered ADELM outputs can add `layer: N` (1-based)
#   `sample_loss_weight` sets the relative weight of each target's
#   sample-level loss
#   `site_loss_weight` adds an optional per-target site-mean loss term
#
# training
#   num_epochs               maximum training epochs
#   lr                       initial learning rate
#   seed                     training random seed
#                            also controls the internal train/val split
#                            when enabled
#   train_chunk_size         TBPTT chunk length in timesteps (days)
#   max_grad_norm            gradient clipping threshold
#   weight_decay             weight decay strength controlling L2-style regularization
#   debug                    write debug.txt on non-finite failures
#   val_within_train_enabled
#                            if true, reserve part of the current training
#                            targets for epoch-level monitoring only
#                            outer CV held-out data remain final evaluation
#   val_fraction             fraction of current training targets reserved for
#                            monitor_loss / monitor_r2
#   early_stopping_enabled   enable early stopping based on val loss
#   early_stopping_patience  epochs to wait before stopping
#   early_stopping_min_delta absolute val-loss improvement required to
#                            count as progress
#   reduce_lr_enabled        enable ReduceLROnPlateau based on val loss
#   reduce_lr_patience       epochs to wait before reducing learning rate
#   reduce_lr_factor         multiplicative learning-rate decay factor
#   reduce_lr_min_delta      absolute val-loss improvement required to
#                            count as progress
#   reduce_lr_min_lr         lower bound for the learning rate
#   min_sites_for_site_loss  minimum number of valid sites required before the
#                            optional site-level loss is applied
#   min_samples_per_site_for_site_loss
#                            minimum valid sample count required for one site
#                            to participate in the optional site-level loss
#
# cross_validation
#   enabled                  whether to split into folds; default: false
#   scheme                   null | spatial | temporal
#   n_folds                  number of folds; default: 5
#   cv_seed                  fold-construction seed; default: 42
#   spatial_mode             random | predefined
#   spatial_fold_path        path/to/folds.txt (predefined only)
#   shuffle                  randomly shuffle sites before splitting; default: true
#
# initialization
#   training-specific NN controls
#   default init_nn_weights_path: null
#   default frozen_parameters   : []
#   init_nn_weights_path     initialise and continue training from checkpoint
#                             can be:
#                               - a direct file path to nn_weights.pt
#                               - a glob pattern such as path/**/nn_weights.pt
#                             when a pattern is given, ADELM
#                             resolves the checkpoint using the current
#                             workflow / fold / seed context
#   frozen_parameters        freeze selected NN parameter groups after loading
#
# time.start / time.end
#   default: null
#   Training window (YYYY-MM-DD). Gradients are only accumulated over this
#   period; the spin-up runs forward-only before it.
#
# spinup.start / spinup.end / spinup.cycles
#   default: null / null / 0
#   The spin-up window is repeated `cycles` times to equilibrate soil states
#   before the main simulation begins.
# -----------------------------------------------------------------------------

site_learning:
  domain:
    selection:                 # e.g., all | [SITE_A, SITE_B] | path/to/sites.txt
    drop_invalid_sites: true

  time:
    start:                     # e.g., "2001-01-01"
    end:                       # e.g., "2020-12-31"

  spinup:
    start:                     # e.g., "2000-01-01"
    end:                       # e.g., "2000-12-31"
    cycles: 3

  output_dir:                  # e.g., path/to/outputs

  targets_path:                # e.g., path/to/data.nc

  targets:
    gpp_gCm2day:
      mapping: observed_gpp
      sample_loss_weight: 1
      site_loss_weight: 1
    total_et_mmday:
      mapping: [observed_latent_heat_flux, 0.035274]
      sample_loss_weight: 1
      site_loss_weight: 1
    # soil_moisture:
    #   mapping: [SWC_1, 0.01]
    #   layer: 1
    #   sample_loss_weight: 1
    #   site_loss_weight: 1

  training:
    num_epochs: 100
    lr: 0.001
    seed: 42
    train_chunk_size: 120
    max_grad_norm: 1.0
    weight_decay: 1.0e-4
    debug: false
    val_within_train_enabled: true
    val_fraction: 0.3
    early_stopping_enabled: true
    early_stopping_patience: 8
    early_stopping_min_delta: 0.0
    reduce_lr_enabled: true
    reduce_lr_patience: 3
    reduce_lr_factor: 0.5
    reduce_lr_min_delta: 0.0
    reduce_lr_min_lr: 1.0e-6
    min_sites_for_site_loss: 10
    min_samples_per_site_for_site_loss: 365

  cross_validation:
    enabled: false
    scheme:                    # e.g., null | spatial | temporal
    n_folds: 5
    cv_seed: 42
    spatial_mode:              # e.g., random | predefined
    spatial_fold_path:         # e.g., path/to/folds.txt (predefined only)
    shuffle: true

  save_final_inference: true  # save final evaluation outputs as site_simulation.nc

  initialization:
    init_nn_weights_path:      # e.g., path/to/**/nn_weights.pt
    frozen_parameters:
      - jarvis_vpd_sensitivity
      - light_use_efficiency


# -----------------------------------------------------------------------------
# grid_simulation
# Grid-scale forward simulation.
#
# domain.selection
#   default: all
#   all                        full grid
#   [lat_min, lon_min, lat_max, lon_max]
#                              bounding box for grid workflows
#
# grid_output_dir
#   required output directory for grid forward runs
#
# checkpoints
#   restart checkpoints are written automatically to
#   grid_output_dir/checkpoints
#
# output_daily_vars
#   []    = skip
#   all   = save all available outputs
#   [a,b] = save named variables only
#
# output_monthly_vars
#   []    = skip
#   all   = save all available outputs as monthly means
#   [a,b] = save named variables as monthly means only
#
# nn_weights_path
#   pretrained NN weights used for forward simulation
#
# time
#   Main simulation period for grid workflows, specified by year/month.
#   year_* should be YYYY, month_* should be 1-12.
#
# spinup
#   Optional spin-up period for grid workflows, specified by year/month.
#   spinup month fields also use 1-12.
# -----------------------------------------------------------------------------

grid_simulation:
  domain:
    selection:                 # e.g., all | [lat_min, lon_min, lat_max, lon_max]

  time:
    year_start:                # e.g., 2001
    month_start:               # e.g., 1
    year_end:                  # e.g., 2020
    month_end:                 # e.g., 12

  spinup:
    year_start:                # e.g., 2000
    month_start:               # e.g., 1
    year_end:                  # e.g., 2000
    month_end:                 # e.g., 12
    cycles: 0

  grid_output_dir:            # e.g., path/to/grid_outputs
  output_daily_vars: []       # e.g., [] | all | [var_a, var_b]
  output_monthly_vars: []     # e.g., [] | all | [var_a, var_b]

  nn_weights_path:            # e.g., path/to/nn_weights.pt
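
The `mapping` entries in the template take three forms: a bare NetCDF variable name, `[nc_name, scale]`, or `[nc_name, scale, offset]`, with the converted value computed as `nc_value * scale + offset`. The following is a minimal Python sketch of that rule, not ADELM's actual loader. The helper names `normalise_mapping_entry` and `apply_mapping` are hypothetical, and the sketch assumes that an omitted scale defaults to 1.0 and an omitted offset to 0.0.

```python
from typing import Sequence, Tuple, Union

# A mapping entry is either "nc_name" or [nc_name, scale] or [nc_name, scale, offset].
MappingEntry = Union[str, Sequence[Union[str, float]]]


def normalise_mapping_entry(entry: MappingEntry) -> Tuple[str, float, float]:
    """Return (nc_name, scale, offset) for one mapping entry.

    Assumption (not confirmed by the template): a missing scale means 1.0
    and a missing offset means 0.0.
    """
    if isinstance(entry, str):
        return entry, 1.0, 0.0
    if not 1 <= len(entry) <= 3:
        raise ValueError(f"expected 1 to 3 items, got {len(entry)}: {entry!r}")
    nc_name = str(entry[0])
    scale = float(entry[1]) if len(entry) > 1 else 1.0
    offset = float(entry[2]) if len(entry) > 2 else 0.0
    return nc_name, scale, offset


def apply_mapping(nc_value: float, entry: MappingEntry) -> float:
    """Convert a raw file value using nc_value * scale + offset."""
    _, scale, offset = normalise_mapping_entry(entry)
    return nc_value * scale + offset


# Entries copied from the template above; the input values are made up.
kelvin_to_celsius = apply_mapping(293.15, ["air_temperature", 1.0, -273.15])
percent_to_fraction = apply_mapping(45.0, ["soil_sand_fraction_layer_1", 0.01])
unchanged = apply_mapping(3.2, "wind_speed")
```

Normalising every entry to the same three-element tuple keeps the conversion step identical for drivers, params, attributes, and cover fractions, regardless of which shorthand the config uses.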

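The `data.grid.drivers_path` templates carry `{year:04d}{month:02d}` placeholders, and `grid_simulation.time` bounds the run by year and month (months 1-12). Below is a minimal sketch of how such templates could be expanded with Python's built-in `str.format`. The function names `iter_year_months` and `expand_driver_paths` are hypothetical, the end month is assumed to be inclusive, and ADELM's own file discovery may work differently.

```python
from typing import Iterator, List, Tuple


def iter_year_months(year_start: int, month_start: int,
                     year_end: int, month_end: int) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) pairs from the start month to the end month.

    Assumption: the end month is included in the window.
    """
    for month in (month_start, month_end):
        if not 1 <= month <= 12:
            raise ValueError(f"month must be in 1-12, got {month}")
    year, month = year_start, month_start
    # Tuple comparison orders by year first, then by month.
    while (year, month) <= (year_end, month_end):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def expand_driver_paths(template: str, year_start: int, month_start: int,
                        year_end: int, month_end: int) -> List[str]:
    """Fill the {year}/{month} placeholders for every month in the window."""
    return [
        template.format(year=year, month=month)
        for year, month in iter_year_months(year_start, month_start,
                                            year_end, month_end)
    ]


# Template string copied from data.grid.drivers_path; the window is made up.
paths = expand_driver_paths(
    "/path/to/drivers/ta_degC_{year:04d}{month:02d}.nc", 2000, 11, 2001, 2
)
```

With this window, `paths` holds four entries, from `ta_degC_200011.nc` through `ta_degC_200102.nc`, which shows the roll-over from December to January.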