From 28a15b489437036c2cf8ac963a83a1cf9e679460 Mon Sep 17 00:00:00 2001
From: radiradev
Date: Tue, 9 Jun 2026 12:36:53 +0200
Subject: [PATCH 1/5] Add realv2 (diagnostic VMAX_10M) multi-output forecaster showcase

Support running the multi-output "realv2" anemoi architecture (ICON-CH1
cutout forecaster with a diagnostic VMAX_10M stream on the REAL-CH1 /
ICON-CH1 1km grid) on top of current main:

- config/windgust.yaml: showcase + experiment config. extra_requirements
  pins an anemoi-inference fork branch
  (radiradev/anemoi-inference@fix/empty-input-propagate-date) carrying the
  two fixes this checkpoint needs (EmptyInput date propagation +
  format_dataset_name output-path padding), so no runtime source patching
  is needed. The showcase section drives the VMAX_10M animation over the
  Alpine (icon-ch / switzerland) domains via the configurable-showcase
  mechanism.
- resources/inference: realv2 inference config, metadata patch, VMAX_10M
  GRIB template + templates index, and a VMAX_10M template generation step.
- src/plotting: VMAX_10M colormap; redirect VMAX_10M loads to the sibling
  realv2-*.grib output file.
- workflow/rules/plot.smk: treat VMAX_10M (period maximum) like
  accumulations and skip lead time 0.
- docs/realv2_vmax10m.md: notes on the realv2 showcase.

Verified: `evalml showcase` and `evalml experiment` both build a complete
DAG against this config (--dry-run).
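
For reviewers, here is a minimal sketch of why the format_dataset_name
padding fix matters. It relies only on Python's `str.format` semantics.
`render` and `substitute_dataset` are hypothetical helpers written for this
illustration, not anemoi-inference's API. `TEMPLATE` is an illustrative path
with a `{dataset}` placeholder.

```python
# Hypothetical illustration of the format_dataset_name padding fix.
# `render` stands in for the later GRIB-key rendering step; it is not
# anemoi-inference's real API.
TEMPLATE = "grib/{dataset}-{date}{time:04}_{step:03}.grib"


def render(path: str, date: int, time: int, step: int) -> str:
    """Fill the date/time/step placeholders with integer GRIB key values."""
    return path.format(date=date, time=time, step=step)


def substitute_dataset(path: str, name: str) -> str:
    """Fixed behaviour: replace only the dataset placeholders, leaving
    '{time:04}' and '{step:03}' untouched for the rendering step."""
    return path.replace("{dataset_name}", name).replace("{dataset}", name)


# What the unpatched decorator effectively leaves behind: the format
# specifiers on the still-unresolved placeholders are gone.
stripped = "grib/realv2-{date}{time}_{step}.grib"

fixed = substitute_dataset(TEMPLATE, "realv2")
print(render(fixed, 20240201, 0, 6))     # grib/realv2-202402010000_006.grib
print(render(stripped, 20240201, 0, 6))  # grib/realv2-202402010_6.grib
```

Replacing only the dataset placeholders, instead of passing the whole path
through a format call, keeps the `:04`/`:03` specifiers intact until the
integer GRIB keys are known. This is what yields the zero-padded file names.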
---
 .gitignore                                    |   3 +
 config/windgust.yaml                          |  88 ++
 docs/realv2_vmax10m.md                        | 120 +++
 ...m-multidataset-forecaster-realv2-ich1.yaml |  76 ++
 .../metadata/sgm-realv2-ich1-patch.yaml       | 763 ++++++++++++++++++
 .../icon-ch1-shortName=VMAX_10M.grib          | Bin 0 -> 203 bytes
 .../templates/icon-ch1_generate_templates.sh  |   8 +
 .../templates/templates_index_realch1.yaml    |   5 +
 src/plotting/colormap_defaults.py             |   2 +
 src/plotting/compat.py                        |   8 +
 workflow/rules/plot.smk                       |   4 +-
 11 files changed, 1075 insertions(+), 2 deletions(-)
 create mode 100644 config/windgust.yaml
 create mode 100644 docs/realv2_vmax10m.md
 create mode 100644 resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml
 create mode 100644 resources/inference/metadata/sgm-realv2-ich1-patch.yaml
 create mode 100644 resources/inference/templates/icon-ch1-shortName=VMAX_10M.grib
 create mode 100644 resources/inference/templates/templates_index_realch1.yaml

diff --git a/.gitignore b/.gitignore
index 50b2ff7d..3995e831 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,3 +55,6 @@ uv.lock
 
 # evalml
 .evalml_snakemake_cmd.txt
+
+# Paper plotting scripts + their figure outputs (not part of the realv2 showcase)
+paper_plots/
diff --git a/config/windgust.yaml b/config/windgust.yaml
new file mode 100644
index 00000000..c657789d
--- /dev/null
+++ b/config/windgust.yaml
@@ -0,0 +1,88 @@
+# yaml-language-server: $schema=../workflow/tools/config.schema.json
+description: |
+  Showcase the multi-output "realv2" anemoi architecture (ICON-CH1 cutout forecaster
+  with a diagnostic VMAX_10M stream on the REAL-CH1 / ICON-CH1 1km grid).
+
+# Explicit init times for case studies / showcases.
+dates:
+  - 2024-02-01T00:00
+
+runs:
+  - forecaster:
+      # Verified-working checkpoint for the realv2 multi-output architecture.
+      # Replace with an MLflow run URL/ID once the checkpoint is registered.
+      checkpoint: /scratch/mch/rradev/output/checkpoint/9efa01f8c7464328897edb2c03a407c2/inference-last.ckpt
+      label: realv2_vmax10m
+      steps: 0/120/6
+      config: resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml
+      extra_requirements:
+        # anemoi-inference fork = upstream main + two fixes this multi-output
+        # checkpoint needs: (1) EmptyInput.create_input_state propagates the date
+        # (else KeyError 'date' in add_initial_forcings_to_input_state for the
+        # diagnostic-only realv2 stream), and (2) format_dataset_name preserves the
+        # {time:04}/{step:03} padding in GRIB output paths.
+        - git+https://github.com/radiradev/anemoi-inference.git@fix/empty-input-propagate-date
+        - eccodes==2.39.1
+        - eccodes-cosmo-resources-python==2.38.3.1
+
+  - baseline:
+      label: ICON-CH1-ctrl
+      root: /store_new/mch/msopr/osm/ICON-CH1-EPS
+      steps: 0/33/6
+
+truth:
+  label: REAL-CH1
+  root: /store_new/mch/msopr/ml/datasets/mch-realch1-fdb-1km-2005-2025-1h-pl13-v2.0.zarr
+
+experiment:
+  params:
+    - VMAX_10M
+  stratification:
+    regions:
+      - jura
+      - mittelland
+      - voralpen
+      - alpennordhang
+      - innerealpentaeler
+      - alpensuedseite
+    root: /scratch/mch/bhendj/regions/Prognoseregionen_LV95_20220517
+  thresholds:
+    VMAX_10M:
+      gt: [10.0, 20.0, 30.0]
+  dashboard:
+    stratification:
+      - season
+
+# Showcase (animation) settings. VMAX_10M (max 10m wind) is the diagnostic from the
+# realv2 output stream, defined only over the Alpine LAM domain -> restrict the showcase
+# domains to Switzerland / ICON-CH so every param (including VMAX_10M) stays on-grid.
+showcase:
+  params:
+    - T_2M
+    - SP_10M
+    - VMAX_10M
+  meteograms:
+    enabled: false
+    stations: [JUN]
+  animations:
+    enabled: true
+    domains:
+      - icon-ch
+      - switzerland
+
+locations:
+  output_root: output/
+
+profile:
+  executor: slurm
+  global_resources:
+    gpus: 16
+  default_resources:
+    slurm_partition: "postproc"
+    cpus_per_task: 1
+    mem_mb_per_cpu: 1800
+    runtime: "1h"
+    gpus: 0
+  jobs: 50
+  batch_rules:
+    plot_forecast_frame: 32
diff --git a/docs/realv2_vmax10m.md b/docs/realv2_vmax10m.md
new file mode 100644
index 00000000..efbc6212
--- /dev/null
+++ b/docs/realv2_vmax10m.md
@@ -0,0 +1,120 @@
+# REAL-CH1 multi-output architecture (VMAX_10M) support
+
+This documents how evalml supports the new **multi-output "realv2" anemoi
+architecture** and, in particular, the `VMAX_10M` (maximum 10 m wind gust)
+diagnostic — including the fixes/workarounds required to make the **showcase**
+run end-to-end.
+
+## The architecture
+
+The checkpoint emits **two named output streams** (`metadata_inference.dataset_names
+== ["data", "realv2"]`):
+
+- **`data`** — the cutout state: ICON-CH1 1km LAM + AIFS N320 global
+  (1,688,650 points, IFS variable names `2t`, `10u`, `tp`, …).
+- **`realv2`** — a **diagnostic-only** stream with a single variable **`VMAX_10M`**
+  on the REAL-CH1 / ICON-CH1 1km grid (1,147,980 points). It has **no input
+  variables** (`data_indices.input == {}`).
+
+## How to run the showcase
+
+```bash
+evalml showcase config/forecasters-realv2.yaml
+```
+
+This builds the inference env, runs inference for each reference time, normalises the
+GRIB output, and renders per-leadtime `VMAX_10M` frames over Switzerland which are
+assembled into a GIF (`make_forecast_animation`).
+
+Key config files:
+
+- `config/forecasters-realv2.yaml` — example experiment config.
+- `resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml` — the
+  inference template. Routes both streams to GRIB: `data` → ICON LAM + IFS global
+  (with cutout masks), `realv2` → `grib/realv2-*.grib` with `write_initial_state: false`
+  (the diagnostic has no analysis/step-0 state).
+- `resources/inference/metadata/sgm-realv2-ich1-patch.yaml` — metadata patch (see below).
+- `resources/inference/templates/templates_index_realch1.yaml` +
+  `icon-ch1-shortName=VMAX_10M.grib` — GRIB sample/template for VMAX_10M.
+
+## Fixes / workarounds (why each exists)
+
+Running this architecture surfaced four issues. Each is fixed in-repo; all four were
+verified by running inference on GPU and rendering a `VMAX_10M` frame/GIF.
+
+Two of them are genuine anemoi-inference bugs patched at env-build time by
+`workflow/scripts/patch_anemoi_inference.py` (called from `inference_create_venv`
+after `pip install`, inside the freshly-built venv before it is squashed). Both
+patches are idempotent and no-ops once upstream ships the fixes — **TODO:** submit
+upstream and delete the script + its call.
+
+### 1. anemoi-inference `EmptyInput` drops the date (upstream bug)
+
+`EmptyInput.create_input_state` returns a state **without** a `date`. For a
+diagnostic-only output dataset (`realv2`), *every* input provider is the
+`EmptyInput`, so the combined input state has no date and the forecast loop dies in
+`add_initial_forcings_to_input_state` with `KeyError: 'date'`. This affects
+anemoi-inference 0.10.2 **and** 0.11.1 (latest main).
+
+- **Fix:** `return dict(date=date, fields=dict(), _input=self)`.
+
+### 2. eccodes / eccodes-cosmo-resources version mismatch (segfault)
+
+The checkpoint requirements pin `eccodes==2.39.1` but leave
+`eccodes-cosmo-resources-python` unpinned, so the build pulls the latest (2.44.x),
+whose definitions are incompatible with eccodes 2.39 and **segfault** when writing
+GRIB.
+
+- **Fix:** pin `eccodes-cosmo-resources-python==2.38.3.1` in the run's
+  `extra_requirements` (see `config/forecasters-realv2.yaml`).
+
+### 3. VMAX_10M GRIB time-processing assertion + wrong units
+
+Two related problems with the VMAX_10M time encoding:
+
+- The native metadata period is `['650m', '12h']` (a sub-hour start). The GRIB
+  time-processing encoder asserts whole-hour steps (`_step_in_hours`) → `AssertionError`.
+  **Fix:** `sgm-realv2-ich1-patch.yaml` overrides the realv2 `VMAX_10M` period to
+  `['6h', '12h']` (a whole-hour 6 h max window matching the model step). The same patch
+  also remaps the `data` stream's IFS names (`2t`, `10u`, `tp`, …) to the COSMO
+  shortNames (`T_2M`, `U_10M`, `TOT_PREC`, …) expected by the ICON GRIB templates.
+- The `VMAX_10M` GRIB **sample template** must be in **hours**. Extracted straight from
+  an ICON source field it carries `stepUnits = minutes`, so the 6 h max window is
+  mislabelled as 6 minutes (`stepRange '0m-6m'`) and the step leaks into filenames as
+  `_6m`. **Fix:** generate the template by retargeting the (hours) heightAboveGround
+  template to `VMAX_10M` @ 10 m (see `icon-ch1_generate_templates.sh`); the result is
+  `stepType=max`, `stepUnits=hours`, `stepRange '0-6'`/`'6-12'`.
+
+### 4. anemoi-inference strips path-template format specifiers (upstream bug)
+
+The `@format_dataset_name("path")` decorator substitutes `{dataset}` via
+`str.format_map(DefaultFormat(...))`. That call also consumes the `:04` / `:03`
+specifiers on the still-unresolved `date`/`time`/`step` placeholders, so
+`grib/{date}{time:04}_{step:03}.grib` collapses to `grib/{date}{time}_{step}.grib` and
+files land unpadded (`202402010_6.grib`). This affects **every** GRIB output and config.
+
+- **Fix:** substitute only the dataset placeholders, leaving the rest for
+  `render_template` to format with the (integer) GRIB key values:
+  `kwargs[self.arg] = kwargs[self.arg].replace("{dataset_name}", name).replace("{dataset}", name)`.
+  Applied by `patch_anemoi_inference.py`.
+
+With #3 (hours template) and #4 (spec preservation) in place, anemoi writes the
+canonical `{prefix}{YYYYMMDDHHMM}_{NNN}.grib` names natively — no post-hoc filename
+normalisation is needed.
+
+## Showcase plotting wiring
+
+- `src/plotting/colormap_defaults.py` — `VMAX_10M` colormap (m/s, wind palette).
+- `src/plotting/compat.py` — `load_state_from_grib` redirects `VMAX_10M` (a realv2
+  param, in `REALV2_PARAMS`) to the sibling `realv2-*.grib` file.
+- `workflow/Snakefile` — `showcase_all` renders `VMAX_10M` animations over the
+  `switzerland` domain (the realv2 stream is the Alpine LAM only).
+- `workflow/rules/plot.smk` — `VMAX_10M` is treated like other period diagnostics
+  (lead time 0 skipped).
+
+## Status
+
+Verified end-to-end on GPU: inference produces a valid `realv2` `VMAX_10M` GRIB
+(correct COSMO `shortName`, `stepType=max`, realistic gust values), and
+`plot_forecast_frame` renders the Switzerland-domain frame. See
+`figures/showcase_vmax10m_switzerland_006.png`.
diff --git a/resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml b/resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml
new file mode 100644
index 00000000..b9a30441
--- /dev/null
+++ b/resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml
@@ -0,0 +1,76 @@
+# Inference template for the multi-output "realv2" anemoi architecture.
+#
+# This checkpoint emits TWO named output streams (see metadata_inference.dataset_names):
+#   - data:   the cutout state (ICON-CH1 1km LAM + AIFS N320 global), IFS variable names
+#   - realv2: a diagnostic-only stream on the REAL-CH1 / ICON-CH1 1km grid carrying VMAX_10M
+#
+# Each stream is keyed by its dataset name under `output:` and encoded to GRIB so the rest of
+# the evalml pipeline (showcase plotting, verification) can consume it unchanged. The `data`
+# stream reuses the proven cutout encoding (ICON LAM + IFS global); the `realv2` stream is a
+# single LAM grid (no cutout mask) written to its own grib/realv2-*.grib files.
+lead_time: 120h
+write_initial_state: true
+allow_nans: true
+
+env:
+  ANEMOI_INFERENCE_NUM_CHUNKS: 8 # OOM error if not set
+
+# inputs
+input:
+  test:
+    use_original_paths: true
+
+# NOTE: top-level state post_processors (e.g. accumulate_from_start_of_forecast /
+# forward_transform_filter for tp) are intentionally omitted. With this multi-output
+# checkpoint + write_initial_state, they are applied to the realv2 diagnostic stream's
+# initial state (which has no prognostic input) and fail with KeyError: 'date'. The
+# VMAX_10M showcase does not need them; tp handling for the data stream is a follow-up.
+
+output:
+  # Global cutout state (ICON-CH1 LAM + AIFS N320 global).
+  data:
+    tee:
+      - grib:
+          path: grib/{date}{time:04}_{step:03}.grib
+          encoding:
+            typeOfGeneratingProcess: 2
+            centre: lssw
+          templates:
+            samples: resources/templates_index_icon.yaml
+          post_processors:
+            - extract_mask: # keep only LAM points
+                mask: "lam_0/cutout_mask"
+                as_slice: true
+      - grib:
+          path: grib/ifs-{date}{time:04}_{step:03}.grib
+          encoding:
+            typeOfGeneratingProcess: 2
+            centre: ecmf
+          templates:
+            samples: resources/templates_index_ifs.yaml
+          post_processors:
+            - extract_mask: # removes LAM points
+                mask: "lam_0/cutout_mask"
+                as_slice: true
+                inverse: true
+            - assign_mask: # fill local/global overlapping points with nan
+                mask: "global/cutout_mask"
+  # Regional diagnostic stream (VMAX_10M) on the REAL-CH1 / ICON-CH1 1km grid.
+  realv2:
+    grib:
+      path: grib/realv2-{date}{time:04}_{step:03}.grib
+      # VMAX_10M is diagnostic-only: there is no initial (analysis) state for it, so
+      # writing the initial step would fail with KeyError: 'date'. Skip it here; the
+      # diagnostic is first valid at the model step (e.g. +6h) anyway.
+      write_initial_state: false
+      encoding:
+        typeOfGeneratingProcess: 2
+        centre: lssw
+      templates:
+        samples: resources/templates_index_realch1.yaml
+
+# Remaps the `data` stream's IFS variable names (2t, 10u, tp, ...) to the COSMO
+# shortNames expected by the ICON GRIB templates, AND gives the realv2 VMAX_10M
+# diagnostic a whole-hour max period (['6h','12h']) so the GRIB time-processing
+# encoder does not hit the integer-hour assertion on its native ['650m','12h'] period.
+patch_metadata: resources/sgm-realv2-ich1-patch.yaml
diff --git a/resources/inference/metadata/sgm-realv2-ich1-patch.yaml b/resources/inference/metadata/sgm-realv2-ich1-patch.yaml
new file mode 100644
index 00000000..d0fb0e61
--- /dev/null
+++ b/resources/inference/metadata/sgm-realv2-ich1-patch.yaml
@@ -0,0 +1,763 @@
+config:
+  dataloader:
+    test:
+      datasets:
+        data:
+          dataset_config:
+            dataset:
+              cutout:
+                - dataset: /store_new/mch/msopr/ml/datasets/mch-ich1-1km-2024-2025-1h-pl13-ifsnames-v1.0.zarr
+                - dataset: /store_new/mch/msopr/ml/datasets/aifs-od-an-oper-0001-mars-n320-2016-2025-6h-v1-combined-land.zarr
+            start: null
+            end: null
+dataset:
+  data:
+    variables_metadata:
+      slor:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: SSO_SIGMA
+          step: 12
+          time: 0
+      sdor:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: SSO_STDH
+          step: 12
+          time: 0
+      10u:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: U_10M
+          step: 12
+          time: 0
+      10v:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: V_10M
+          step: 12
+          time: 0
+      2d:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: TD_2M
+          step: 12
+          time: 0
+      2t:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: T_2M
+          step: 12
+          time: 0
+      cos_julian_day:
+        computed_forcing: true
+        constant_in_time: false
+      cos_latitude:
+        computed_forcing: true
+        constant_in_time: true
+      cos_local_time:
+        computed_forcing: true
+        constant_in_time: false
+      cos_longitude:
+        computed_forcing: true
+        constant_in_time: true
+      insolation:
+        computed_forcing: true
+        constant_in_time: false
+      lsm:
+        constant_in_time: true
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: FR_LAND
+          step: 0
+          time: 12
+      msl:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: PMSL
+          step: 12
+          time: 0
+      q_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      q_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: QV
+          step: 12
+          time: 0
+      sin_julian_day:
+        computed_forcing: true
+        constant_in_time: false
+      sin_latitude:
+        computed_forcing: true
+        constant_in_time: true
+      sin_local_time:
+        computed_forcing: true
+        constant_in_time: false
+      sin_longitude:
+        computed_forcing: true
+        constant_in_time: true
+      sp:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: PS
+          step: 12
+          time: 0
+      t_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      t_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: T
+          step: 12
+          time: 0
+      tp:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: TOT_PREC
+          step: 12
+          time: 0
+        period:
+          - 6h
+          - 12h
+        process: accumulation
+      u_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      u_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: U
+          step: 12
+          time: 0
+      v_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      v_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: V
+          step: 12
+          time: 0
+      w_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      w_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: OMEGA
+          step: 12
+          time: 0
+      z:
+        constant_in_time: true
+        mars:
+          date: 20050101
+          levelist: null
+          levtype: sfc
+          param: FIS
+          step: 0
+          time: 12
+      z_100:
+        mars:
+          date: 20050101
+          levelist: 100
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_1000:
+        mars:
+          date: 20050101
+          levelist: 1000
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_150:
+        mars:
+          date: 20050101
+          levelist: 150
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_200:
+        mars:
+          date: 20050101
+          levelist: 200
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_250:
+        mars:
+          date: 20050101
+          levelist: 250
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_300:
+        mars:
+          date: 20050101
+          levelist: 300
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_400:
+        mars:
+          date: 20050101
+          levelist: 400
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_50:
+        mars:
+          date: 20050101
+          levelist: 50
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_500:
+        mars:
+          date: 20050101
+          levelist: 500
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_600:
+        mars:
+          date: 20050101
+          levelist: 600
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_700:
+        mars:
+          date: 20050101
+          levelist: 700
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_850:
+        mars:
+          date: 20050101
+          levelist: 850
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+      z_925:
+        mars:
+          date: 20050101
+          levelist: 925
+          levtype: pl
+          param: FI
+          step: 12
+          time: 0
+  realv2:
+    variables_metadata:
+      VMAX_10M:
+        mars:
+          date: 20050101
+          levtype: sfc
+          param: VMAX_10M
+          step: 720m
+          time: 0
+        period:
+          - 6h
+          - 12h
+        process: maximum
diff --git a/resources/inference/templates/icon-ch1-shortName=VMAX_10M.grib b/resources/inference/templates/icon-ch1-shortName=VMAX_10M.grib
new file mode 100644
index 0000000000000000000000000000000000000000..5cc22b9237a69b4ed4668f33c750cf81ce83c978
GIT binary patch
literal 203
zcmZ<{@^t$DpMi-10!{-dQAURA4F8208QEVlG6MOGtUyAJiQz8;NPv+;4x*HSsg40C
z4b;rQpv=s`AlTvqWTmnJH86^&*e(k9jNImODP-~4JnaKO30oE*!ObMb!q5w3fXrn`
x2N7KV!QcP`H_SSaC=(M{1;}g=kYoiLuLWW^aDzxb1%?J7#l{9w!pd$A1OTR2BE|p!

literal 0
HcmV?d00001

diff --git a/resources/inference/templates/icon-ch1_generate_templates.sh b/resources/inference/templates/icon-ch1_generate_templates.sh
index ec1c47ef..df8ca81b 100755
--- a/resources/inference/templates/icon-ch1_generate_templates.sh
+++ b/resources/inference/templates/icon-ch1_generate_templates.sh
@@ -19,3 +19,11 @@ grib_copy -w shortName=T,level=500 $PL_SAMPLE /dev/stdout | grib_set -d 0 - icon
 
 # template for typeOfLevel=meanSea
 grib_copy -w shortName=PMSL $SFC_SAMPLE /dev/stdout | grib_set -d 0 - icon-ch1-typeOfLevel=meanSea.grib
+
+# template for VMAX_10M (max 10m wind speed) on the ICON-CH1 1km grid.
+# Used by the realv2 output stream of the multi-output architecture. Derive it from
+# the heightAboveGround template (which is in HOURS) and retarget to VMAX_10M @ 10m:
+# setting shortName=VMAX_10M makes eccodes pick the max stepType automatically, while
+# keeping stepUnits=hours. Extracting straight from the ICON source instead yields a
+# minute-unit step (stepUnits=0), which mislabels the 6 h max window as 6 minutes.
+grib_set -s shortName=VMAX_10M,level=10 -d 0 icon-ch1-typeOfLevel=heightAboveGround.grib icon-ch1-shortName=VMAX_10M.grib
diff --git a/resources/inference/templates/templates_index_realch1.yaml b/resources/inference/templates/templates_index_realch1.yaml
new file mode 100644
index 00000000..16ec62cf
--- /dev/null
+++ b/resources/inference/templates/templates_index_realch1.yaml
@@ -0,0 +1,5 @@
+# REAL-CH1 templates (ICON-CH1 1km grid)
+# Used by the realv2 output stream of the multi-output anemoi architecture, which
+# emits the diagnostic VMAX_10M (maximum 10 m wind speed) on the ICON-CH1 1km grid.
+- - {levtype: sfc, param: [VMAX_10M]}
+  - resources/icon-ch1-shortName=VMAX_10M.grib
diff --git a/src/plotting/colormap_defaults.py b/src/plotting/colormap_defaults.py
index 88c065e6..4d4c26a4 100644
--- a/src/plotting/colormap_defaults.py
+++ b/src/plotting/colormap_defaults.py
@@ -26,6 +26,8 @@ def _fallback():
         | {"units": "m/s", "extend": "both"},
         "SP_10M": load_ncl_colormap("modified_uv_17lev.ct")
         | {"units": "m/s", "extend": "max"},
+        "VMAX_10M": load_ncl_colormap("modified_uv_17lev.ct")
+        | {"units": "m/s", "extend": "max"},
         "T_850": {
             "cmap": plt.get_cmap("inferno", 11),
             "vmin": 220,
diff --git a/src/plotting/compat.py b/src/plotting/compat.py
index ef14e473..8a5aff63 100644
--- a/src/plotting/compat.py
+++ b/src/plotting/compat.py
@@ -19,10 +19,18 @@
 
 PARAMS_MAP_INV = {v: k for k, v in PARAMS_MAP.items()}
 
+# Diagnostic params emitted by the multi-output "realv2" stream. These are written to a
+# sibling grib/realv2-*.grib file (same LAM grid as the main output), so redirect to it.
+REALV2_PARAMS = frozenset({"VMAX_10M"})
+
 
 def load_state_from_grib(
     file: Path, paramlist: list[str] | None = None
 ) -> dict[str, np.ndarray | dict[str, np.ndarray] | gpd.GeoSeries]:
+    if paramlist and set(paramlist) <= REALV2_PARAMS:
+        realv2_file = file.with_name(f"realv2-{file.name}")
+        if realv2_file.exists():
+            file = realv2_file
     ds = load_from_grib_file(file, {"parameter.variable": paramlist})
     state = {}
     ref_param = next((p for p in (paramlist or []) if p in ds), None)
diff --git a/workflow/rules/plot.smk b/workflow/rules/plot.smk
index 91ff1080..b20f10fa 100644
--- a/workflow/rules/plot.smk
+++ b/workflow/rules/plot.smk
@@ -134,8 +134,8 @@ rule plot_forecast_frame:
 def get_leadtimes(wc):
     """Get all lead times from the run config."""
     start, end, step = map(int, RUN_CONFIGS[wc.run_id]["steps"].split("/"))
-    # skip lead time 0 for diagnostic variables
-    if wc.param in ["tp", "TOT_PREC"] and start == 0:
+    # skip lead time 0 for diagnostic variables (accumulations and period maxima)
+    if wc.param in ["tp", "TOT_PREC", "VMAX_10M"] and start == 0:
         start += step
     return [f"{i}" for i in range(start, end + 1, step)]
 

From cc5b4676ca27b0fc3e1bc94c373f2555b06cce8f Mon Sep 17 00:00:00 2001
From: radiradev
Date: Tue, 9 Jun 2026 13:47:23 +0200
Subject: [PATCH 2/5] cleanup

---
 config/windgust.yaml                          |   6 -
 docs/realv2_vmax10m.md                        | 120 ------------------
 ...ultidataset-forecaster-windgust-ich1.yaml} |  20 +--
 3 files changed, 1 insertion(+), 145 deletions(-)
 delete mode 100644 docs/realv2_vmax10m.md
 rename resources/inference/configs/{sgm-multidataset-forecaster-realv2-ich1.yaml => sgm-multidataset-forecaster-windgust-ich1.yaml} (57%)

diff --git a/config/windgust.yaml b/config/windgust.yaml
index c657789d..0909aeea 100644
--- a/config/windgust.yaml
+++ b/config/windgust.yaml
@@ -9,14 +9,11 @@ dates:
 
 runs:
   - forecaster:
-      # Verified-working checkpoint for the realv2 multi-output architecture.
-      # Replace with an MLflow run URL/ID once the checkpoint is registered.
       checkpoint: /scratch/mch/rradev/output/checkpoint/9efa01f8c7464328897edb2c03a407c2/inference-last.ckpt
       label: realv2_vmax10m
       steps: 0/120/6
       config: resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml
       extra_requirements:
-        # anemoi-inference fork = upstream main + two fixes this multi-output
         # checkpoint needs: (1) EmptyInput.create_input_state propagates the date
         # (else KeyError 'date' in add_initial_forcings_to_input_state for the
         # diagnostic-only realv2 stream), and (2) format_dataset_name preserves the
@@ -53,9 +50,6 @@ experiment:
     stratification:
       - season
 
-# Showcase (animation) settings. VMAX_10M (max 10m wind) is the diagnostic from the
-# realv2 output stream, defined only over the Alpine LAM domain -> restrict the showcase
-# domains to Switzerland / ICON-CH so every param (including VMAX_10M) stays on-grid.
 showcase:
   params:
     - T_2M
diff --git a/docs/realv2_vmax10m.md b/docs/realv2_vmax10m.md
deleted file mode 100644
index efbc6212..00000000
--- a/docs/realv2_vmax10m.md
+++ /dev/null
@@ -1,120 +0,0 @@
-# REAL-CH1 multi-output architecture (VMAX_10M) support
-
-This documents how evalml supports the new **multi-output "realv2" anemoi
-architecture** and, in particular, the `VMAX_10M` (maximum 10 m wind gust)
-diagnostic — including the fixes/workarounds required to make the **showcase**
-run end-to-end.
-
-## The architecture
-
-The checkpoint emits **two named output streams** (`metadata_inference.dataset_names
-== ["data", "realv2"]`):
-
-- **`data`** — the cutout state: ICON-CH1 1km LAM + AIFS N320 global
-  (1,688,650 points, IFS variable names `2t`, `10u`, `tp`, …).
-- **`realv2`** — a **diagnostic-only** stream with a single variable **`VMAX_10M`**
-  on the REAL-CH1 / ICON-CH1 1km grid (1,147,980 points). It has **no input
-  variables** (`data_indices.input == {}`).
-
-## How to run the showcase
-
-```bash
-evalml showcase config/forecasters-realv2.yaml
-```
-
-This builds the inference env, runs inference for each reference time, normalises the
-GRIB output, and renders per-leadtime `VMAX_10M` frames over Switzerland which are
-assembled into a GIF (`make_forecast_animation`).
-
-Key config files:
-
-- `config/forecasters-realv2.yaml` — example experiment config.
-- `resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml` — the
-  inference template. Routes both streams to GRIB: `data` → ICON LAM + IFS global
-  (with cutout masks), `realv2` → `grib/realv2-*.grib` with `write_initial_state: false`
-  (the diagnostic has no analysis/step-0 state).
-- `resources/inference/metadata/sgm-realv2-ich1-patch.yaml` — metadata patch (see below).
-- `resources/inference/templates/templates_index_realch1.yaml` +
-  `icon-ch1-shortName=VMAX_10M.grib` — GRIB sample/template for VMAX_10M.
-
-## Fixes / workarounds (why each exists)
-
-Running this architecture surfaced four issues. Each is fixed in-repo; all four were
-verified by running inference on GPU and rendering a `VMAX_10M` frame/GIF.
-
-Two of them are genuine anemoi-inference bugs patched at env-build time by
-`workflow/scripts/patch_anemoi_inference.py` (called from `inference_create_venv`
-after `pip install`, inside the freshly-built venv before it is squashed). Both
-patches are idempotent and no-ops once upstream ships the fixes — **TODO:** submit
-upstream and delete the script + its call.
-
-### 1. anemoi-inference `EmptyInput` drops the date (upstream bug)
-
-`EmptyInput.create_input_state` returns a state **without** a `date`. For a
-diagnostic-only output dataset (`realv2`), *every* input provider is the
-`EmptyInput`, so the combined input state has no date and the forecast loop dies in
-`add_initial_forcings_to_input_state` with `KeyError: 'date'`. This affects
-anemoi-inference 0.10.2 **and** 0.11.1 (latest main).
- -- **Fix:** `return dict(date=date, fields=dict(), _input=self)`. - -### 2. eccodes / eccodes-cosmo-resources version mismatch (segfault) - -The checkpoint requirements pin `eccodes==2.39.1` but leave -`eccodes-cosmo-resources-python` unpinned, so the build pulls the latest (2.44.x), -whose definitions are incompatible with eccodes 2.39 and **segfault** when writing -GRIB. - -- **Fix:** pin `eccodes-cosmo-resources-python==2.38.3.1` in the run's - `extra_requirements` (see `config/forecasters-realv2.yaml`). - -### 3. VMAX_10M GRIB time-processing assertion + wrong units - -Two related problems with the VMAX_10M time encoding: - -- The native metadata period is `['650m', '12h']` (a sub-hour start). The GRIB - time-processing encoder asserts whole-hour steps (`_step_in_hours`) → `AssertionError`. - **Fix:** `sgm-realv2-ich1-patch.yaml` overrides the realv2 `VMAX_10M` period to - `['6h', '12h']` (a whole-hour 6 h max window matching the model step). The same patch - also remaps the `data` stream's IFS names (`2t`, `10u`, `tp`, …) to the COSMO - shortNames (`T_2M`, `U_10M`, `TOT_PREC`, …) expected by the ICON GRIB templates. -- The `VMAX_10M` GRIB **sample template** must be in **hours**. Extracted straight from - an ICON source field it carries `stepUnits = minutes`, so the 6 h max window is - mislabelled as 6 minutes (`stepRange '0m-6m'`) and the step leaks into filenames as - `_6m`. **Fix:** generate the template by retargeting the (hours) heightAboveGround - template to `VMAX_10M` @ 10 m (see `icon-ch1_generate_templates.sh`); the result is - `stepType=max`, `stepUnits=hours`, `stepRange '0-6'`/`'6-12'`. - -### 4. anemoi-inference strips path-template format specifiers (upstream bug) - -The `@format_dataset_name("path")` decorator substitutes `{dataset}` via -`str.format_map(DefaultFormat(...))`. 
That call also consumes the `:04` / `:03` -specifiers on the still-unresolved `date`/`time`/`step` placeholders, so -`grib/{date}{time:04}_{step:03}.grib` collapses to `grib/{date}{time}_{step}.grib` and -files land unpadded (`202402010_6.grib`). This affects **every** GRIB output and config. - -- **Fix:** substitute only the dataset placeholders, leaving the rest for - `render_template` to format with the (integer) GRIB key values: - `kwargs[self.arg] = kwargs[self.arg].replace("{dataset_name}", name).replace("{dataset}", name)`. - Applied by `patch_anemoi_inference.py`. - -With #3 (hours template) and #4 (spec preservation) in place, anemoi writes the -canonical `{prefix}{YYYYMMDDHHMM}_{NNN}.grib` names natively — no post-hoc filename -normalisation is needed. - -## Showcase plotting wiring - -- `src/plotting/colormap_defaults.py` — `VMAX_10M` colormap (m/s, wind palette). -- `src/plotting/compat.py` — `load_state_from_grib` redirects `VMAX_10M` (a realv2 - param, in `REALV2_PARAMS`) to the sibling `realv2-*.grib` file. -- `workflow/Snakefile` — `showcase_all` renders `VMAX_10M` animations over the - `switzerland` domain (the realv2 stream is the Alpine LAM only). -- `workflow/rules/plot.smk` — `VMAX_10M` is treated like other period diagnostics - (lead time 0 skipped). - -## Status - -Verified end-to-end on GPU: inference produces a valid `realv2` `VMAX_10M` GRIB -(correct COSMO `shortName`, `stepType=max`, realistic gust values), and -`plot_forecast_frame` renders the Switzerland-domain frame. See -`figures/showcase_vmax10m_switzerland_006.png`. 
diff --git a/resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml b/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml similarity index 57% rename from resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml rename to resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml index b9a30441..28633f69 100644 --- a/resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml +++ b/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml @@ -1,13 +1,3 @@ -# Inference template for the multi-output "realv2" anemoi architecture. -# -# This checkpoint emits TWO named output streams (see metadata_inference.dataset_names): -# - data: the cutout state (ICON-CH1 1km LAM + AIFS N320 global), IFS variable names -# - realv2: a diagnostic-only stream on the REAL-CH1 / ICON-CH1 1km grid carrying VMAX_10M -# -# Each stream is keyed by its dataset name under `output:` and encoded to GRIB so the rest of -# the evalml pipeline (showcase plotting, verification) can consume it unchanged. The `data` -# stream reuses the proven cutout encoding (ICON LAM + IFS global); the `realv2` stream is a -# single LAM grid (no cutout mask) written to its own grib/realv2-*.grib files. lead_time: 120h write_initial_state: true allow_nans: true @@ -20,12 +10,6 @@ input: test: use_original_paths: true -# NOTE: top-level state post_processors (e.g. accumulate_from_start_of_forecast / -# forward_transform_filter for tp) are intentionally omitted. With this multi-output -# checkpoint + write_initial_state, they are applied to the realv2 diagnostic stream's -# initial state (which has no prognostic input) and fail with KeyError: 'date'. The -# VMAX_10M showcase does not need them; tp handling for the data stream is a follow-up. - output: # Global cutout state (ICON-CH1 LAM + AIFS N320 global). 
data: @@ -59,9 +43,7 @@ output: realv2: grib: path: grib/realv2-{date}{time:04}_{step:03}.grib - # VMAX_10M is diagnostic-only: there is no initial (analysis) state for it, so - # writing the initial step would fail with KeyError: 'date'. Skip it here; the - # diagnostic is first valid at the model step (e.g. +6h) anyway. + # no initial state: the diagnostic is first valid at the model step (e.g. +6h) write_initial_state: false encoding: typeOfGeneratingProcess: 2 From b42328528bb7d29e6d50fb230063b5400c2a1740 Mon Sep 17 00:00:00 2001 From: radiradev Date: Wed, 10 Jun 2026 13:25:37 +0200 Subject: [PATCH 3/5] showcase runs --- config/windgust.yaml | 5 ++-- ...multidataset-forecaster-windgust-ich1.yaml | 10 +++---- ...atch.yaml => sgm-windgust-ich1-patch.yaml} | 18 +++++++----- .../templates/icon-ch1_generate_templates.sh | 7 +---- .../scripts/inference_extract_requirements.py | 28 +++++++++++++++---- 5 files changed, 42 insertions(+), 26 deletions(-) rename resources/inference/metadata/{sgm-realv2-ich1-patch.yaml => sgm-windgust-ich1-patch.yaml} (96%) diff --git a/config/windgust.yaml b/config/windgust.yaml index 0909aeea..569d1a92 100644 --- a/config/windgust.yaml +++ b/config/windgust.yaml @@ -12,13 +12,12 @@ runs: checkpoint: /scratch/mch/rradev/output/checkpoint/9efa01f8c7464328897edb2c03a407c2/inference-last.ckpt label: realv2_vmax10m steps: 0/120/6 - config: resources/inference/configs/sgm-multidataset-forecaster-realv2-ich1.yaml + config: resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml extra_requirements: # checkpoint needs: (1) EmptyInput.create_input_state propagates the date # (else KeyError 'date' in add_initial_forcings_to_input_state for the - # diagnostic-only realv2 stream), and (2) format_dataset_name preserves the - # {time:04}/{step:03} padding in GRIB output paths.
- git+https://github.com/radiradev/anemoi-inference.git@fix/empty-input-propagate-date + - eccodes==2.39.1 - eccodes-cosmo-resources-python==2.38.3.1 diff --git a/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml b/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml index 28633f69..03fcec5d 100644 --- a/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml +++ b/resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml @@ -39,7 +39,7 @@ output: inverse: true - assign_mask: # fill local/global overlapping points with nan mask: "global/cutout_mask" - # Regional diagnostic stream (VMAX_10M) on the REAL-CH1 / ICON-CH1 1km grid. + # Diagnostic decoder on the ICON grid only realv2: grib: path: grib/realv2-{date}{time:04}_{step:03}.grib @@ -52,7 +52,7 @@ output: samples: resources/templates_index_realch1.yaml # Remaps the `data` stream's IFS variable names (2t, 10u, tp, ...) to the COSMO -# shortNames expected by the ICON GRIB templates, AND gives the realv2 VMAX_10M -# diagnostic a whole-hour max period (['6h','12h']) so the GRIB time-processing -# encoder does not hit the integer-hour assertion on its native ['650m','12h'] period. -patch_metadata: resources/sgm-realv2-ich1-patch.yaml +# shortNames expected by the ICON GRIB templates, and gives the realv2 VMAX_10M +# diagnostic a whole-hour max period (['6h', '12h']) so its GRIB time-processing +# is encoded correctly.
+patch_metadata: resources/sgm-windgust-ich1-patch.yaml diff --git a/resources/inference/metadata/sgm-realv2-ich1-patch.yaml b/resources/inference/metadata/sgm-windgust-ich1-patch.yaml similarity index 96% rename from resources/inference/metadata/sgm-realv2-ich1-patch.yaml rename to resources/inference/metadata/sgm-windgust-ich1-patch.yaml index d0fb0e61..61a237b9 100644 --- a/resources/inference/metadata/sgm-realv2-ich1-patch.yaml +++ b/resources/inference/metadata/sgm-windgust-ich1-patch.yaml @@ -6,10 +6,11 @@ config: dataset_config: dataset: cutout: - - dataset: /store_new/mch/msopr/ml/datasets/mch-ich1-1km-2024-2025-1h-pl13-ifsnames-v1.0.zarr - - dataset: /store_new/mch/msopr/ml/datasets/aifs-od-an-oper-0001-mars-n320-2016-2025-6h-v1-combined-land.zarr + - dataset: /store_new/mch/msopr/ml/datasets/mch-ich1-1km-2024-2025-1h-pl13-ifsnames-v1.0.zarr + - dataset: /store_new/mch/msopr/ml/datasets/aifs-od-an-oper-0001-mars-n320-2016-2025-6h-v1-combined-land.zarr start: null end: null + dataset: data: variables_metadata: @@ -320,8 +321,8 @@ dataset: step: 12 time: 0 period: - - 6h - - 12h + - 6h + - 12h process: accumulation u_100: mars: @@ -748,6 +749,9 @@ dataset: param: FI step: 12 time: 0 + # Diagnostic VMAX_10M emitted by the multi-output "realv2" stream. Gives the + # diagnostic a whole-hour max period so its GRIB time-processing is encoded + # correctly. 
realv2: variables_metadata: VMAX_10M: @@ -755,9 +759,9 @@ dataset: date: 20050101 levtype: sfc param: VMAX_10M - step: 720m + step: 12 time: 0 period: - - 6h - - 12h + - 6h + - 12h process: maximum diff --git a/resources/inference/templates/icon-ch1_generate_templates.sh b/resources/inference/templates/icon-ch1_generate_templates.sh index df8ca81b..f419cac6 100755 --- a/resources/inference/templates/icon-ch1_generate_templates.sh +++ b/resources/inference/templates/icon-ch1_generate_templates.sh @@ -20,10 +20,5 @@ grib_copy -w shortName=T,level=500 $PL_SAMPLE /dev/stdout | grib_set -d 0 - icon # template for typeOfLevel=meanSea grib_copy -w shortName=PMSL $SFC_SAMPLE /dev/stdout | grib_set -d 0 - icon-ch1-typeOfLevel=meanSea.grib -# template for VMAX_10M (max 10m wind speed) on the ICON-CH1 1km grid. -# Used by the realv2 output stream of the multi-output architecture. Derive it from -# the heightAboveGround template (which is in HOURS) and retarget to VMAX_10M @ 10m: -# setting shortName=VMAX_10M makes eccodes pick the max stepType automatically, while -# keeping stepUnits=hours. Extracting straight from the ICON source instead yields a -# minute-unit step (stepUnits=0), which mislabels the 6 h max window as 6 minutes. +# template for windgust (VMAX_10M @ 10m), derived from the heightAboveGround template grib_set -s shortName=VMAX_10M,level=10 -d 0 icon-ch1-typeOfLevel=heightAboveGround.grib icon-ch1-shortName=VMAX_10M.grib diff --git a/workflow/scripts/inference_extract_requirements.py b/workflow/scripts/inference_extract_requirements.py index dc3441ae..6ca043fa 100644 --- a/workflow/scripts/inference_extract_requirements.py +++ b/workflow/scripts/inference_extract_requirements.py @@ -9,6 +9,7 @@ import argparse import json +import re import sys import warnings from packaging.version import Version, InvalidVersion @@ -35,11 +36,17 @@ "torch-geometric", ] +def _requirement_name(token: str) -> str: + """Return the canonical package name from a requirement token.
+ + Strips any version specifier (``==``, ``>=``, ``<``, ``~=``, ``!=``, …) so that + e.g. ``eccodes>=2.44.0,<2.48.0`` and ``eccodes==2.39.1`` both key as ``eccodes``. + """ + return re.split(r"[<>=!~]", token, maxsplit=1)[0].strip() + + # Canonical names of BASE_DEPENDENCIES for membership tests (strips version pins). -_BASE_DEPENDENCY_NAMES: set[str] = set() -for _dep in BASE_DEPENDENCIES: - _base_name = _dep.split("==")[0].strip() if "==" in _dep else _dep.strip() - _BASE_DEPENDENCY_NAMES.add(_base_name) +_BASE_DEPENDENCY_NAMES: set[str] = {_requirement_name(_dep) for _dep in BASE_DEPENDENCIES} def load_provenance(metadata_path: str) -> dict: @@ -230,6 +237,12 @@ def parse_overrides(overrides: list[str]) -> dict[str, str | None]: elif any(item.startswith(prefix) for prefix in ("git+", "http://", "https://")): name = _parse_url_package_name(item) result[name] = item + elif re.search(r"[<>!~]", item): + # Non-`==` version specifier (e.g. ``eccodes>=2.44.0,<2.48.0``). Key by the + # canonical name so a later ``name==version`` override replaces it; keep the + # full specifier (incl. operator) as the value. 
+ name = _requirement_name(item) + result[name] = item[len(name):].strip() else: result[item] = None @@ -318,7 +331,12 @@ def format_requirements( for name, version in sorted(pypi_requirements.items()): if name not in allowed: continue - line = f"{name}=={version}" if version else f"{name}" + if not version: + line = f"{name}" + elif version[0] in "<>=!~": # a PEP 508 specifier like ">=2.44.0,<2.48.0" + line = f"{name}{version}" + else: + line = f"{name}=={version}" line += " # Extra (not from checkpoint)" if name in overrides else "" lines.append(line) From 1578e13c9274fc35a6176963e9917ea2aa0bb354 Mon Sep 17 00:00:00 2001 From: radiradev Date: Wed, 10 Jun 2026 17:02:41 +0200 Subject: [PATCH 4/5] verif working --- config/windgust.yaml | 7 +++-- src/data_input/__init__.py | 43 +++++++++++++++++++++++++------ src/evalml/config.py | 19 +++++++++++++- src/plotting/compat.py | 6 +---- workflow/tools/config.schema.json | 13 ++++++++++ 5 files changed, 72 insertions(+), 16 deletions(-) diff --git a/config/windgust.yaml b/config/windgust.yaml index 569d1a92..d0ecdb54 100644 --- a/config/windgust.yaml +++ b/config/windgust.yaml @@ -3,9 +3,11 @@ description: | Showcase the multi-output "realv2" anemoi architecture (ICON-CH1 cutout forecaster with a diagnostic VMAX_10M stream on the REAL-CH1 / ICON-CH1 1km grid). -# Explicit init times for case studies / showcases. +# One week of daily initialisations for verification. 
dates: - - 2024-02-01T00:00 + start: 2024-02-01T00:00 + end: 2024-02-07T00:00 + frequency: 24h runs: - forecaster: @@ -76,6 +78,7 @@ profile: mem_mb_per_cpu: 1800 runtime: "1h" gpus: 0 + slurm_extra: "--exclusive" # whole nodes; avoid oversubscription jobs: 50 batch_rules: plot_forecast_frame: 32 diff --git a/src/data_input/__init__.py b/src/data_input/__init__.py index 6c14005a..6b6cb0db 100644 --- a/src/data_input/__init__.py +++ b/src/data_input/__init__.py @@ -64,6 +64,7 @@ def load_analysis_data_from_zarr( "PS": "sp", "PMSL": "msl", "TOT_PREC": "tp", + "VMAX_10M": "VMAX_10M", } tot_prec_string = "TOT_PREC_6H" if min(np.diff(steps)) == 6 else "TOT_PREC_1H" PARAMS_MAP_COSMO1 = { @@ -115,16 +116,26 @@ def load_analysis_data_from_zarr( return _select_valid_times(ds, times) -def _collect_ml_grib_files(root: Path, steps: list[int] | None = None) -> list[Path]: +# Diagnostic params emitted by the multi-output "realv2" stream. They are written to a +# sibling ``realv2-*.grib`` file (same LAM grid as the main output) rather than the main +# ``{date}{time}_{step}.grib``, so loaders must source them from there. +REALV2_PARAMS = frozenset({"VMAX_10M"}) + + +def _collect_ml_grib_files( + root: Path, steps: list[int] | None = None, prefix: str = "" +) -> list[Path]: """Return GRIB files for an ML inference run (flat directory layout). When `steps` is provided, the discovered files are filtered to those whose - name ends with ``_{step:03d}.grib``. + name ends with ``_{step:03d}.grib``. `prefix` selects an output stream: the + default ``""`` matches the main ``20*.grib`` outputs, while ``"realv2-"`` + matches the diagnostic ``realv2-20*.grib`` sibling files. """ # TODO: this glob pattern is a dirty fix for anemoi-inference writing outputs # with wrong formatting. Eventually we will either have to have a fix upstream # or write a single output file. 
- files = sorted(root.glob("20*.grib")) + files = sorted(root.glob(f"{prefix}20*.grib")) if steps is None: return files @@ -747,11 +758,27 @@ def load_forecast_data( root = Path(root) if any(root.glob("*.grib")): LOG.info("Loading forecasts from GRIB files...") - return load_forecast_data_from_grib( - # NOTE: root is already for a specific reftime - files=_collect_ml_grib_files(root, steps), - params=params, - ) + # Diagnostic "realv2" params (e.g. VMAX_10M) live in sibling realv2-*.grib + # files; load them separately and merge with the main output stream. + main_params = [p for p in params if p not in REALV2_PARAMS] + realv2_params = [p for p in params if p in REALV2_PARAMS] + datasets = [] + if main_params: + datasets.append( + load_forecast_data_from_grib( + # NOTE: root is already for a specific reftime + files=_collect_ml_grib_files(root, steps), + params=main_params, + ) + ) + if realv2_params: + datasets.append( + load_forecast_data_from_grib( + files=_collect_ml_grib_files(root, steps, prefix="realv2-"), + params=realv2_params, + ) + ) + return datasets[0] if len(datasets) == 1 else xr.merge(datasets) if "INCA" in root.parts: LOG.info("Loading INCA baseline from NetCDF files...") return load_INCA_baseline_from_netcdf(root, reftime, steps, params) diff --git a/src/evalml/config.py b/src/evalml/config.py index e101afc3..75bf501e 100644 --- a/src/evalml/config.py +++ b/src/evalml/config.py @@ -404,10 +404,27 @@ class DefaultResources(BaseModel): cpus_per_task: int = Field(..., ge=1, description="Number of CPUs per task.") mem_mb_per_cpu: int = Field(..., ge=1, description="Memory per CPU in MB.") runtime: str = Field(..., description="Maximum runtime, e.g. '1h'.") + slurm_extra: str | None = Field( + None, + description=( + "Extra sbatch flags applied to every job, e.g. '--exclusive' to " + "request whole nodes and avoid sharing (oversubscribing) them." 
+ ), + ) def parsable(self) -> str: """Convert the default resources to a string of key=value pairs.""" - return [f"{key}={value}" for key, value in self.model_dump().items()] + items = [] + for key, value in self.model_dump().items(): + if value is None: + continue + if key == "slurm_extra": + # Snakemake evaluates resource values as Python expressions; wrap the + # flag string in quotes so e.g. --exclusive is taken as a literal string. + items.append(f'{key}="{value}"') + else: + items.append(f"{key}={value}") + return items class GlobalResources(BaseModel): diff --git a/src/plotting/compat.py b/src/plotting/compat.py index 8a5aff63..a1211b2e 100644 --- a/src/plotting/compat.py +++ b/src/plotting/compat.py @@ -4,7 +4,7 @@ import geopandas as gpd import numpy as np from shapely.geometry import MultiPoint -from data_input import load_from_grib_file +from data_input import load_from_grib_file, REALV2_PARAMS PARAMS_MAP = { @@ -19,10 +19,6 @@ PARAMS_MAP_INV = {v: k for k, v in PARAMS_MAP.items()} -# Diagnostic params emitted by the multi-output "realv2" stream. These are written to a -# sibling grib/realv2-*.grib file (same LAM grid as the main output), so redirect to it. -REALV2_PARAMS = frozenset({"VMAX_10M"}) - def load_state_from_grib( file: Path, paramlist: list[str] | None = None diff --git a/workflow/tools/config.schema.json b/workflow/tools/config.schema.json index d5760f85..4d35da24 100644 --- a/workflow/tools/config.schema.json +++ b/workflow/tools/config.schema.json @@ -159,6 +159,19 @@ "description": "Maximum runtime, e.g. '1h'.", "title": "Runtime", "type": "string" + }, + "slurm_extra": { + "anyOf": [ + { + "type": "string" + }, + { + "type": "null" + } + ], + "default": null, + "description": "Extra sbatch flags applied to every job, e.g. 
'--exclusive' to request whole nodes and avoid sharing (oversubscribing) them.", + "title": "Slurm Extra" } }, "required": [ From d357202fd83c04b679d0952d6c918d64d8297826 Mon Sep 17 00:00:00 2001 From: radiradev Date: Wed, 10 Jun 2026 19:44:13 +0200 Subject: [PATCH 5/5] add peakweather comparison --- config/windgust-peakweather.yaml | 84 ++++++++++++++++++++++++++++++++ 1 file changed, 84 insertions(+) create mode 100644 config/windgust-peakweather.yaml diff --git a/config/windgust-peakweather.yaml b/config/windgust-peakweather.yaml new file mode 100644 index 00000000..19ec72a3 --- /dev/null +++ b/config/windgust-peakweather.yaml @@ -0,0 +1,84 @@ +# yaml-language-server: $schema=../workflow/tools/config.schema.json +description: | + As windgust.yaml (realv2 VMAX_10M showcase), but verifies against PeakWeather + station observations instead of the REAL-CH1 gridded analysis. + +# One week of daily initialisations for verification. +dates: + start: 2024-02-01T00:00 + end: 2024-02-07T00:00 + frequency: 24h + +runs: + - forecaster: + checkpoint: /scratch/mch/rradev/output/checkpoint/9efa01f8c7464328897edb2c03a407c2/inference-last.ckpt + label: realv2_vmax10m + steps: 0/120/6 + config: resources/inference/configs/sgm-multidataset-forecaster-windgust-ich1.yaml + extra_requirements: + # anemoi-inference fork: (1) EmptyInput.create_input_state propagates the date + # (else KeyError 'date' for realv2) and (2) GRIB output paths keep their padding. + - git+https://github.com/radiradev/anemoi-inference.git@fix/empty-input-propagate-date + + - eccodes==2.39.1 + - eccodes-cosmo-resources-python==2.38.3.1 + + - baseline: + label: ICON-CH1-ctrl + root: /store_new/mch/msopr/osm/ICON-CH1-EPS + steps: 0/33/6 + +truth: + label: PeakWeather + root: output/data/observations/peakweather + +experiment: + params: + - VMAX_10M + stratification: + regions: + - jura + - mittelland + - voralpen + - alpennordhang + - innerealpentaeler + - alpensuedseite + root:
/scratch/mch/bhendj/regions/Prognoseregionen_LV95_20220517 + thresholds: + VMAX_10M: + gt: [10.0, 20.0, 30.0] + dashboard: + stratification: + - season + +showcase: + params: + - T_2M + - SP_10M + - VMAX_10M + meteograms: + enabled: false + stations: [JUN] + animations: + enabled: true + domains: + - icon-ch + - switzerland + +locations: + output_root: output/ + +profile: + executor: slurm + global_resources: + gpus: 16 + default_resources: + slurm_partition: "postproc" + cpus_per_task: 1 + mem_mb_per_cpu: 1800 + runtime: "1h" + gpus: 0 + slurm_extra: "--exclusive" # whole nodes; avoid oversubscription + jobs: 50 + batch_rules: + plot_forecast_frame: 32
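The requirement-specifier handling that PATCH 3/5 adds to `workflow/scripts/inference_extract_requirements.py` can be exercised in isolation. This is a minimal sketch: `requirement_name` copies the patch's `_requirement_name`, and `format_line` mirrors the new `format_requirements` branch. `parse_overrides` is a simplified, hypothetical stand-in; the real function's URL/`git+` branch and surrounding code are not reproduced here.

```python
from __future__ import annotations

import re

SPECIFIER_CHARS = "<>=!~"


def requirement_name(token: str) -> str:
    """Package name before the first specifier character (as in the patch)."""
    return re.split(r"[<>=!~]", token, maxsplit=1)[0].strip()


def parse_overrides(items: list[str]) -> dict[str, str | None]:
    """Simplified override parser; the URL/git+ branch is omitted (assumption)."""
    result: dict[str, str | None] = {}
    for item in (i.strip() for i in items):
        if "==" in item:
            name, version = item.split("==", 1)
            result[name.strip()] = version.strip()
        elif re.search(r"[<>!~]", item):
            # Key by canonical name so a later "name==version" replaces it;
            # keep the full specifier (operator included) as the value.
            name = requirement_name(item)
            result[name] = item[len(name):].strip()
        else:
            result[item] = None
    return result


def format_line(name: str, version: str | None) -> str:
    """Render one requirement line the way the patched format_requirements does."""
    if not version:
        return name
    if version[0] in SPECIFIER_CHARS:  # already a PEP 508 specifier
        return f"{name}{version}"
    return f"{name}=={version}"


overrides = parse_overrides(
    ["eccodes>=2.44.0,<2.48.0", "eccodes-cosmo-resources-python==2.38.3.1", "torch"]
)
for name, version in sorted(overrides.items()):
    print(format_line(name, version))
# eccodes>=2.44.0,<2.48.0
# eccodes-cosmo-resources-python==2.38.3.1
# torch
```

Keying every override by its canonical name is what lets a later `eccodes==2.39.1` pin replace an earlier `eccodes>=2.44.0,<2.48.0` range, so the two never appear as conflicting lines.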