As a benchmark operator
I want to configure the openstef4 backtesting baseline to retune hyperparameters on a schedule (e.g. every 7 days of backtest time)
So that benchmark results reflect realistic periodic model maintenance and capture the benefit of adaptive tuning over a static config
❗Priority (What if we don't do this?/Are there any deadlines? etc.)
Currently the backtest always uses a fixed hyperparameter config. Real production deployments periodically retune. This skews benchmark results optimistically for tuned models and makes it hard to evaluate whether periodic retuning actually improves forecast quality.
Definition of Done:
✅ Acceptance criteria
- TuningSchedule config class added to openstef-models (alongside HyperparameterTuner); controls retune_every, n_trials, metric_name
- OpenSTEF4BacktestForecaster accepts an optional tuning_schedule field; when set, runs HyperparameterTuner before each fit() that falls on the schedule
- TuningSchedule is optional — existing users who don't set it see zero behaviour change
- openstef-beam[baselines,tuning] extras cover all new dependencies (optuna stays optional)
📄 Documentation criteria:
- Update the openstef4 baseline docstring / README section
- Add example usage to the benchmarking tutorial or a new tutorial showing backtesting + tuning
🧪 Test criteria:
- Unit test: TuningSchedule.is_due() with various retune_every values
- Unit test: OpenSTEF4BacktestForecaster.fit() with tuning schedule — verify the tuner is called only when due and that the tuned config is carried forward between retune windows
- Unit test: ensemble path — verify each member is tuned separately
⌛ Dependencies:
- openstef-models (HyperparameterTuner lives there)
🚀 Releasing:
Part of OpenSTEF 4.0; no separate release needed
Other information:
🌍 Background
Proposed design (from design discussion):
TuningSchedule is a Pydantic BaseModel (or frozen base-class-based design if generics become unwieldy). It owns:
- retune_every: timedelta — how often to retune
- n_trials: int — optuna trial budget per retune
- metric_name: str — objective metric
- direction: Literal["minimize", "maximize"] — objective direction
- is_due(horizon, last_tuned_at) and mark_done(horizon) behaviour (via private state or passed-in state)
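A minimal sketch of what this model could look like, assuming Pydantic v2. The defaults (7-day interval, 50 trials, "mae") are illustrative placeholders, not decided values, and mark_done is folded into the passed-in-state variant by taking last_tuned_at as an argument:

```python
# Hypothetical TuningSchedule sketch; this class does not exist yet in
# openstef-models, and all defaults below are placeholders.
from datetime import datetime, timedelta
from typing import Literal, Optional

from pydantic import BaseModel


class TuningSchedule(BaseModel):
    """Controls how often the backtest baseline retunes hyperparameters."""

    retune_every: timedelta = timedelta(days=7)  # how often to retune
    n_trials: int = 50                           # optuna trial budget per retune
    metric_name: str = "mae"                     # objective metric
    direction: Literal["minimize", "maximize"] = "minimize"

    def is_due(self, horizon: datetime, last_tuned_at: Optional[datetime]) -> bool:
        # Always tune on the first fit; afterwards only when the interval elapsed.
        if last_tuned_at is None:
            return True
        return horizon - last_tuned_at >= self.retune_every
```

Keeping the state (last_tuned_at) outside the model keeps the schedule itself immutable, which fits the frozen-design fallback mentioned above.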
- In OpenSTEF4BacktestForecaster.fit(), before workflow.fit():
  - Check tuning_schedule.is_due(data.horizon)
  - If yes → run HyperparameterTuner(config=current_config, train_dataset=training_data, ...)
  - Cache result.best_config as _tuned_config
  - Build the workflow from _tuned_config (instead of workflow_template.config)
- For ensembles: the baseline iterates ensemble_config.members, tunes each member.config separately, and reassembles the ensemble
The split/tune/reassemble logic for ensembles belongs in the baseline, not in TuningSchedule, keeping each layer scoped to what it knows.
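The per-fit() control flow above can be sketched as a plain function. The tuner is stubbed as a callable because HyperparameterTuner's exact signature is not fixed here; the point is the due-check, the cache, and the carry-forward between retune windows:

```python
# Hedged sketch of the fit()-time tuning flow; names are illustrative.
from datetime import datetime, timedelta
from typing import Callable, Optional


def fit_with_schedule(
    horizon: datetime,
    base_config: dict,
    tuned_config: Optional[dict],
    last_tuned_at: Optional[datetime],
    retune_every: timedelta,
    run_tuner: Callable[[dict], dict],
) -> tuple[dict, Optional[datetime], Optional[dict]]:
    """Return (config to build the workflow from, last_tuned_at, tuned cache)."""
    due = last_tuned_at is None or horizon - last_tuned_at >= retune_every
    if due:
        # Stand-in for HyperparameterTuner(...) → result.best_config
        tuned_config = run_tuner(base_config)
        last_tuned_at = horizon  # mark_done
    # Between retune windows, the previously tuned config is carried forward;
    # before the first tune, the template config is used as-is.
    return tuned_config or base_config, last_tuned_at, tuned_config
```

For the ensemble path, the baseline would call this once per ensemble_config.members entry with that member's config, then reassemble the ensemble from the tuned member configs.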