Add per-quantile-level reporting for quantile metrics by shchur · Pull Request #156 · autogluon/fev

shchur · 2026-06-24T09:39:46Z

Summary

Lets fev report quantile metrics (MQL / WQL / SQL) both overall (as today) and broken down per quantile level. With quantile_levels=[0.1, 0.5, 0.9] and per_quantile_scores=True, the summary gains SQL[0.1], SQL[0.5], SQL[0.9] alongside the overall SQL.

task.evaluation_summary(preds, model_name="m", per_quantile_scores=True)
# -> {..., "SQL": ..., "SQL[0.1]": ..., "SQL[0.5]": ..., "SQL[0.9]": ...}

Design

QuantileMetric base class for MQL/WQL/SQL. Subclasses implement only _per_quantile_level(...) -> np.ndarray ([Q]); the overall score is defined as mean over levels. So the overall score always equals the mean of the per-level scores by construction — single code path, cannot drift.
Metric.compute_scores(...) -> dict[str, float] is the new emission entry point. The base returns {self.name: self.compute(...)}; QuantileMetric overrides it to optionally add the per-level keys. compute() keeps returning a scalar, so test_error / leaderboards are unchanged.
Reporting is a call-time choice, not task state: per_quantile_scores is a kwarg on evaluation_summary (threaded to compute_metrics → compute_scores). It deliberately does not become a Task field, so it stays out of to_dict() / YAML / the task fingerprint.
metrics_per_window switched to a defaultdict(list) since the per-level keys aren't known up front.
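
The design above can be sketched roughly as follows. This is a dependency-free illustration, not the code in the PR: plain lists stand in for np.ndarray, ToyMQL, pinball_loss and the argument lists are hypothetical, and averaging the per-window values into a summary is an assumption about how the windows are combined.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict
from statistics import fmean


def pinball_loss(y: float, q_pred: float, level: float) -> float:
    """Quantile (pinball) loss for one observation (hypothetical helper)."""
    diff = y - q_pred
    return max(level * diff, (level - 1.0) * diff)


class Metric(ABC):
    name: str

    @abstractmethod
    def compute(self, y_true, y_pred, quantile_levels) -> float: ...

    def compute_scores(self, y_true, y_pred, quantile_levels, per_quantile_scores=False) -> dict[str, float]:
        # Base behaviour: a single key holding the scalar score.
        return {self.name: self.compute(y_true, y_pred, quantile_levels)}


class QuantileMetric(Metric):
    @abstractmethod
    def _per_quantile_level(self, y_true, y_pred, quantile_levels) -> list[float]:
        """Return one score per quantile level (shape [Q])."""

    def compute(self, y_true, y_pred, quantile_levels) -> float:
        per_level = self._per_quantile_level(y_true, y_pred, quantile_levels)
        assert len(per_level) == len(quantile_levels)
        # The overall score is defined as the mean over levels.
        return fmean(per_level)

    def compute_scores(self, y_true, y_pred, quantile_levels, per_quantile_scores=False) -> dict[str, float]:
        per_level = self._per_quantile_level(y_true, y_pred, quantile_levels)
        scores = {self.name: fmean(per_level)}
        if per_quantile_scores:
            for level, value in zip(quantile_levels, per_level):
                scores[f"{self.name}[{level}]"] = value
        return scores


class ToyMQL(QuantileMetric):
    """Hypothetical metric: mean pinball loss per level; y_pred[t][q] is level q at step t."""

    name = "MQL"

    def _per_quantile_level(self, y_true, y_pred, quantile_levels):
        return [
            fmean(pinball_loss(y, row[q], level) for y, row in zip(y_true, y_pred))
            for q, level in enumerate(quantile_levels)
        ]


levels = [0.1, 0.5, 0.9]
windows = [  # made-up (targets, predictions) pairs, one per evaluation window
    ([10.0, 12.0], [[8.0, 10.0, 13.0], [9.0, 12.0, 14.0]]),
    ([11.0, 9.0], [[9.0, 11.0, 12.0], [8.0, 10.0, 12.0]]),
]

metric = ToyMQL()
# Keys are discovered as scores arrive, hence defaultdict(list).
metrics_per_window = defaultdict(list)
for y_true, y_pred in windows:
    scores = metric.compute_scores(y_true, y_pred, levels, per_quantile_scores=True)
    for key, value in scores.items():
        metrics_per_window[key].append(value)

summary = {key: fmean(values) for key, values in metrics_per_window.items()}
default_scores = metric.compute_scores(*windows[0], levels)
```

Because the overall value and the per-level values come from the same _per_quantile_level call, the overall key stays equal to the mean of the bracketed keys, and a call without per_quantile_scores produces only the single "MQL" key.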

Compatibility

Default per_quantile_scores=False → summary schema is identical to before.
Non-quantile metrics are completely unaffected (inherit the single-key compute_scores).
Verified the refactored MQL/WQL/SQL compute() matches the previous implementations exactly, including with NaNs and too-short histories.

Tests

All existing metric tests pass (incl. the AutoGluon cross-check for every metric).
New tests: overall == mean of per-level for each quantile metric; no per-level keys when disabled; non-quantile metrics get no breakdown.

apointa · 2026-06-24T14:16:12Z

+            seasonality=seasonality,
+            quantile_levels=quantile_levels,
+        )  # [Q]
+        return float(np.mean(per_level))


Not sure if this is intended, but it is a slight change from the previous logic: before, the mean over the quantiles was NaN-safe, and here it is not.

This should have no effect since per_level already shouldn't contain NaNs. The NaNs might be present at some time steps in the target (never in predictions), so after we average across time & items [T, N] there should be no NaNs left in the array of shape [Q].

Okay, makes sense.
Just for my understanding: it would become NaN when ALL predictions for a specific quantile are NaN, right? But in that case we don't want to ignore it, since otherwise you could get a better SQL by not providing the hard quantiles. That is also why predictions are not allowed to be NaN in general (it would "sub-select" the aggregate based on the ones provided), right?

Currently we have a check here

fev/src/fev/task.py

Line 827 in 516786b

if not pc.all(pc.is_finite(flat)).as_py():

that raises an error if there are any NaNs in the predictions, so only NaNs in the target are permitted.

This means the only scenario in which the quantile loss is NaN for one quantile is when all target values are NaN, but then the loss will be NaN for all quantiles and for metrics in general, which is easy to spot.
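
The NaN argument in this thread can be illustrated with a small standalone sketch. It assumes the average over time steps and items skips NaN targets, as the reply above implies; nan_mean, pinball_loss, per_level_loss and check_predictions_finite are hypothetical helpers written for this example, not functions from fev.

```python
import math


def nan_mean(values):
    """Mean that skips NaN entries; returns NaN if nothing is left."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else math.nan


def pinball_loss(y, q_pred, level):
    """Quantile loss for one observation; NaN when the target is NaN."""
    diff = y - q_pred
    return max(level * diff, (level - 1.0) * diff)


def check_predictions_finite(y_pred):
    """Stand-in for the check in task.py: reject NaN/inf predictions."""
    if not all(math.isfinite(v) for row in y_pred for v in row):
        raise ValueError("Predictions must not contain NaN or infinite values")


def per_level_loss(y_true, y_pred, levels):
    """NaN-safe average over time steps, one value per quantile level ([Q])."""
    check_predictions_finite(y_pred)
    return [
        nan_mean([pinball_loss(y, row[q], level) for y, row in zip(y_true, y_pred)])
        for q, level in enumerate(levels)
    ]


levels = [0.1, 0.5, 0.9]
y_pred = [[8.0, 10.0, 13.0], [9.0, 12.0, 14.0], [9.0, 11.0, 12.0]]

# Some targets missing: every level still gets a finite score,
# so a plain mean over levels is safe.
partial = per_level_loss([10.0, math.nan, 11.0], y_pred, levels)
overall_partial = sum(partial) / len(partial)

# All targets missing: every level is NaN, not just one of them.
empty = per_level_loss([math.nan, math.nan, math.nan], y_pred, levels)

# A NaN prediction is rejected up front rather than silently skipped.
try:
    per_level_loss([10.0, 12.0, 11.0], [[8.0, math.nan, 13.0]] * 3, levels)
    rejected = False
except ValueError:
    rejected = True
```

Under these assumptions, a NaN-safe mean over levels and a plain mean give the same result whenever any target is observed, and the only case where they differ is one where every score is NaN anyway.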

shchur added 2 commits June 24, 2026 09:39

Add per-quantile-level reporting for quantile metrics

e4b59e4

Trim docstrings, assert per-level length

00bf161

shchur requested a review from apointa June 24, 2026 09:58

apointa reviewed Jun 24, 2026

apointa approved these changes Jun 24, 2026

shchur merged commit 100695e into autogluon:main Jun 24, 2026
2 checks passed
