Skip to content

Reported calibration uncertainties are systematically too narrow #231

@bdestombe

Description

@bdestombe

Two independent bugs that both bias calibration confidence intervals downward. Users who rely on tmpf_var, tmpb_var, tmpw_var, or MC percentile CIs for uncertainty quantification are getting overconfident estimates.

1. Wrong variance propagation for fixed parameters

When a user passes fix_gamma=(value, var), fix_alpha=(value, var), or fix_dalpha=(value, var) to single- or double-ended calibration, the helper functions inflate the WLS weights by adding the parameter variance contribution to 1/w. The formula used is:

w = 1 / (1 / w + fix_gamma[1] * X_gamma)

This is dimensionally and statistically wrong. The propagated variance from a fixed parameter θ with variance σ²_θ is (∂y/∂θ)² · σ²_θ, i.e. the Jacobian column squared times the variance, not the column itself times the variance. The correct form is:

w = 1 / (1 / w + (X_gamma ** 2) * fix_gamma[1])

Affected lines (all use the wrong coef · Var form):

  • src/dtscalibration/calibrate_utils.py:127 — single-ended fix_gamma
  • :136 — single-ended fix_alpha
  • :152-163 — single-ended fix_dalpha
  • :634, :661 — double-ended fix_gamma
  • :778, :812 — double-ended fix_alpha
  • :984 — double-ended (further fix_* branch)

Symptom: whenever any parameter is fixed, the inflated measurement uncertainty caused by parameter uncertainty is underestimated, so reported tmpf_var / tmpb_var are too small. Existing tests for fix_gamma (e.g. tests/test_dtscalibration.py:707) only check recovered temperatures, never the propagated variance, so this has gone undetected.

2. ddof ignores number of fitted parameters in residual variance

src/dtscalibration/variance_helpers.py:35 (constant) and :129 (exponential) compute residual variance as:

var_I = resid.var(ddof=1)

The number of fitted parameters in variance_stokes_constant_helper is nt + nxs (one scale per time step, one per spatial location); in variance_stokes_exponential_helper it is n_sections + nt * n_sections. The unbiased denominator should be N − npar, not N − 1. The author was aware — the comment at variance_helpers.py:34 says "originally thought it was npar" — but the choice was deliberately set to 1.

Symptom: the estimated Stokes-noise variance is biased downward. Calibration weights are too large, and downstream MC confidence intervals are too narrow. The ddof variable computed at :75 is even dead code: it is used only to size the statsmodels design matrix at :89 and is never used for the variance itself in the scipy branch.

Fix: use np.sum(resid ** 2) / (N - npar) with the correct npar for each estimator.

Suggested tests

  • Analytic variance recovery under fix_gamma: synthesize data with known gamma, calibrate with fix_gamma=(gamma_true, sigma_gamma**2) for a non-zero sigma_gamma**2, then assert tmpf_var equals the analytical expression (T**2 / gamma)**2 * sigma_gamma**2 + measurement_term to rtol=1e-12. Will fail on current code and demonstrate the wrong exponent.
  • ddof bias check: with synthetic IID Gaussian residuals of known variance sigma**2, assert that variance_stokes_constant_helper recovers sigma**2 to within the expected sampling error of an unbiased estimator (std-err ≈ sigma**2 * sqrt(2 / (N - npar))). Tighten existing decimal=1 tolerances in tests/test_variance_stokes.py to expose the bias.
  • Limiting case: with nt=2, nxs=2 and a small synthetic dataset, the (N-1) vs (N-npar) discrepancy is large enough to verify exactly against a hand-computed expected variance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions