Variance propagation has silent biases in calc_alpha_double, tmpw averaging, and percentile/var disagreement

Three independent variance-propagation defects across `calibrate_utils.py` and `dts_accessor.py` that are not covered by #231. Each silently produces a wrong number for downstream uncertainty.

These bite the user-facing fields `param_covs["alpha"]`, `tmpf_var`, `tmpb_var`, `tmpw_var`, `tmpw_mc_avg1_var`, `tmpw_mc_avgx1_var`, and `tmpf/tmpb_mc_avg1_var` / `_avgx1_var`. The directions of bias differ — see each item.

## 1. `calc_alpha_double`: `Var(E)` is 2× too large outside reference sections

**Location:** `src/dtscalibration/calibrate_utils.py:2203, 2241, 2245`

The estimator is

```python
A = (i_bw - i_fw) / 2 + (D_B - D_F) / 2 + (ta_arr_bw - ta_arr_fw) / 2
```

so each independent ingredient enters with coefficient ±1/2 and standard first-order error propagation gives `Var(A) = (1/4) · sum(Var(each))`. The code instead computes:

```python
A_var = (i_var_fw + i_var_bw + D_B_var + D_F_var + ta_arr_fw_var + ta_arr_bw_var) / 2
```

i.e. `sum/2` instead of `sum/4`. Same factor in the `mode="guess"` branch (line 2203) and the no-trans-att branch (line 2245).

After the inverse-variance-weighted mean over time at lines 2248–2249, the output `E_var` is exactly 2× the correct value. This propagates 1:1 into `param_covs["alpha"]` for every `x` outside the reference sections (inside the reference sections the WLS solver supplies the variance directly and is unaffected). Downstream, `tmpf_var`, `tmpb_var`, `tmpw_var` are inflated by the alpha contribution × 2.

**Fix:** replace `/ 2` with `/ 4` at lines 2203, 2241, 2245.

**Worked check (nt=2, all input vars=1, no D, no TA):** correct `Var(A) = 0.5`, `Var(E) = 1/(2·(1/0.5)) = 0.25`. Code returns `Var(A) = 1`, `Var(E) = 0.5`.

## 2. `tmpw_mc_avg1_var` and `tmpw_mc_avgx1_var` skip centering and `ddof=1`

**Location:** `src/dtscalibration/dts_accessor.py:2690-2692, 2744`

For `tmpf` and `tmpb` the same averaging code (loop body at ~`:2613-2619`, single-ended at `:2199-2214`) computes

```python
q = mc_set - avgsec
qvar = q.var(dim=["mc", time_dim2], ddof=1)
```

— centered residual variance with the unbiased denominator. The `tmpw` analog at line 2690 is

```python
out["tmpw_mc_avg1_var"] = mcparams["tmpw_mc_set"].var(dim=["mc", time_dim2])
```

— uncentered, default `ddof=0`. Line 2744 has the same shape for the x-axis variant: it also drops the `mc` dim from the variance reduction (only `dim=x_dim2`), so the quantity is not even of the same kind as the other three.

Decomposing `tmpw_mc_set[m,x,t] = tmpw_avgsec(x,t) + noise(m,x,t)` with zero-mean noise per `(x,t)`:
- centered `Var_{m,t}(noise)`: pure MC-noise variance, marginalized over t — what the per-channel paths report.
- uncentered `Var_t(tmpw_avgsec) + Var_{m,t}(noise)`: spatially/temporally varying signal plus noise.

The reported `tmpw_mc_avg1_var` is inflated by `Var_t(tmpw_avgsec(x,t))` and is not comparable in magnitude with `tmpf_mc_avg1_var` / `tmpb_mc_avg1_var` even on noise-free data.

**Fix:** match the per-channel form:

```python
q = mcparams["tmpw_mc_set"] - out["tmpw_avgsec"]
out["tmpw_mc_avg1_var"]  = q.var(dim=["mc", time_dim2], ddof=1)
out["tmpw_mc_avgx1_var"] = q.var(dim=["mc", x_dim2],    ddof=1)
```

## 3. `_avg*_var` (centered residual variance) vs. `_avg*` percentile (uncentered): the two report different quantities

**Location:** `src/dtscalibration/dts_accessor.py:2199-2214` (single-ended `_avgx1`), `:2250-2265` (single-ended `_avg1`), `:2566-2581` and `:2617-2632` (double-ended counterparts)

Adjacent in the same code block, the variance and the percentile are computed from different statistics:

- `qvar = (T_mc − T_pt).var(dim=[mc, x_dim2], ddof=1)` — centered, projects out `Var_x(T_pt)`
- `np.percentile(T_mc, q=conf_ints, axis=(mc, x))` — uncentered, includes `Var_x(T_pt)`

Under the docstring's stated use case ("CI for a new measurement at a random location with parameter uncertainty"), the percentile is the right quantity; the variance under-reports by exactly `Var_x(T_pt)` (or `Var_t(T_pt)` for `_avg1_var`). Whichever interpretation is intended, exactly one of the two outputs is wrong, and they aren't comparable.

**Fix:** pick the interpretation that matches the docstring. If the percentile is canonical, drop the centering: `qvar = T_mc.var(dim=[mc, x_dim2], ddof=1)`. If a "spatial-mean variance" is intended, the formula needs to be `T_mc.mean(x_dim2).var(mc)` instead. The current mixed form matches neither.

## Suggested tests

- Synthetic `nt=2` dataset with all input variances = 1, no D, no TA → assert `Var(E) == 0.25` exactly. Currently returns 0.5.
- Synthetic dataset with `T_pt(x,t)` non-trivially varying in time → assert `tmpw_mc_avg1_var ≈ tmpf_mc_avg1_var ≈ tmpb_mc_avg1_var` in magnitude under symmetric noise. Currently the tmpw value is inflated by `Var_t(tmpw_avgsec)`.
- Same dataset with `T_pt` constant in x → assert `_avgx1_var` and the spread of `_avgx1` percentile across MC are consistent.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Variance propagation has silent biases in calc_alpha_double, tmpw averaging, and percentile/var disagreement #236

1. `calc_alpha_double`: `Var(E)` is 2× too large outside reference sections

2. `tmpw_mc_avg1_var` and `tmpw_mc_avgx1_var` skip centering and `ddof=1`

3. `_avg_var` (centered residual variance) vs. `_avg` percentile (uncentered): the two report different quantities

Suggested tests

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Variance propagation has silent biases in calc_alpha_double, tmpw averaging, and percentile/var disagreement #236

Description

1. calc_alpha_double: Var(E) is 2× too large outside reference sections

2. tmpw_mc_avg1_var and tmpw_mc_avgx1_var skip centering and ddof=1

3. _avg*_var (centered residual variance) vs. _avg* percentile (uncentered): the two report different quantities

Suggested tests

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `calc_alpha_double`: `Var(E)` is 2× too large outside reference sections

2. `tmpw_mc_avg1_var` and `tmpw_mc_avgx1_var` skip centering and `ddof=1`

3. `_avg_var` (centered residual variance) vs. `_avg` percentile (uncentered): the two report different quantities