Skip to content

Fix holdout state restore and seed-namespace collision#97

Merged
MaxGhenis merged 1 commit into
mainfrom
fix/holdout-state-restore
Apr 17, 2026
Merged

Fix holdout state restore and seed-namespace collision#97
MaxGhenis merged 1 commit into
mainfrom
fix/holdout-state-restore

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Two related bugs in the holdout code paths:

  • Finding Add tests on impossible synthetic data #9 (LOW): evaluate_holdout_robustness (and the corresponding tune_l0_hyperparameters finally block) restored captured state only when the captured value was not None. Callers that started with excluded_targets=None therefore had the last holdout set's target list silently stuck on the calibrator, and any subsequent calibrate() call trained on the last holdout's train split.

    Restoration is now unconditional: if the attribute started as None we restore it to None. exclude_targets() is called unconditionally after restoration to refresh derived state.

  • Finding Add synthetic data #10 (LOW): Tuning called _create_holdout_sets(..., seed) and robustness evaluation called _create_holdout_sets(..., seed + 1). Since _create_holdout_sets iterates base_seed + i, tuning's holdout Add repo documentation, tests #1 and robustness's holdout #0 both reduced to seed + 1, yielding identical sets — "independent" evaluation reused a tuning holdout.

    Robustness now uses an orthogonal namespace (seed + 10_000), eliminating the deterministic collision.

Test plan

  • Add tests/test_holdout_state_restore.py: None-valued excluded_targets is restored to None after robustness evaluation; the index-aligned tuning/robustness holdout pairs are no longer deterministically identical.
  • All existing tests pass (uv run pytest tests -x -q -> 17 passed).

🤖 Generated with Claude Code

Two related bugs in the holdout code paths:

Finding #9: ``evaluate_holdout_robustness`` (and the corresponding
``tune_l0_hyperparameters`` finally block) restored captured state
only when the captured value was not None. Callers that started with
``excluded_targets=None`` therefore had the last holdout set's target
list silently stuck on the calibrator, and any subsequent
``calibrate()`` call trained on the last holdout's train split.

Restoration is now unconditional: if the attribute started as None we
restore it to None. ``exclude_targets()`` is called unconditionally
after restoration to refresh derived state (targets / target_names /
estimate matrix) regardless of whether excluded_targets is None or a
list.

Finding #10: tuning called ``_create_holdout_sets(..., seed)`` and
robustness evaluation called ``_create_holdout_sets(..., seed + 1)``.
Since ``_create_holdout_sets`` iterates ``base_seed + i``, tuning's
holdout #1 and robustness's holdout #0 both reduced to ``seed + 1``,
yielding identical sets. Robustness now uses an orthogonal namespace
(``seed + 10_000``), eliminating the deterministic collision that
leaked tuning data into reportedly-independent evaluation.

Adds tests/test_holdout_state_restore.py covering (a) None-valued
``excluded_targets`` is restored to None after robustness evaluation,
and (b) the index-aligned tuning/robustness holdout pairs are no
longer deterministically identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
microcalibrate Ready Ready Preview, Comment Apr 17, 2026 0:41am

Request Review

Copy link
Copy Markdown
Contributor Author

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (cannot self-approve; posting as comment).

Finding #9 (state restore):

  • evaluate_holdout_robustness finally block: setattr is unconditional — excluded_targets=None originals are correctly restored to None (previously stuck on the last holdout's list).
  • exclude_targets() is called unconditionally after restore. Verified safe: the method has an internal if self.excluded_targets: guard and the None branch correctly resets self.estimate_matrix, self.estimate_function, self.targets, self.target_names from originals.
  • tune_l0_hyperparameters finally block: direct assignment with None-safe .copy() handles both None and ndarray originals.

Finding #10 (seed namespace):

  • Robustness evaluation now uses seed + 10_000 (vs tuning's seed + i range). The previous collision (seed + 1) == ((seed + 1) + 0) for tuning[i] vs robustness[i-1] no longer fires — test_robustness_uses_orthogonal_seed_namespace asserts index-aligned pairs differ.
  • calibration.seed + 10_000 if calibration.seed is not None else None correctly propagates the None case.
  • Orthogonal to tuning seed (10_000 offset is far larger than practical n_holdout_sets).

Test file is thorough — covers None restoration, post-holdout state integrity, and seed orthogonality.

@MaxGhenis MaxGhenis merged commit 1c68841 into main Apr 17, 2026
6 checks passed
@MaxGhenis MaxGhenis deleted the fix/holdout-state-restore branch April 17, 2026 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant