Fix holdout state restore and seed-namespace collision by MaxGhenis · Pull Request #97 · PolicyEngine/microcalibrate

MaxGhenis · 2026-04-17T12:41:05Z

Summary

Two related bugs in the holdout code paths:

Finding Add tests on impossible synthetic data #9 (LOW): evaluate_holdout_robustness (and the corresponding tune_l0_hyperparameters finally block) restored captured state only when the captured value was not None. Callers that started with excluded_targets=None therefore had the last holdout set's target list silently stuck on the calibrator, and any subsequent calibrate() call trained on the last holdout's train split.

Restoration is now unconditional: if the attribute started as None we restore it to None. exclude_targets() is called unconditionally after restoration to refresh derived state.
Finding Add synthetic data #10 (LOW): Tuning called _create_holdout_sets(..., seed) and robustness evaluation called _create_holdout_sets(..., seed + 1). Since _create_holdout_sets iterates base_seed + i, tuning's holdout Add repo documentation, tests #1 and robustness's holdout #0 both reduced to seed + 1, yielding identical sets — "independent" evaluation reused a tuning holdout.

Robustness now uses an orthogonal namespace (seed + 10_000), eliminating the deterministic collision.

Test plan

Add tests/test_holdout_state_restore.py: None-valued excluded_targets is restored to None after robustness evaluation; the index-aligned tuning/robustness holdout pairs are no longer deterministically identical.
All existing tests pass (uv run pytest tests -x -q -> 17 passed).

🤖 Generated with Claude Code

Two related bugs in the holdout code paths: Finding #9: ``evaluate_holdout_robustness`` (and the corresponding ``tune_l0_hyperparameters`` finally block) restored captured state only when the captured value was not None. Callers that started with ``excluded_targets=None`` therefore had the last holdout set's target list silently stuck on the calibrator, and any subsequent ``calibrate()`` call trained on the last holdout's train split. Restoration is now unconditional: if the attribute started as None we restore it to None. ``exclude_targets()`` is called unconditionally after restoration to refresh derived state (targets / target_names / estimate matrix) regardless of whether excluded_targets is None or a list. Finding #10: tuning called ``_create_holdout_sets(..., seed)`` and robustness evaluation called ``_create_holdout_sets(..., seed + 1)``. Since ``_create_holdout_sets`` iterates ``base_seed + i``, tuning's holdout #1 and robustness's holdout #0 both reduced to ``seed + 1``, yielding identical sets. Robustness now uses an orthogonal namespace (``seed + 10_000``), eliminating the deterministic collision that leaked tuning data into reportedly-independent evaluation. Adds tests/test_holdout_state_restore.py covering (a) None-valued ``excluded_targets`` is restored to None after robustness evaluation, and (b) the index-aligned tuning/robustness holdout pairs are no longer deterministically identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-04-17T12:41:11Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
microcalibrate	Ready	Preview, Comment	Apr 17, 2026 0:41am

MaxGhenis

LGTM (cannot self-approve; posting as comment).

Finding #9 (state restore):

evaluate_holdout_robustness finally block: setattr is unconditional — excluded_targets=None originals are correctly restored to None (previously stuck on the last holdout's list).
exclude_targets() is called unconditionally after restore. Verified safe: the method has an internal if self.excluded_targets: guard and the None branch correctly resets self.estimate_matrix, self.estimate_function, self.targets, self.target_names from originals.
tune_l0_hyperparameters finally block: direct assignment with None-safe .copy() handles both None and ndarray originals.

Finding #10 (seed namespace):

Robustness evaluation now uses seed + 10_000 (vs tuning's seed + i range). The previous collision (seed + 1) == ((seed + 1) + 0) for tuning[i] vs robustness[i-1] no longer fires — test_robustness_uses_orthogonal_seed_namespace asserts index-aligned pairs differ.
calibration.seed + 10_000 if calibration.seed is not None else None correctly propagates the None case.
Orthogonal to tuning seed (10_000 offset is far larger than practical n_holdout_sets).

Test file is thorough — covers None restoration, post-holdout state integrity, and seed orthogonality.

MaxGhenis commented Apr 17, 2026

View reviewed changes

MaxGhenis merged commit 1c68841 into main Apr 17, 2026
6 checks passed

MaxGhenis deleted the fix/holdout-state-restore branch April 17, 2026 16:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix holdout state restore and seed-namespace collision#97

Fix holdout state restore and seed-namespace collision#97
MaxGhenis merged 1 commit into
mainfrom
fix/holdout-state-restore

MaxGhenis commented Apr 17, 2026

Uh oh!

vercel Bot commented Apr 17, 2026

Uh oh!

MaxGhenis left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented Apr 17, 2026

Summary

Test plan

Uh oh!

vercel Bot commented Apr 17, 2026

Uh oh!

MaxGhenis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant