Fix holdout state restore and seed-namespace collision#97
Merged
Conversation
Two related bugs in the holdout code paths: Finding #9: ``evaluate_holdout_robustness`` (and the corresponding ``tune_l0_hyperparameters`` finally block) restored captured state only when the captured value was not None. Callers that started with ``excluded_targets=None`` therefore had the last holdout set's target list silently stuck on the calibrator, and any subsequent ``calibrate()`` call trained on the last holdout's train split. Restoration is now unconditional: if the attribute started as None we restore it to None. ``exclude_targets()`` is called unconditionally after restoration to refresh derived state (targets / target_names / estimate matrix) regardless of whether excluded_targets is None or a list. Finding #10: tuning called ``_create_holdout_sets(..., seed)`` and robustness evaluation called ``_create_holdout_sets(..., seed + 1)``. Since ``_create_holdout_sets`` iterates ``base_seed + i``, tuning's holdout #1 and robustness's holdout #0 both reduced to ``seed + 1``, yielding identical sets. Robustness now uses an orthogonal namespace (``seed + 10_000``), eliminating the deterministic collision that leaked tuning data into reportedly-independent evaluation. Adds tests/test_holdout_state_restore.py covering (a) None-valued ``excluded_targets`` is restored to None after robustness evaluation, and (b) the index-aligned tuning/robustness holdout pairs are no longer deterministically identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
MaxGhenis
commented
Apr 17, 2026
Contributor
Author
MaxGhenis
left a comment
There was a problem hiding this comment.
LGTM (cannot self-approve; posting as comment).
Finding #9 (state restore):
evaluate_holdout_robustnessfinally block:setattris unconditional —excluded_targets=Noneoriginals are correctly restored to None (previously stuck on the last holdout's list).exclude_targets()is called unconditionally after restore. Verified safe: the method has an internalif self.excluded_targets:guard and the None branch correctly resetsself.estimate_matrix,self.estimate_function,self.targets,self.target_namesfrom originals.tune_l0_hyperparametersfinally block: direct assignment with None-safe.copy()handles both None and ndarray originals.
Finding #10 (seed namespace):
- Robustness evaluation now uses
seed + 10_000(vs tuning'sseed + irange). The previous collision(seed + 1) == ((seed + 1) + 0)for tuning[i] vs robustness[i-1] no longer fires —test_robustness_uses_orthogonal_seed_namespaceasserts index-aligned pairs differ. calibration.seed + 10_000 if calibration.seed is not None else Nonecorrectly propagates the None case.- Orthogonal to tuning seed (10_000 offset is far larger than practical
n_holdout_sets).
Test file is thorough — covers None restoration, post-holdout state integrity, and seed orthogonality.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related bugs in the holdout code paths:
Finding Add tests on impossible synthetic data #9 (LOW):
evaluate_holdout_robustness(and the correspondingtune_l0_hyperparametersfinally block) restored captured state only when the captured value was notNone. Callers that started withexcluded_targets=Nonetherefore had the last holdout set's target list silently stuck on the calibrator, and any subsequentcalibrate()call trained on the last holdout's train split.Restoration is now unconditional: if the attribute started as
Nonewe restore it toNone.exclude_targets()is called unconditionally after restoration to refresh derived state.Finding Add synthetic data #10 (LOW): Tuning called
_create_holdout_sets(..., seed)and robustness evaluation called_create_holdout_sets(..., seed + 1). Since_create_holdout_setsiteratesbase_seed + i, tuning's holdout Add repo documentation, tests #1 and robustness's holdout #0 both reduced toseed + 1, yielding identical sets — "independent" evaluation reused a tuning holdout.Robustness now uses an orthogonal namespace (
seed + 10_000), eliminating the deterministic collision.Test plan
tests/test_holdout_state_restore.py:None-valuedexcluded_targetsis restored toNoneafter robustness evaluation; the index-aligned tuning/robustness holdout pairs are no longer deterministically identical.uv run pytest tests -x -q-> 17 passed).🤖 Generated with Claude Code