Guard loss() and pct_close() denominators and catch Inf, not just NaN#96
Merged
Conversation
Two related problems in src/microcalibrate/utils/metrics.py: 1. Both functions divide by `target + 1`, which crosses zero at `target == -1` -- producing Inf and silently corrupting the loss. The NaN guard in loss() was specifically `torch.isnan`, so Inf passed through undetected and calibration diverged without any diagnostic. 2. Any other upstream non-finite value (e.g. an Inf estimate from a broken estimate function or dropout misconfiguration) slipped past the same isnan check. Factor out a `_safe_denominator` helper that clamps `target + 1` to `eps` (1e-12) so the value is unchanged for any well-behaved target but never zero. Switch the guard to `torch.isfinite(...).all()` so Inf is rejected alongside NaN, with an error message pointing at likely causes. Adds tests/test_metrics_regression.py covering target == -1 (no Inf, result is finite), explicit Inf estimate (now raises), value preservation for normal targets, and pct_close with the same guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
MaxGhenis
commented
Apr 17, 2026
Contributor
Author
MaxGhenis
left a comment
There was a problem hiding this comment.
LGTM (cannot self-approve; posting as comment).
_safe_denominator(targets)clampstargets + 1toeps = 1e-12, preventing divide-by-zero whentarget == -1. For any well-behaved target (target + 1 >= eps) the numeric result is unchanged — verified bytest_loss_preserves_values_for_well_behaved_targets.- Guard broadened from
torch.isnan(rel_error).any()tonot torch.isfinite(rel_error).all(), so Inf is now caught (was silently passing through before). Error message is informative. pct_closeuses the same_safe_denominator, so the metric stays finite on pathological targets.
Scope note (not blocking): finding #7 also flagged the "magnitude asymmetry" of the +1 smoothing for sub-unit targets. This PR deliberately preserves the original +1 semantics and only guards the denominator floor — a narrow, conservative fix. Revisiting the principled scale (e.g. max(|target|, eps)) is a bigger behaviour change best handled separately. Leaving as-is is fine.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Finding #7 (MED). Both functions divide by
target + 1, which crosses zero attarget == -1— producing Inf and silently corrupting the loss. The NaN guard inloss()was specificallytorch.isnan, so Inf passed through undetected and calibration diverged without any diagnostic. Any other upstream non-finite value (e.g. an Inf estimate from a broken estimate function) slipped past the same check.Factored out
_safe_denominator(targets)that clampstarget + 1toeps = 1e-12so the value is unchanged for any well-behaved target but never zero. Switched the guard totorch.isfinite(...).all()so Inf is rejected alongside NaN, with an error message pointing at likely causes.Test plan
tests/test_metrics_regression.py:target == -1no longer produces Inf, explicit Inf estimates now raise, value preservation for normal targets, andpct_closegets the same guard.uv run pytest tests -x -q-> 19 passed).🤖 Generated with Claude Code