fix #307: record_test_baseline timeout is fail-safe, not fail-open#315
Conversation
When record_test_baseline times out the subprocess never finishes, so baseline_failures was always [] — indistinguishable from a genuinely clean suite. Any pre-existing failure was silently treated as "not pre-existing", defeating the regression-vs-pre-existing distinction. Changes: - status is now "timed_out" (not "baseline_failures") when the subprocess times out, so the two cases have distinct values - new baseline_complete: bool field — false on timeout, true otherwise; downstream code can check this before trusting an empty baseline - list_baseline_failures propagates baseline_complete, timed_out, and emits a "warning" key when the stored baseline is incomplete - default timeout_seconds raised from 120 → 600 (10 min) to give most suites room to finish; --timeout flag still accepts an explicit value - two regression tests added: one for the timeout path, one asserting baseline_complete is true on a normal run Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_015ncF4ESXM8YoxbH4U9VyYY
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthrough
Baseline Timeout Handling
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
Fixes #307.
record_test_baselinedefaulted to a 120 s subprocess timeout. On any repo whose full suite takes longer, the process was killed and the function returned{"status":"baseline_failures","baseline_failures":[]}— an empty failures list that was indistinguishable from a genuinely clean run. Any pre-existing failure was silently treated as "not pre-existing", making the regression-vs-pre-existing gate unreliable.Changes
record_test_baselinestatusis now"timed_out"(instead of"baseline_failures") when the subprocess is killed by the timeout, giving callers a distinct value to branch on.baseline_complete: boolfield —falsewhen the run timed out,trueotherwise. Downstream code should check this before trusting an emptybaseline_failureslist.timeout_secondsraised from 120 → 600 (10 min) so most test suites complete in time; the--timeoutCLI flag still accepts an explicit override.list_baseline_failuresbaseline_completeandtimed_outfrom the stored JSON."warning"key (human-readable explanation) when the stored baseline is incomplete, so callers that log the full report surface the problem automatically.Tests (
tests/test_map_step_runner.py)test_baseline_timeout_is_unknown_not_clean— monkeypatchessubprocess.runto raiseTimeoutExpired, verifiesstatus=="timed_out",baseline_complete==False, and thatlist_baseline_failurespropagates the warning.test_baseline_complete_true_on_normal_run— verifiesbaseline_complete==Trueand nowarningkey on a successful run.All 3 225 existing tests pass;
ruff,pyright, andmake check-renderare clean.Generated by Claude Code
Summary by CodeRabbit