Skip to content

fix #307: record_test_baseline timeout is fail-safe, not fail-open#315

Merged
azalio merged 1 commit into
mainfrom
claude/compassionate-cerf-6l674i
Jun 30, 2026
Merged

fix #307: record_test_baseline timeout is fail-safe, not fail-open#315
azalio merged 1 commit into
mainfrom
claude/compassionate-cerf-6l674i

Conversation

@azalio

@azalio azalio commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #307.

record_test_baseline defaulted to a 120 s subprocess timeout. On any repo whose full suite takes longer, the process was killed and the function returned {"status":"baseline_failures","baseline_failures":[]} — an empty failures list that was indistinguishable from a genuinely clean run. Any pre-existing failure was silently treated as "not pre-existing", making the regression-vs-pre-existing gate unreliable.

Changes

record_test_baseline

  • status is now "timed_out" (instead of "baseline_failures") when the subprocess is killed by the timeout, giving callers a distinct value to branch on.
  • New baseline_complete: bool field — false when the run timed out, true otherwise. Downstream code should check this before trusting an empty baseline_failures list.
  • Default timeout_seconds raised from 120 → 600 (10 min) so most test suites complete in time; the --timeout CLI flag still accepts an explicit override.

list_baseline_failures

  • Propagates baseline_complete and timed_out from the stored JSON.
  • Adds a "warning" key (human-readable explanation) when the stored baseline is incomplete, so callers that log the full report surface the problem automatically.

Tests (tests/test_map_step_runner.py)

  • test_baseline_timeout_is_unknown_not_clean — monkeypatches subprocess.run to raise TimeoutExpired, verifies status=="timed_out", baseline_complete==False, and that list_baseline_failures propagates the warning.
  • test_baseline_complete_true_on_normal_run — verifies baseline_complete==True and no warning key on a successful run.

All 3 225 existing tests pass; ruff, pyright, and make check-render are clean.


Generated by Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Increased the baseline test timeout to give long-running suites more time to finish.
    • Improved baseline reporting to clearly distinguish between a timeout and actual test failures.
    • Added clearer warnings when baseline results are incomplete, so empty failure lists are not mistaken for a clean run.
    • Baseline results now include completion and timeout status for better visibility.

When record_test_baseline times out the subprocess never finishes, so
baseline_failures was always [] — indistinguishable from a genuinely
clean suite. Any pre-existing failure was silently treated as "not
pre-existing", defeating the regression-vs-pre-existing distinction.

Changes:
- status is now "timed_out" (not "baseline_failures") when the
  subprocess times out, so the two cases have distinct values
- new baseline_complete: bool field — false on timeout, true otherwise;
  downstream code can check this before trusting an empty baseline
- list_baseline_failures propagates baseline_complete, timed_out, and
  emits a "warning" key when the stored baseline is incomplete
- default timeout_seconds raised from 120 → 600 (10 min) to give most
  suites room to finish; --timeout flag still accepts an explicit value
- two regression tests added: one for the timeout path, one asserting
  baseline_complete is true on a normal run

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_015ncF4ESXM8YoxbH4U9VyYY
@azalio azalio merged commit a650894 into main Jun 30, 2026
1 of 2 checks passed
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 64c63171-1e3d-48c6-a4d1-82a67b510b15

📥 Commits

Reviewing files that changed from the base of the PR and between 0502a1d and c373fd3.

📒 Files selected for processing (4)
  • .map/scripts/map_step_runner.py
  • src/mapify_cli/templates/map/scripts/map_step_runner.py
  • src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja
  • tests/test_map_step_runner.py

📝 Walkthrough

Walkthrough

record_test_baseline default timeout increases from 120s to 600s. On timeout, status is now timed_out instead of baseline_failures, and the saved payload gains baseline_complete and timed_out fields. list_baseline_failures reads those fields and emits a warning when the baseline timed out. Changes are mirrored across the jinja source, rendered template, live script, and covered by two new regression tests.

Baseline Timeout Handling

Layer / File(s) Summary
Jinja source: timeout logic in record/list functions
src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja
Default timeout_seconds raised to 600; record_test_baseline now sets status="timed_out" on TimeoutExpired and writes baseline_complete/timed_out into the payload; list_baseline_failures reads those fields and adds a warning when timed_out is true; CLI default also raised to 600.
Rendered template mirror
src/mapify_cli/templates/map/scripts/map_step_runner.py
Identical logic changes applied to the rendered copy.
Live .map script mirror
.map/scripts/map_step_runner.py
Identical logic changes applied to the live deployed copy.
Regression tests
tests/test_map_step_runner.py
Two tests added: timeout path asserts status="timed_out", baseline_complete=False, timed_out=True, empty baseline_failures, and warning from list_baseline_failures; normal path asserts baseline_complete=True, timed_out=False, no warning.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 The clock used to tick at two minutes flat,
Now ten whole minutes — imagine that!
timed_out is noted, baseline_complete set,
No silent clean slate from a suite that's not met.
Warnings hop loudly when things don't conclude ~

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/compassionate-cerf-6l674i

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

record_test_baseline fail-open on >120s suite: timeout yields empty baseline indistinguishable from clean

2 participants