fix(test-infra): bound test invocations with a fail-fast watchdog + CI timeouts by gsxdsm · Pull Request #1669 · Runfusion/Fusion

gsxdsm · 2026-06-13T07:48:43Z

Summary

Reliability-focused work from docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md: make test timeout failures fail fast and diagnosably instead of hanging to the CI 6h ceiling, and lay the groundwork to root-cause the cross-package flake cluster.

This PR lands the guardrail layer (plan units U1, U2, and KTD-6). The higher-risk surgery (U3 WORKER_ROOT fix + flake rescue, U4 guard attribution, U5 serialized-project containment, U6 shard guardrail) is recorded as residual follow-up work below — it needs iterative full-suite verification that shouldn't be rushed into one autonomous pass.

What's in this PR (verified)

Shared watchdog (scripts/lib/run-vitest-watchdog.mjs, U1/U2): generalizes the dashboard heap-runner's detached-process-group SIGTERM→SIGKILL lifecycle. Per-class budget bands (shard/changed/dashboard-lane) are the safety net; timings only tighten within a generous ceiling, and only when the snapshot is fresh — a stale test-timings.json can never produce a too-tight (false-kill) budget. On timeout it emits inline hang diagnostics (elapsed, last heartbeat, wrapper handle summary).
Both live runners converted from blocking spawnSync to async watchdog spawn (U1): scripts/ci-test-shard.mjs (per-command budget from aggregated package weights) and scripts/test-changed.mjs (full/affected runs get a generous 60min backstop; the quick gate uses the changed ceiling). In test-changed.mjs the watched runner throws on failure/timeout with .exitCode so the existing finally cleanup (isolated-HOME prune + isolation post-check) still reaps any leaked HOME on a watchdog kill.
Dashboard runner delegates to the shared helper (run-vitest-with-heap.mjs), behavior preserved (6144MiB heap, 15min default, 5s grace, signal forwarding, exit 124).
CI timeout-minutes backstops on all full-suite.yml test jobs (KTD-6): test-shards/test-slow = 60, test-inventory-guard = 20 — strictly above the L2 watchdog ceiling so L2 fires first, and far below the silent 6h default.

Verification

scripts/__tests__/run-vitest-watchdog.test.mjs — 11/11 (budget derivation, timeout→SIGTERM→SIGKILL→124, signal forwarding, listener cleanup).
Dashboard runner integration test — 3/3 (group reaping on SIGTERM/SIGINT, timeout→124).
ci-test-shard — 37/38 (the 1 failure, --dry-run dashboard-lane expansion, is pre-existing on main).
test-changed — 84/84.
End-to-end: pnpm test ran the gate suite (664 tests) through the new async watchdog path and passed, with the [watchdog] still running heartbeat confirming the wrapper is active.

Residual / follow-up work (remaining plan units)

U3 — root-cause the cross-package WORKER_ROOT cluster (highest value). The smoke run deterministically reproduced the leak: a gate run left 6 leaked /tmp/fusion-test-workers-* dirs, tripping the isolation guard. This is the pre-existing bug (documented in scripts/lib/test-quarantine.json on main, dated 2026-06-12, across 5 core + several engine entries), not a regression from this PR — the mechanism lives entirely inside the vitest process tree (packages/core/src/__test-utils__/vitest-teardown.ts globalSetup + per-fork WORKER_ROOT in vitest-setup.ts), unaffected by whether the parent pnpm is spawned detached. Worker forks that don't inherit FUSION_TEST_WORKER_ROOT mint their own roots, and busy-dir rmSync failures on teardown leak them. Fix: make the worker-root lifetime reliably per-invocation-shared and robustly cleaned, then rescue/delete the quarantined cluster per the ratchet.
U4 — subprocess-guard attribution logging in vitest-setup.ts (structured record naming the owning test/argv/elapsed on a kill; budget scaling stays deferred as a data-gated follow-up per the plan's anti-appeasement constraint).
U5 — serialized single-worker project containment (engine-reliability/engine-slow); decision-gated, may be zero-code if the U1 watchdog already contains the wedge.
U6 — shard-vs-ceiling guardrail extending the existing DEFAULT_BALANCE_TOLERANCE/TIMINGS_STALENESS_DAYS/--check-timings-staleness machinery in ci-test-shard.mjs.

Notes

No changeset: internal test/CI tooling, no shipped-package runtime behavior change.
The branch was cut from a local main that was 2 commits ahead of origin/main (FN-6344, FN-6339); those will reconcile when local main lands.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved CI and local test reliability by adding safeguards against long-running or hung test runs.
- Added clearer timeout handling and diagnostics so failed runs are easier to understand.
New Features
- Introduced a shared watchdog for test execution with bounded timeouts and graceful shutdown behavior.
Documentation
- Added a new reliability plan outlining the test-timeout prevention approach and rollout steps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

U1/U2/KTD-6: extract the dashboard heap-runner's process-group kill lifecycle into a shared scripts/lib/run-vitest-watchdog.mjs with per-class budget bands (timings only tighten within a generous ceiling) and an inline hang-diagnostics snapshot. Delegate run-vitest-with-heap.mjs to it (behavior preserved). Add timeout-minutes backstops to all full-suite.yml test jobs so a wedged run can no longer hang to GitHub's 6h ceiling. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Convert ci-test-shard.mjs's vitest invocation path from blocking spawnSync to async spawn through scripts/lib/run-vitest-watchdog.mjs. Each shard command gets a per-class budget (shard ceiling 30min) tightened by aggregated timings only when the snapshot is fresh; quick non-test commands stay synchronous. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…g (U1) Convert test-changed.mjs's invocation path from blocking spawnSync to async spawn through the shared watchdog. runWatchedTest throws on failure/timeout with .exitCode so the existing catch and the runMaybeIsolated finally (prune + isolation post-check) still run — a watchdog kill reaps any leaked isolated HOME instead of leaving it for the guard to flag. Full/affected runs use a generous 60min backstop (dashboard lanes are already inner-watchdog'd); the quick gate uses the changed-class ceiling. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-13T07:48:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 91e53952-eeb1-4f4a-8867-ae9e2aaebc93

📥 Commits

Reviewing files that changed from the base of the PR and between a49450e and e51d05b.

📒 Files selected for processing (2)

scripts/ci-test-shard.mjs
scripts/test-changed.mjs

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/ci-test-shard.mjs

📝 Walkthrough

Walkthrough

Adds a shared Vitest watchdog, switches dashboard and CI/local test runners to async watchdog execution with derived budgets, adds CI job timeout backstops, updates watchdog-focused tests, and introduces a detailed timeout-reliability plan document.

Changes

Test hang prevention and runner consolidation

Layer / File(s)	Summary
Shared watchdog library and tests `scripts/lib/run-vitest-watchdog.mjs`, `scripts/__tests__/run-vitest-watchdog.test.mjs`	Adds shared timeout-budget, hang-diagnostics, and process-group supervision helpers, with tests for budget calculation, timeout escalation, signal forwarding, cleanup, and spawn wiring.
Dashboard vitest heap wrapper migration `packages/dashboard/scripts/run-vitest-with-heap.mjs`, `packages/dashboard/scripts/__tests__/run-vitest-with-heap.test.ts`	Replaces wrapper-local process supervision with `runWithWatchdog`, parses timeout/grace env values, and updates stderr assertions for the new watchdog log and timeout output.
Shard runner async watchdog execution `scripts/ci-test-shard.mjs`	Moves shard command execution to async watchdog runs, adds `weightMs` to generated commands, derives per-command budgets from fresh timings, and updates CLI error handling.
Changed and gate runner watchdog wrapping `scripts/test-changed.mjs`	Routes full, gate, and changed-mode test commands through the shared watchdog while preserving isolation checks and cleanup, and makes the entrypoint async.
Workflow job timeout backstops `.github/workflows/full-suite.yml`	Adds explicit job-level `timeout-minutes` limits for `test-shards`, `test-inventory-guard`, and `test-slow`.
Reliability planning document `docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md`	Documents the timeout and hang failure modes, layered watchdog strategy, implementation units U1–U6, sequencing, risks, and research references.

Sequence Diagram(s)

sequenceDiagram
  participant CI as ci-test-shard.mjs
  participant Watchdog as runWithWatchdog
  participant Vitest as vitest child group
  CI->>Watchdog: run command with label and budget
  Watchdog->>Vitest: spawn detached process group
  Watchdog->>Watchdog: track heartbeat and timeout budget
  alt command finishes in time
    Vitest-->>Watchdog: close(code=0 or nonzero)
    Watchdog-->>CI: result with code/signal
  else budget exceeded
    Watchdog->>Vitest: SIGTERM
    Watchdog->>Watchdog: log hang diagnostics
    Watchdog->>Vitest: SIGKILL after grace
    Vitest-->>Watchdog: close(timeout exit)
    Watchdog-->>CI: timedOut result
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Runfusion/Fusion#1453: Also changes scripts/test-changed.mjs execution flow in CI-facing test routing.
Runfusion/Fusion#51: Also touches scripts/ci-test-shard.mjs and scripts/test-changed.mjs in the test-running pipeline.

Poem

🐇 I thumped a clock beside the tests,
So wandering runs now face their rests.
With watchdog hops and heartbeats bright,
The shards and gates keep better time tonight.
Carrots for the logs that tell the tale!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly and specifically describes the main change: adding a fail-fast watchdog with timeouts to bound test invocations and prevent hanging to CI ceiling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/test-timeout-failures

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

greptile-apps · 2026-06-13T07:56:34Z

Greptile Summary

This PR introduces a shared watchdog runner (scripts/lib/run-vitest-watchdog.mjs) that bounds every test invocation with a SIGTERM→SIGKILL lifecycle, converts ci-test-shard.mjs and test-changed.mjs from blocking spawnSync to async watchdog-wrapped spawns, delegates run-vitest-with-heap.mjs to the shared helper, and adds explicit timeout-minutes to all full-suite.yml test jobs.

New watchdog helper (run-vitest-watchdog.mjs): detached process-group spawn, per-class budget bands (shard 5–30 min, changed 2–20 min, dashboard-lane 15–30 min), optional timings-tightening within bands, inline hang diagnostics on timeout (elapsed, heartbeat delta, wrapper handle summary), signal forwarding, and per-invocation handler cleanup so loops don't accumulate listeners.
Both live runners converted async: ci-test-shard.mjs derives budgets from aggregated package weights + timings freshness check; test-changed.mjs uses a 60 min backstop for full/affected runs and the changed-class ceiling for gate-only runs; isolated-HOME finally cleanup is preserved on watchdog kill.
CI backstops: test-shards and test-slow now have timeout-minutes: 60; test-inventory-guard has timeout-minutes: 20 — all above the 30 min L2 watchdog ceiling so L2 fires first.

Confidence Score: 5/5

Safe to merge — the watchdog adds a new guardrail layer without changing any test assertions or broadening timeouts, and all three converted runners fall back to the conservative per-class ceiling when timings are absent or stale.

The core watchdog logic is well-isolated behind injected dependencies (spawn, killGroup, now) with 11 unit tests covering the full lifecycle. The spawnSync→async conversions preserve every existing exit/signal/cleanup contract, and the CI timeout values are sized safely above the L2 watchdog ceiling. The only committed noise is a development-artifact comment in test-changed.mjs that has no runtime effect.

scripts/test-changed.mjs — contains an accidentally committed FNXC:TestInfrastructure annotation that should be removed before merging.

Important Files Changed

Filename	Overview
scripts/lib/run-vitest-watchdog.mjs	New shared watchdog implementation — clean design with injected spawn/killGroup for testability, correct settled flag, proper handler cleanup, and inline hang diagnostics.
scripts/tests/run-vitest-watchdog.test.mjs	11 tests covering budget derivation, timeout SIGTERM→SIGKILL→124 sequence, signal forwarding, clean exit propagation, listener cleanup (SIGTERM and exit), cwd pass-through, and child error rejection.
scripts/ci-test-shard.mjs	spawnSync→async watchdog conversion for test invocations; correctly threads timings freshness from buildScheduleUnits, aggregates per-command weightMs, and preserves fail-fast exit behavior.
scripts/test-changed.mjs	spawnSync→async watchdog conversion with isolated-HOME finally preserved; contains a committed development artifact comment (FNXC:TestInfrastructure) and a void spawnOptions suppression that silently drops unrecognized options.
packages/dashboard/scripts/run-vitest-with-heap.mjs	Delegates cleanly to runWithWatchdog while preserving heap flag, 15 min default, 5 s grace, signal forwarding, and exit-124 behavior; adds positiveIntEnv guard against NaN-disabling the killer.
packages/dashboard/scripts/tests/run-vitest-with-heap.test.ts	Timeout bumped to 2000ms/200ms to allow child startup; log prefix assertions updated to [watchdog]; all 3 integration cases still covered.
.github/workflows/full-suite.yml	Adds timeout-minutes: 60 to test-shards and test-slow, and 20 to test-inventory-guard — all above the 30 min L2 watchdog ceiling, correct backstop ordering.

Sequence Diagram

sequenceDiagram
    participant Runner as ci-test-shard / test-changed
    participant WD as run-vitest-watchdog
    participant Child as vitest process group
    participant CI as GitHub Actions (L3)

    Runner->>WD: runWithWatchdog(command, budgetMs)
    WD->>Child: spawn detached (own pgid)
    WD-->>WD: arm watchdog timer (budget)
    WD-->>WD: install SIGINT/SIGTERM/SIGHUP forwarders

    alt child exits within budget
        Child-->>WD: close(code, signal)
        WD-->>Runner: "resolve { code, signal, timedOut:false }"
        Runner->>Runner: exit(code)
    else watchdog fires (budget exceeded)
        WD-->>WD: captureHangDiagnostics → log
        WD->>Child: SIGTERM (process group)
        WD-->>WD: arm grace timer
        WD->>Child: SIGKILL after grace
        Child-->>WD: close(null, SIGKILL)
        WD-->>Runner: "resolve { code:124, timedOut:true }"
        Runner->>Runner: exit(124)
    else external SIGTERM/SIGINT reaches wrapper
        Runner-->>WD: signal forwarded
        WD->>Child: forward signal to group
        WD-->>WD: arm grace timer → SIGKILL
        Child-->>WD: close(null, signal)
        WD-->>Runner: "resolve { signal }"
    else L2 watchdog itself wedges
        CI-->>Runner: timeout-minutes fires
        CI->>Runner: job cancelled
    end

_{Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

packages/dashboard/scripts/__tests__/run-vitest-with-heap.test.ts (1)
150-154: ⚡ Quick win

Keep this timeout-path spec off a 2s real-clock wait.

Bumping the wrapper timeout to 2000ms makes this case sit on wall-clock time before it can pass. The shared watchdog already covers timeout semantics directly, so this wrapper spec should stay at a narrower seam — e.g. a stub that becomes observable immediately with a sub-second budget, or a helper-level timeout assertion instead of a 2s end-to-end wait.

As per coding guidelines, **/*.test.{ts,tsx,js}: "Do not add slow tests. Prefer narrow seams, in-memory fakes, shared harnesses, and targeted assertions. Prefer fake timers over real polling/time waits."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dashboard/scripts/__tests__/run-vitest-with-heap.test.ts` around
lines 150 - 154, The test should not rely on a 2s real-clock wait; replace the
long end-to-end timeout with a narrow seam: either (A) reduce
FUSION_RUN_VITEST_TIMEOUT_MS to a sub-second value (e.g., 200–500) and keep
FUSION_RUN_VITEST_KILL_GRACE_MS proportionally small so the stub process becomes
observable immediately, or (B) avoid real waiting by using fake timers or a test
helper to simulate/advance the watchdog timeout and assert the timeout->exit
behavior; update the environment variables FUSION_RUN_VITEST_TIMEOUT_MS and
FUSION_RUN_VITEST_KILL_GRACE_MS (or convert the test to use fake timers) in
run-vitest-with-heap.test.ts accordingly.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/dashboard-guide.md`:
- Line 731: The docs string that lists steerable in-review statuses currently
reads "(reviewing/merging/fixing)"; update that phrase to include the missing
status `merging-fix` (for example "(reviewing/merging/merging-fix/fixing)") so
the documentation matches the implementation. Locate the exact sentence
containing the parenthetical and insert `merging-fix` into the list.

In `@docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md`:
- Around line 115-116: Update the plan text to match the implemented contract in
deriveBudgetMs: replace references to a “median fallback” with a per-class
ceiling fallback and explicitly state that deriveBudgetMs defaults to the
per-class ceiling when timings are missing and only tightens within the
per-class floor/ceiling band when fresh timings exist; also call out that
timelines must be refreshed via scripts/ci-test-shard.mjs --write-timings before
U1 so budgets are derived from fresh data.
- Around line 131-151: The fenced code block with the project tree is missing a
language/tag for the Markdown fence (tripled backticks); fix by changing the
opening ``` to a labelled fence such as ```text (or another appropriate
language) so markdownlint MD040 is satisfied — update the unlabeled
triple-backtick fence that precedes the block starting with "scripts/" to use
```text.
- Around line 21-24: Summary: the paragraph reads as if watchdogs/backstops are
not yet present but those changes are already landing; either mark it explicitly
as historical context or update to the landed state. Fix: edit the paragraph to
(a) if keeping as background, prefix or reword it to clearly state it's
historical context (use past tense, e.g. "Historically, only the dashboard
runner had a watchdog; since then we added a shared watchdog and job-level
backstops elsewhere"), or (b) if making it current, update wording to
present/perfect tense to reflect that the shared watchdog and job-level
backstops are now in place and remove the implication they're missing; keep the
technical references (shared watchdog, job-level backstops, shared WORKER_ROOT
temp-isolation mechanism) so readers can correlate to the rest of the plan.
Ensure the final paragraph unambiguously states whether the described behavior
is corrected (landed) or retained as background.

In `@packages/dashboard/scripts/run-vitest-with-heap.mjs`:
- Around line 21-22: The parsed env values for timeoutMs and graceMs can become
NaN and silently disable the watchdog; update the parsing logic so that after
Number.parseInt you validate and clamp the values before they are used by
runWithWatchdog(): for timeoutMs, if the parsed value is NaN or <= 0, set it to
the default 900000 (and ensure it is finite); for graceMs, if the parsed value
is NaN or < 0, set it to the default 5000 (and ensure it is finite); then pass
these validated/clamped timeoutMs and graceMs into runWithWatchdog() so budgetMs
is always a positive finite number.

In `@scripts/lib/run-vitest-watchdog.mjs`:
- Around line 222-230: The custom signal handlers installed in forwardedSignals
only forward the signal to the child group (signalGroup) and prevent Node's
default termination behavior, which can hang runWithWatchdog(); update the
handler (the function created in the loop that is stored in signalHandlers) so
that after forwarding to signalGroup it also restores default behavior and
re-emits the original signal to the process (remove each handler from process
via process.removeListener or process.off using signalHandlers, then re-send the
same signal with process.kill(process.pid, sig) so Node's default exit semantics
apply); if you need to preserve the existing grace period logic, re-emit the
signal after the same watchdog grace timeout used for child termination (use
setTimeout with that timeout) so external cancellation follows the same grace
ladder.

In `@scripts/test-changed.mjs`:
- Around line 211-233: The new async path drops the child's cwd so tests run
from process.cwd() instead of the intended rootDir; fix by preserving and
forwarding cwd through the call chain: ensure runMaybeIsolated and
runWatchedTest (and the internal run() call they use) accept and pass through a
cwd option (defaulting to the existing rootDir when present) instead of
discarding spawnOptions, and update the shared helper (run-vitest-watchdog) to
accept a cwd parameter and pass it into its internal spawn(...) invocation so
spawned processes run from rootDir when provided.

---

Nitpick comments:
In `@packages/dashboard/scripts/__tests__/run-vitest-with-heap.test.ts`:
- Around line 150-154: The test should not rely on a 2s real-clock wait; replace
the long end-to-end timeout with a narrow seam: either (A) reduce
FUSION_RUN_VITEST_TIMEOUT_MS to a sub-second value (e.g., 200–500) and keep
FUSION_RUN_VITEST_KILL_GRACE_MS proportionally small so the stub process becomes
observable immediately, or (B) avoid real waiting by using fake timers or a test
helper to simulate/advance the watchdog timeout and assert the timeout->exit
behavior; update the environment variables FUSION_RUN_VITEST_TIMEOUT_MS and
FUSION_RUN_VITEST_KILL_GRACE_MS (or convert the test to use fake timers) in
run-vitest-with-heap.test.ts accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c27a3d7b-d3a6-4a2f-87df-6359c6fa5f22

📥 Commits

Reviewing files that changed from the base of the PR and between 0765913 and 03a5d1e.

📒 Files selected for processing (12)

.github/workflows/full-suite.yml
docs/dashboard-guide.md
docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md
packages/dashboard/app/components/TaskChatTab.css
packages/dashboard/app/components/TaskChatTab.tsx
packages/dashboard/app/components/__tests__/TaskChatTab.test.tsx
packages/dashboard/scripts/__tests__/run-vitest-with-heap.test.ts
packages/dashboard/scripts/run-vitest-with-heap.mjs
scripts/__tests__/run-vitest-watchdog.test.mjs
scripts/ci-test-shard.mjs
scripts/lib/run-vitest-watchdog.mjs
scripts/test-changed.mjs

- watchdog: escalate forwarded SIGINT/SIGTERM/SIGHUP to SIGKILL after grace so external cancellation can't hang for the full budget (coderabbit major) - watchdog: route onProcExit through signalGroup for injection consistency (greptile) - watchdog: add cwd option; test-changed passes rootDir so pnpm runs from repo root regardless of invocation cwd (coderabbit major — preserved original run() cwd) - dashboard runner: validate/clamp FUSION_RUN_VITEST_* env so a malformed value can't NaN-disable the watchdog (coderabbit) - tests: verify exit-listener cleanup, forwarded-signal escalation, cwd passthrough - plan doc: per-class-ceiling fallback wording (not median); label Output Structure fence Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

packages/dashboard/scripts/run-vitest-with-heap.mjs (1)
37-58: 💤 Low value

Consider wrapping JSON.parse for clearer error messaging.

If FUSION_RUN_VITEST_SPAWN_OVERRIDE contains malformed JSON, JSON.parse() throws a generic SyntaxError that doesn't mention the env var name. Since this is a test seam and the outer .catch() will still log and exit 1, this is minor—but a wrapped parse would make debugging easier when the override is misconfigured.
🔧 Optional improvement
   const override = process.env.FUSION_RUN_VITEST_SPAWN_OVERRIDE;
   if (!override) {
     return { command: "pnpm", args: ["exec", "vitest", ...vitestArgs] };
   }
 
-  const parsedOverride = JSON.parse(override);
+  let parsedOverride;
+  try {
+    parsedOverride = JSON.parse(override);
+  } catch (err) {
+    throw new Error(`FUSION_RUN_VITEST_SPAWN_OVERRIDE is not valid JSON: ${err.message}`);
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dashboard/scripts/run-vitest-with-heap.mjs` around lines 37 - 58,
Wrap the JSON.parse call in resolveSpawnCommand so malformed
FUSION_RUN_VITEST_SPAWN_OVERRIDE values throw a clearer Error that names the env
var; keep the existing validation and return shape unchanged, and make the new
parse failure message more actionable than the raw SyntaxError.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/dashboard/scripts/run-vitest-with-heap.mjs`:
- Around line 37-58: Wrap the JSON.parse call in resolveSpawnCommand so
malformed FUSION_RUN_VITEST_SPAWN_OVERRIDE values throw a clearer Error that
names the env var; keep the existing validation and return shape unchanged, and
make the new parse failure message more actionable than the raw SyntaxError.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 3c3d8f52-9f39-4785-a315-453787e27866

📥 Commits

Reviewing files that changed from the base of the PR and between 03a5d1e and a49450e.

📒 Files selected for processing (5)

docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md
packages/dashboard/scripts/run-vitest-with-heap.mjs
scripts/__tests__/run-vitest-watchdog.test.mjs
scripts/lib/run-vitest-watchdog.mjs
scripts/test-changed.mjs

✅ Files skipped from review due to trivial changes (1)

docs/plans/2026-06-13-001-fix-test-timeout-failures-plan.md

🚧 Files skipped from review as they are similar to previous changes (2)

scripts/lib/run-vitest-watchdog.mjs
scripts/test-changed.mjs

…ures # Conflicts: # scripts/test-changed.mjs

gsxdsm and others added 4 commits June 13, 2026 00:24

docs(plan): add test-timeout-failures reliability plan

c2ddf53

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(review): drop redundant /* global */ comments (eslint no-redeclare)

03a5d1e

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

greptile-apps Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread scripts/lib/run-vitest-watchdog.mjs

Comment thread scripts/__tests__/run-vitest-watchdog.test.mjs

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

gsxdsm added 2 commits June 13, 2026 15:44

Merge remote-tracking branch 'origin/main' into fix/test-timeout-fail…

e51d05b

…ures # Conflicts: # scripts/test-changed.mjs

Merge remote-tracking branch 'origin/main' into fix/test-timeout-fail…

2bffad1

…ures # Conflicts: # scripts/test-changed.mjs

gsxdsm merged commit 9314d1c into main Jun 15, 2026
6 checks passed

gsxdsm deleted the fix/test-timeout-failures branch June 15, 2026 19:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(test-infra): bound test invocations with a fail-fast watchdog + CI timeouts#1669

fix(test-infra): bound test invocations with a fail-fast watchdog + CI timeouts#1669
gsxdsm merged 8 commits into
mainfrom
fix/test-timeout-failures

gsxdsm commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gsxdsm commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in this PR (verified)

Verification

Residual / follow-up work (remaining plan units)

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

greptile-apps Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gsxdsm commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

greptile-apps Bot commented Jun 13, 2026 •

edited

Loading