Skip to content

feat: behavioral verification in the Validator#1666

Open
gsxdsm wants to merge 16 commits into
mainfrom
feat/validator-behavioral-verification
Open

feat: behavioral verification in the Validator#1666
gsxdsm wants to merge 16 commits into
mainfrom
feat/validator-behavioral-verification

Conversation

@gsxdsm

@gsxdsm gsxdsm commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Behavioral verification in the Validator

Makes the mission Validator's "done" verdict reflect observed behavior instead of the diff's apparent intent. Behavioral/bug Contract Assertions now default to fail unless a bounded, non-mutating verification run confirms them by exercising the code; non-behavioral assertions keep today's static read-only judging.

Origin: docs/brainstorms/2026-06-11-validator-behavioral-verification-requirements.md · Plan: docs/plans/2026-06-11-001-feat-validator-behavioral-verification-plan.md

What's live and verified (test-execution quality gate)

  • U1 — persisted type (behavioral|static) on Contract Assertions; schema migration 115 + repo-wide version sweep; conservative static default so existing rows preserve current behavior.
  • U2/U3 — adversarial default-to-fail posture for behavioral assertions; mission-verification.ts: disposable git checkout at a trusted revision, mandatory isolating sandbox backend (fail-closed) + scrubbed env, fixed command template + validated test-path, merge-base pre-fix baseline (rejects pass-on-both), git-clean post-condition, inconclusive as a first-class verdict. Judge session stays tools: readonly.
  • U6 — observed-vs-expected reason threaded into the Fix Feature; inconclusive routes to needs-attention with no Fix Feature; createGeneratedFixFeature idempotent on (sourceFeature, run); durable mission/audit events.
  • U7 — closes the reaper → slice-deadlock P0 (reaped slow run reaches a terminal verdict instead of stalling forever); Surface Enumeration + adversarial re-drive reliability suite; updates docs/missions.md, docs/missions-completion-contract.md, CONCEPTS.md; changeset.

What's built but NOT yet live-wired (app-driving — read before merging)

U4 (isolated-app-launch harness), U8 (playwright-core browser driver), U5 (dispatch + determinism contract) are implemented and unit-tested via injected seams, but app-driving is dormant in production: the real app-server spawner (U4) is left behind an injected seam for a heavier lane, and the per-assertion channel/UI-spec classifier is the plan's deferred Open Question. Today behavioral assertions resolve through the test-execution channel only; the app-driving channel is reachable in code/tests but nothing populates channel: "app" yet. Wiring the live spawner + classifier is follow-up work.

Verification performed

  • Per-package typecheck clean: @fusion/core, @fusion/engine, agent-browser plugin.
  • Targeted + mission/reliability suites green (mission-verification, behavioral-posture, redrive-surface, app-harness, app-driving, mission-execution-loop, mission-store, db, db-migrate, roadmap schema).
  • eslint clean on all changed source files.
  • playwright-core added (no bundled-browser download; CI-safe). One local headless-Chrome smoke of the U8 driver passed.

Not verified locally (CI will run these)

  • Full-monorepo eslint ., recursive build, Boot Smoke, and test:gate were not run locally — relying on CI.
  • Known pre-existing, unrelated flaky test: reliability-interactions/soft-delete-blocker-residue.test.ts (task soft-delete; flakes independent of this change).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Workflow setting values now synchronize across nodes during settings push, pull, receive, and status flows.
    • Behavioral verification for assertions with deterministic retry logic and inconclusive verdict handling.
    • Browser-based UI assertion verification for testing observable application behavior.
    • Fix feature generation now includes detailed failure reasons from verification runs.
  • Bug Fixes

    • Resolved validator "reaper→slice" deadlock by re-driving task-less features to terminal verdicts.
    • Validation now prevents livelocking on reaped runs lacking owners.
  • Documentation

    • Clarified mission completion contract for behavioral vs. static assertions.
    • Documented verification semantics and inconclusive verdict handling.

gsxdsm added 13 commits June 10, 2026 21:48
Prevent the settings save split from serializing custom provider definitions through the global or project patch paths.

- skip `customProviders` when building global settings patches
- skip `customProviders` when building project settings patches
- add a regression test that keeps custom providers out of both serialized patches

Files changed:
 .../app/__tests__/settings-save-split.test.ts      | 24 ++++++++++++++++++++++
 .../app/components/settings/save-split.ts          |  4 ++++
 2 files changed, 28 insertions(+)

Fusion-Task-Id: FN-6205

Fusion-Task-Lineage: 84c3f42b-c621-44ff-b977-ecd3a6729e04
Move undocumented flaky engine tests into the formal quarantine path.

- add quarantine ledger entries for flaky engine test files observed under concurrent/full-suite load
- exclude the quarantined engine test files from vitest projects so they no longer run in gate and reliability pools
- remove the undocumented it.skip markers by quarantining at the file/config level instead

Files changed:
 .../__tests__/merger-file-scope-invariant.test.ts  |  4 ++--
 .../src/__tests__/project-engine-manager.test.ts   |  2 +-
 packages/engine/vitest.config.ts                   | 10 ++++++++--
 scripts/lib/test-quarantine.json                   | 23 +++++++++++++++++++++-
 4 files changed, 33 insertions(+), 6 deletions(-)

Fusion-Task-Id: FN-6206

Fusion-Task-Lineage: 8d88725e-1e92-4a06-aaa6-be34287e613a
Synchronize workflow setting values through node settings sync.

- Include workflow settings in settings payloads, diffs, status responses, and API types.
- Apply inbound and pulled workflow settings through the TaskStore while ignoring rejected setting ids.
- Add core, route, hook, API, and Nodes view coverage for workflow settings sync behavior.
- Add a patch changeset for the published Fusion package.

Files changed:
 .changeset/loud-nodes-sync.md                      |   5 +
 packages/core/src/__tests__/central-core.test.ts   |  37 ++++
 packages/core/src/central-core.ts                  |  13 +-
 packages/core/src/types.ts                         |   4 +
 packages/dashboard/app/__tests__/api-node.test.ts  |   4 +-
 packages/dashboard/app/api-node.ts                 |   3 +
 .../app/components/__tests__/NodesView.test.tsx    |   8 +-
 .../hooks/__tests__/useNodeSettingsSync.test.ts    |  33 +++-
 .../dashboard/app/hooks/useNodeSettingsSync.ts     |   4 +-
 .../__tests__/routes-nodes-sync-contract.test.ts   |  18 +-
 .../src/__tests__/routes-nodes-sync.test.ts        | 191 ++++++++++++++++++++-
 .../src/routes/register-settings-memory-routes.ts  |   7 +-
 .../register-settings-sync-inbound-routes.ts       |  73 +++++++-
 .../src/routes/register-settings-sync-routes.ts    | 128 ++++++++++++--
 14 files changed, 488 insertions(+), 40 deletions(-)

Fusion-Task-Id: FN-6208

Fusion-Task-Lineage: 9b5e7e56-6287-4bc5-966a-2bcc2ba292a7
Adds a persisted `type` field (behavioral|static) to MissionContractAssertion,
the schema column + migration 115 + version-sweep, and store population with a
conservative static default so existing rows preserve legacy judging. Foundation
for scoping the validator's default-to-fail / verification posture.
… (U2,U3)

Behavioral/bug assertions now default to fail unless a bounded verification run
confirms them; static assertions keep the existing read-only judging path. Adds
mission-verification.ts (injected capability): disposable git checkout at a
trusted revision, explicit isolating sandbox backend with fail-closed + scrubbed
env, fixed command template + validated test path, merge-base pre-fix baseline
(rejects pass-on-both), git-clean post-condition, inconclusive as a first-class
verdict. Judge session stays tools:readonly.
The migration-115 bump in U1 left many test files asserting the old version 114;
update them (db-migrate.test.ts now asserts the exported SCHEMA_VERSION constant
to drop it from future sweeps).
… (U6)

Threads observed-vs-expected failure detail into the generated Fix Feature (R6);
routes inconclusive verdicts to needs-attention with no Fix Feature and a
distinguishable infra event (R21); makes createGeneratedFixFeature idempotent on
(sourceFeatureId, runId) / skip-when-open so recovery/reaper re-drives don't
duplicate features or exhaust the retry budget (R22); persists mission/audit
events for failures, inconclusive, and previously-swallowed triage errors (R16).
…ock (U7)

Widens the stranded-done recovery branch so a reaped task-less validating/done
feature is re-driven to a terminal verdict instead of stalling its slice forever
(closes the reaper->slice-deadlock P0). Adds a Surface Enumeration + adversarial
re-drive reliability suite (git-clean, no duplicate Fix Features, terminal
verdict, no deadlock) gating release. Updates missions docs + CONCEPTS for the
default-to-fail posture, non-mutating verification, and inconclusive verdict.
Adds changeset.
launchIsolatedApp() binds a non-reserved port (reuses the 4040/FUSION_RESERVED_PORTS
guards), serves a freshly-built bundle, runs against a fresh/empty disposable DB
under a run-unique tmpdir with a scrubbed env (no inherited credentials), and tears
down unconditionally on crash/timeout/abort. Process-spawn and bundle-build are
injected seams; 18 tests characterize the R13 isolation contracts. Real end-to-end
launch wires the default spawner behind the seam in a heavier lane.
Replaces the metadata/probe stub with a playwright-core driver (no bundled
browser download; uses a probe-discovered Chrome) exposing navigate/click/type/
observe + idempotent dispose. Browser-unavailable and un-exercisable assertions
return inconclusive (never a false pass/fail); a missing selector returns a
distinct 'absent' negative observation. Driver is a direct export scoped to the
verification run, not registered as a coding-agent tool.
…sm (U5)

Adds the app-driving channel (U4 isolated launch + U8 driver, via an injected
seam to keep the engine decoupled from the plugin), a dispatching capability that
routes by assertion shape (test/app/both), the R21 three-outcome mapping
(absent-vs-found by expectation; all infra failures -> inconclusive), and the R20
determinism contract (N-run agreement before an authoritative fail; flaky ->
inconclusive). Channel runs only against the isolated instance (R11).
- Drop the unnecessary regex escape in the shell-metacharacter class.
- Move the R17 git-clean post-condition out of finally (no-unsafe-finally): a
  dirty tree now fails closed to an inconclusive verdict instead of a
  finally-throw that masked the verdict. Test updated to the new behavior.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@gsxdsm, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 29 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 370d300b-75eb-49e8-ae8e-b9bed2465af6

📥 Commits

Reviewing files that changed from the base of the PR and between 4797eac and c7f7a0c.

📒 Files selected for processing (2)
  • packages/engine/src/mission-execution-loop.ts
  • packages/engine/src/mission-verification-app-harness.ts
📝 Walkthrough

Walkthrough

This PR introduces a behavioral verification pipeline for mission contract assertions: adds a type field (static/behavioral) persisted via schema migration 118, wires four verification capability classes (test execution, app driving, determinism, dispatching) into the execution loop, implements an isolated app harness and Playwright-backed browser driver, extends node settings sync to carry per-workflow settings diffs, and adds extensive reliability/integration test suites.

Changes

Behavioral verification and workflow sync

Layer / File(s) Summary
Behavioral verification docs and release notes
.changeset/loud-nodes-sync.md, .changeset/proud-validators-verify.md, CONCEPTS.md, docs/missions.md, docs/missions-completion-contract.md, docs/brainstorms/..., docs/plans/...
Documents assertion typing (static/behavioral), advisory-only judge verdict, bounded non-mutating verification, inconclusive outcome, reaper redrive semantics, and reaper→slice deadlock closure. Adds changeset notes for both workflow sync and validator behavioral posture.
Assertion type schema, store, and fix-feature persistence
packages/core/src/mission-types.ts, packages/core/src/db.ts, packages/core/src/mission-store.ts, packages/core/src/index.ts, packages/core/src/__tests__/mission-store.test.ts, packages/core/src/__tests__/db*.test.ts, packages/core/src/__tests__/...
Introduces MissionAssertionType, normalization helper, and MissionContractAssertion.type; bumps SCHEMA_VERSION to 118 with an idempotent migration; normalizes type on all store read/write paths; extends createGeneratedFixFeature with failure-reason enrichment and idempotency via findGeneratedFixFeature/findOpenGeneratedFixFeature; updates all schema-version test expectations.
Verification contracts, sandbox, and harness
packages/engine/src/mission-verification.ts, packages/engine/src/mission-verification-app-harness.ts, packages/engine/src/verification-utils.ts, plugins/fusion-plugin-agent-browser/src/driver.ts, plugins/fusion-plugin-agent-browser/src/probe.ts, plugins/fusion-plugin-agent-browser/src/index.ts, plugins/fusion-plugin-agent-browser/package.json, packages/engine/src/__tests__/mission-verification*.test.ts, plugins/fusion-plugin-agent-browser/src/__tests__/*
Adds VerificationCapability interface and four capability classes; implements isolated app harness with reserved-port avoidance and disposable workspace/DB; adds infra-outcome flags and injectable backend to verification-utils; adds Playwright-core browser probe and driver with inconclusive-not-fail guarantees; adds full test suites for capabilities, harness, driver, and probe.
Mission execution loop behavioral posture
packages/engine/src/mission-execution-loop.ts, packages/engine/src/__tests__/mission-execution-loop.test.ts
Adds inconclusive to ValidationResult.status, verificationCapability option, applyBehavioralPosture for per-assertion routing, assertion backfill for partial judge responses, stranded-done recovery redrive, durable mission events for fail/inconclusive/triage outcomes, and handleValidationInconclusive; updates unit tests for new type field and failure-reason argument.
Reliability suites: behavioral posture and redrive surfaces
packages/engine/src/__tests__/reliability-interactions/mission-validator-behavioral-posture.test.ts, packages/engine/src/__tests__/reliability-interactions/mission-verification-redrive-surface.test.ts
Adds reliability interaction suites asserting static/behavioral posture outcomes, inconclusive handling, fix-feature dedup, durable event emission, and five end-to-end reaper/recovery re-drive paths reaching terminal verdicts without error-state deadlock or board residue.
Workflow settings sync contracts and route wiring
packages/dashboard/src/routes/register-settings-sync-routes.ts, packages/dashboard/app/components/settings/save-split.ts, packages/dashboard/app/hooks/useNodeSettingsSync.ts, packages/dashboard/src/__tests__/routes-nodes-sync*.test.ts
Adds SettingsDiff.workflowSettings; extends computeSettingsDiff, push, pull, and sync-receive routes to carry per-workflow diffs and apply patches; excludes customProviders from save-split; counts workflow diffs in sync state; updates contract and route tests.
Test lane quarantine and unskip updates
packages/engine/vitest.config.ts, packages/engine/src/__tests__/merger-file-scope-invariant.test.ts, packages/engine/src/__tests__/project-engine-manager.test.ts, scripts/lib/test-quarantine.json
Enables two previously skipped changeset-scope tests, updates vitest project include/exclude tiers, and populates the test-quarantine ledger with four flaky-test entries.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(70, 130, 180, 0.5)
    note over MissionExecutionLoop,VerificationCapability: Behavioral posture application
  end
  participant MissionExecutionLoop
  participant applyBehavioralPosture
  participant VerificationCapability
  participant MissionStore

  MissionExecutionLoop->>applyBehavioralPosture: judgeResult + linked assertions
  applyBehavioralPosture->>applyBehavioralPosture: static assertions keep judge verdict
  applyBehavioralPosture->>VerificationCapability: verifyBehavioralAssertion(request) per behavioral assertion
  VerificationCapability-->>applyBehavioralPosture: pass | fail | inconclusive
  applyBehavioralPosture-->>MissionExecutionLoop: ValidationResult (pass | fail | inconclusive)
  alt pass
    MissionExecutionLoop->>MissionStore: completeValidatorRun("passed")
  else fail
    MissionExecutionLoop->>MissionStore: completeValidatorRun("failed") + createGeneratedFixFeature(failureReason)
    MissionStore->>MissionStore: findGeneratedFixFeature / findOpenGeneratedFixFeature (dedup)
  else inconclusive
    MissionExecutionLoop->>MissionStore: completeValidatorRun("blocked")
    MissionExecutionLoop->>MissionStore: emitMissionEvent("verification_inconclusive")
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Runfusion/Fusion#1453: Adds new quarantine entries to scripts/lib/test-quarantine.json for flaky packages/engine tests, directly extended by this PR's four additional quarantine records.
  • Runfusion/Fusion#1345: Fixes missions stalled by stranded done features via recovery-loop refactoring in mission-execution-loop.ts, which this PR extends with the task-less done feature redrive and reaper→slice deadlock closure.
  • Runfusion/Fusion#1445: Modifies register-settings-sync-routes.ts and computeSettingsDiff to filter moved/tombstoned settings keys, directly overlapping with this PR's workflow settings diff extension on the same function.

Poem

🐇 Hop, hop — the judge says "maybe,"
But proof now seals the deal!
With sandboxes and disposable dens,
No phantom pass shall squeal.
Workflow settings ride the sync wave,
And deadlocks find no meal. ✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/validator-behavioral-verification

@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR wires behavioral verification into the mission Validator: contract assertions typed behavioral now default to fail and require a non-mutating verification run (test-execution channel via disposable git worktree + isolating sandbox, or app-driving channel via browser driver) to earn a pass. It also closes the reaper→slice deadlock (stranded task-less done features are re-driven to a terminal verdict in recoverActiveMissions), makes createGeneratedFixFeature idempotent across re-drives, and threads observed-vs-expected failure reasons into Fix Feature descriptions.

  • U2/U3applyBehavioralPosture in the execution loop overrides the read-only judge's verdict for behavioral assertions; TestExecutionVerificationCapability enforces fail-closed isolation (R18), path validation (R19), disposable checkouts (R11/R17), and merge-base proof anti-spoofing (R5/AE5). App-driving channel is implemented but dormant in production pending classifier wiring.
  • U6/U7createGeneratedFixFeature is now idempotent with two-level dedup; inconclusive routes to handleValidationInconclusive with no Fix Feature; stranded task-less done features are re-driven in the recovery sweep to unblock slice completion.
  • Dashboard — settings-pull path now persists global settings and invalidates caches; workflow setting keys in syncedFields are qualified with workflowId to disambiguate shared keys across workflows.

Confidence Score: 4/5

Safe to merge after fixing the repairJson infinite loop in mission-execution-loop.ts.

The behavioral verification machinery, sandbox isolation, and deadlock fix are well-structured and the two previously-flagged issues (blocked-verdict rewrite, global sandbox race) have been addressed. The one remaining defect is in repairJson — const counters are loop-invariant so any malformed AI response with unbalanced braces produces a non-terminating synchronous while loop, permanently blocking the Node.js event loop since the validation timeout does not cover the JSON-repair step.

packages/engine/src/mission-execution-loop.ts — repairJson infinite-loop at lines 961–967

Important Files Changed

Filename Overview
packages/engine/src/mission-execution-loop.ts Adds behavioral-posture logic (applyBehavioralPosture), inconclusive routing, reaper→slice deadlock fix, and idempotent createGeneratedFixFeature calls. Contains a P1 infinite-loop bug in repairJson where const loop-invariant counters produce non-terminating while loops on malformed AI JSON.
packages/engine/src/mission-verification.ts New behavioral-verification capability: test-execution channel with R18 fail-closed isolation, R19 path validation, R11/R17 disposable checkouts, R5/AE5 merge-base proof anti-spoofing, app-driving channel, and DeterministicVerificationCapability. Previously-raised issues are addressed.
packages/core/src/mission-store.ts Adds assertion type (static/behavioral) to schema, insertion, and upsert paths; idempotent createGeneratedFixFeature with run-level and open-fix dedup; failureReason threaded into Fix Feature description.
packages/core/src/mission-types.ts Introduces MissionAssertionType union, MISSION_ASSERTION_TYPES constant, normalizeMissionAssertionType helper. Conservative static default correctly preserves legacy behavior.
packages/core/src/db.ts Schema bumped to v118; migration 118 adds type column to mission_contract_assertions with DEFAULT 'static' and hasTable guard. Safe and backward-compatible.
packages/engine/src/verification-utils.ts Adds timedOut/aborted/executionError infra-failure classifiers; runVerificationCommand/execWithProcessGroup now accept an explicit SandboxBackend to avoid global-state races.
packages/engine/src/mission-verification-app-harness.ts New isolated app-launch harness (U4) behind injected seam; production spawner left dormant per plan.
packages/dashboard/src/routes/register-settings-sync-routes.ts Fixes missing global-settings persistence on pull path; qualifies workflow setting keys with workflowId in syncedFields/appliedKeys.
plugins/fusion-plugin-agent-browser/src/driver.ts New Playwright-core browser driver (U8): navigate/observe with structural result shapes; properly disposed via finally blocks; dormant in production per plan.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Task completes] --> B[runFeatureValidation]
    B --> C[AI Judge Session\ntools: readonly]
    C --> D[parseValidationResult]
    D --> E[applyBehavioralPosture]
    E --> F{error or blocked?}
    F -- Yes --> G[Return early]
    F -- No --> H{Has behavioral assertions?}
    H -- No --> I[Return static judge verdict]
    H -- Yes --> J[Promise.all: verifyBehavioralAssertion]
    J --> K{Capability injected?}
    K -- No --> L[Default to fail]
    K -- Yes --> M[Select isolating backend R18]
    M --> N{Available?}
    N -- No --> O[inconclusive: fail-closed]
    N -- Yes --> P[Disposable checkout R11/R17]
    P --> Q[Run in sandbox R19]
    Q --> R{Proof supplied?}
    R -- Yes --> S[Baseline checkout R5/AE5]
    S --> T{Result?}
    T -- pass-on-both --> U[fail: proof invalid]
    T -- fail-base pass-impl --> V[pass]
    T -- fail-impl --> W[fail]
    R -- No --> X{Suite passed?}
    X -- Yes --> V
    X -- No --> W
    O --> Y[Aggregate]
    L --> Y
    V --> Y
    U --> Y
    W --> Y
    Y --> Z{allPassed?}
    Z -- Yes --> AA[pass: feature done]
    Z -- sawInconclusive --> AB[inconclusive: no Fix Feature]
    Z -- No --> AC[fail: createGeneratedFixFeature]
Loading

Comments Outside Diff (1)

  1. packages/engine/src/mission-execution-loop.ts, line 960-967 (link)

    P1 Infinite loop on malformed AI JSON with unbalanced braces/brackets

    openBraces, closeBraces, openBrackets, and closeBrackets are all const — they are captured once before the while loops and never change. If the validation agent returns truncated JSON such that openBraces > closeBraces (not uncommon with LLM output mid-sentence), the condition closeBraces < openBraces is statically true and the loop runs forever. Since this code is synchronous, it blocks the entire Node.js event loop indefinitely. The 10-minute validation timeout in runFeatureValidation wraps only runValidationSession, not parseValidationResultrepairJson, so no timer can rescue a hung call.

Reviews (4): Last reviewed commit: "fix(engine): clear Lint and Gate failure..." | Re-trigger Greptile

Comment thread packages/engine/src/mission-execution-loop.ts
Comment thread packages/engine/src/mission-verification.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
packages/core/src/mission-store.ts (1)

2986-2990: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate run-to-feature ownership before dedup/creation.

At Line 2986, the code verifies the run exists but not that run.featureId === sourceFeatureId. A mismatched call can attach lineage to the wrong run and consume retry budget on the wrong source feature.

Suggested fix
     const run = this.getValidatorRun(runId);
     if (!run) {
       throw new Error(`Validator run ${runId} not found`);
     }
+    if (run.featureId !== sourceFeatureId) {
+      throw new Error(
+        `Validator run ${runId} belongs to feature ${run.featureId}, expected ${sourceFeatureId}`,
+      );
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/mission-store.ts` around lines 2986 - 2990, The code in the
method using getValidatorRun(runId) checks existence but not ownership: after
retrieving run via getValidatorRun(runId) verify that run.featureId ===
sourceFeatureId before proceeding with dedup/creation; if they differ, throw an
Error (or return a clear failure) to prevent attaching lineage to the wrong run
and consuming the wrong source feature's retry budget. Ensure you update the
block around getValidatorRun/run to perform this equality check and fail fast
when mismatched.
packages/engine/src/mission-execution-loop.ts (3)

590-596: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear the validation timeout once the session finishes.

Every successful validation leaves this 10-minute timer armed. Repeated validations will accumulate pending timers until they fire, which keeps unnecessary work alive long after the run completed.

Suggested fix
-      const timeoutPromise = new Promise<never>((_, reject) => {
-        setTimeout(() => reject(new Error("Validation timeout")), VALIDATION_TIMEOUT_MS);
-      });
-
-      const validationPromise = this.runValidationSession(session.session, prompt);
-
-      await Promise.race([validationPromise, timeoutPromise]);
+      let timeoutHandle: ReturnType<typeof setTimeout> | undefined;
+      const timeoutPromise = new Promise<never>((_, reject) => {
+        timeoutHandle = setTimeout(() => reject(new Error("Validation timeout")), VALIDATION_TIMEOUT_MS);
+      });
+
+      const validationPromise = this.runValidationSession(session.session, prompt);
+
+      try {
+        await Promise.race([validationPromise, timeoutPromise]);
+      } finally {
+        if (timeoutHandle) clearTimeout(timeoutHandle);
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/mission-execution-loop.ts` around lines 590 - 596, The
validation timeout Promise created as timeoutPromise uses setTimeout but never
clears the timer, so repeated validations accumulate pending timers; modify the
logic around timeoutPromise and validationPromise (the call to
runValidationSession and the use of VALIDATION_TIMEOUT_MS) to capture the timer
id from setTimeout, race the promises, and ensure clearTimeout(timerId) is
called once validationPromise settles (use then/finally on the raced result or
attach finally to validationPromise) so the timer is always cleared whether the
validation succeeds or errors.

1014-1051: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Backfill omitted linked assertions before recomputing the result.

This only synthesizes defaults when the judge returns zero assertion rows. A partial response can silently drop linked assertions, which means a feature can pass without evaluating every assertion, and skipped behavioral assertions never reach verifyBehavioralAssertion(...).

Suggested fix
   private extractAssertionResults(
     parsed: Record<string, unknown>,
     assertions: MissionContractAssertion[],
   ): Array<{ assertionId: string; passed: boolean; message?: string; expected?: string; actual?: string }> {
     const results: Array<{
@@
     if (Array.isArray(parsed.assertions)) {
       for (const item of parsed.assertions) {
@@
       }
     }
 
-    // If no assertion results but we have assertions, create default results based on status
-    if (results.length === 0 && assertions.length > 0) {
-      const overallPassed = parsed.status === "pass";
-      for (const assertion of assertions) {
-        results.push({
-          assertionId: assertion.id,
-          passed: overallPassed,
-          message: overallPassed ? "Passed" : "Failed",
-        });
-      }
+    const resultIds = new Set(results.map((r) => r.assertionId));
+    const overallPassed = parsed.status === "pass";
+    for (const assertion of assertions) {
+      if (resultIds.has(assertion.id)) continue;
+      results.push({
+        assertionId: assertion.id,
+        passed: overallPassed,
+        message: overallPassed ? "Passed" : "Failed",
+      });
     }
 
     return results;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/mission-execution-loop.ts` around lines 1014 - 1051, The
current logic only synthesizes defaults when parsed.assertions is empty,
allowing partial responses to omit linked assertions; update the processing in
mission-execution-loop.ts so after iterating parsed.assertions you build a
set/map of assertion IDs seen (from assertionId or id), then iterate the
original assertions array and for any assertion.id not present push a default
result (use parsed.status to set passed and suitable message) so every linked
assertion is represented; reference parsed.assertions, assertions, results, and
parsed.status (and ensure downstream verifyBehavioralAssertion(...) will receive
these synthesized rows).

1332-1352: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Notify autopilot with blocked when the retry budget is exhausted.

createGeneratedFixFeature(...) blocks the source feature on exhausted budget, but this method still falls through to notifyValidationComplete(featureId, "failed"). That leaves autopilot out of sync with the persisted terminal state and can trigger the wrong follow-up path. Based on learnings from the provided store contract, exhausted budgets transition the source feature to blocked.

Suggested fix
   private async handleValidationFail(
     featureId: string,
     runId: string | undefined,
     result: ValidationResult,
   ): Promise<void> {
     try {
+      let terminalStatus: "failed" | "blocked" = "failed";
+
@@
         if (message.includes("retry budget exhausted") || message.includes("exhausted its retry budget")) {
+          terminalStatus = "blocked";
           loopLog.warn(`Feature ${featureId} retry budget exhausted; marking as blocked`);
           // completeValidatorRun already handles the blocked transition when budget is exhausted
           this.logFeatureMissionEvent(featureId, "error", "retry_budget_exhausted", `Feature ${featureId} exhausted its retry budget`, {
             runId: runId ?? null,
           });
@@
       // Notify autopilot if configured
       if (this.missionAutopilot?.notifyValidationComplete) {
-        await this.missionAutopilot.notifyValidationComplete(featureId, "failed");
+        await this.missionAutopilot.notifyValidationComplete(featureId, terminalStatus);
       }
     } catch (err) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/mission-execution-loop.ts` around lines 1332 - 1352, The
code currently always calls
this.missionAutopilot.notifyValidationComplete(featureId, "failed") after
createGeneratedFixFeature errors; change it so when the error message indicates
a retry budget exhaustion (the same condition that logs "retry_budget_exhausted"
and emits "validation:budget_exhausted") you call
this.missionAutopilot.notifyValidationComplete(featureId, "blocked") instead of
"failed". Locate the block that checks message.includes("retry budget
exhausted") || message.includes("exhausted its retry budget") (and the else
branch that logs fix_feature_creation_failed) and add the alternate
notifyValidationComplete call for the exhausted-budget branch, keeping the
existing logging/emission behavior for that branch.
packages/dashboard/src/routes/register-settings-sync-routes.ts (2)

322-350: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist pulled global settings through the dashboard store.

CentralCore.applyRemoteSettings() only validates/strips the global payload; it does not write ~/.fusion/settings.json. Unlike /settings/sync-receive, this last-write-wins path never calls store.updateGlobalSettings(...), so a successful pull can report remote global keys in appliedFields while leaving the local global settings unchanged.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dashboard/src/routes/register-settings-sync-routes.ts` around lines
322 - 350, The pull path currently validates but does not persist global
settings; after a successful central.applyRemoteSettings(...) and before
building applied/skipped lists, call the dashboard store to persist the pulled
global settings (e.g. invoke store.updateGlobalSettings(...) or the equivalent
method) using remoteSettings.global (or an empty object if missing), ensuring
this only runs when result.success is true so the local ~/.fusion/settings.json
is updated to reflect the remote last-write-wins state.

227-234: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve workflowId in the surfaced field lists.

These responses flatten workflow changes down to bare setting ids. If two workflows both sync timeoutMs (or any shared declaration id), callers only see duplicate "timeoutMs" entries and cannot tell which workflow changed. Return qualified keys (for example workflowId.settingId) or structured { workflowId, settingId } entries instead. The same ambiguity exists in the inbound route’s appliedFields.

Also applies to: 339-350

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dashboard/src/routes/register-settings-sync-routes.ts` around lines
227 - 234, The current syncedFields array (built from globalSettings,
projectSettings.project, and workflowSettings) flattens workflow setting ids and
loses which workflow they came from; update the collection logic that builds
syncedFields (and the similar appliedFields usage referenced around the other
block) to preserve workflow identity by emitting either qualified keys like
`${workflowId}.${settingId}` or small objects `{ workflowId, settingId }` for
entries coming from workflowSettings, while leaving global and project keys
unchanged; locate usages of workflowSettings, syncedFields, and appliedFields in
register-settings-sync-routes.ts and adjust the mapping for
Object.values(workflowSettings).flatMap(...) and the inbound appliedFields
builder so callers receive per-workflow qualified/structured entries.
🧹 Nitpick comments (3)
packages/engine/vitest.config.ts (1)

90-90: 💤 Low value

Redundant exclude entry in engine-core.

Line 90 adds merger-file-scope-invariant.test.ts to the engine-core exclude list, but engine-core uses an explicit include allow-list (lines 66-86) that never contained this file. Excluding a file that was never included is redundant (though harmless). If the file is being removed from the merge gate, simply not listing it in the include is sufficient.

♻️ Simplify by removing redundant exclude

Since engine-core uses an explicit include list and merger-file-scope-invariant.test.ts isn't in it, the exclude entry is unnecessary:

-          exclude: [
-            "node_modules/**",
-            "dist/**",
-            "src/__tests__/merger-file-scope-invariant.test.ts",
-          ],

However, keep the exclude if it serves as documentation that this file was intentionally removed from the gate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/vitest.config.ts` at line 90, Remove the redundant exclude
entry "src/__tests__/merger-file-scope-invariant.test.ts" from the engine-core
vitest exclude list because engine-core already uses an explicit include
allow-list (the include array) and that test file is not present there; simply
delete the string from the exclude array in vitest.config.ts so the config is
simpler and less confusing (keep the exclude only if you want an explicit
documentation note that the test was intentionally removed).
packages/core/src/__tests__/db-migrate.test.ts (1)

718-719: ⚡ Quick win

Remove duplicated schema-version assertions.

Line 718 (and the other listed ranges) repeats the exact same expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION) twice in a row. Keep one assertion per site to reduce noise and keep failures crisp.

Proposed cleanup pattern
-    expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION);
     expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION);

Also applies to: 752-753, 803-804, 833-834, 875-876, 910-911, 948-949, 1229-1230, 1237-1238, 1244-1245

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/__tests__/db-migrate.test.ts` around lines 718 - 719,
Duplicate assertions calling db.getSchemaVersion() against SCHEMA_VERSION appear
back-to-back (e.g., expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION)); remove
the redundant second occurrence at each location so only a single assertion
remains per check, leaving the test logic unchanged; look for uses of
db.getSchemaVersion and SCHEMA_VERSION in the test cases (lines shown around the
duplicate pairs) and delete the extra expect(...) lines at each listed spot.
packages/dashboard/app/components/settings/save-split.ts (1)

79-81: ⚡ Quick win

Document why customProviders is excluded from both patches.

customProviders is skipped in both the global patch (lines 79-81) and project patch (line 96), even though the test at line 121 of settings-save-split.test.ts confirms isGlobalSettingsKey("customProviders") returns true. This suggests customProviders is managed through a separate persistence path (perhaps for security/credential reasons), but the rationale is not documented in the code.

Consider adding a brief comment explaining why this field is intentionally excluded from the save-split flow, similar to the existing comments for persistAgentThinkingLog (line 76) and githubTrackingDefaultRepo (line 73).

Also applies to: 96-96

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/dashboard/app/components/settings/save-split.ts` around lines 79 -
81, Add a brief explanatory comment above both places where the code skips the
"customProviders" key (the if (key === "customProviders") continue; checks)
stating why customProviders is intentionally excluded from the global/project
save-split flow (e.g., persisted separately for security/credential or
provider-specific reasons), mirroring the style of the existing comments for
persistAgentThinkingLog and githubTrackingDefaultRepo; mention the
isGlobalSettingsKey("customProviders") test behavior to clarify that exclusion
is deliberate and not a bug.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/dashboard/app/api-node.ts`:
- Line 83: NodeSettingsSyncStatus currently declares workflowSettings as
required in api-node.ts but runtime/consumer code (useNodeSettingsSync) treats
it as possibly undefined; update the type in api-node.ts by changing the
declaration of workflowSettings to optional (workflowSettings?: Record<string,
string[]>) so it matches observed runtime behavior and avoid the type/runtime
mismatch; locate the NodeSettingsSyncStatus type and change the field there,
then run typechecks and keep the existing defensive usage in useNodeSettingsSync
(or remove it only after verifying backend always returns {}) to ensure safety.

In `@packages/dashboard/app/hooks/useNodeSettingsSync.ts`:
- Around line 35-37: The code defensively treats
NodeSettingsSyncStatus.diff.workflowSettings as possibly undefined, so update
the type to match runtime behavior: make the workflowSettings property optional
(workflowSettings?: Record<string, string[]>) on the NodeSettingsSyncStatus.diff
type declaration and ensure any consumers (e.g., useNodeSettingsSync's use of
diff.workflowSettings) rely on that optional typing; verify and adjust related
imports/usages of NodeSettingsSyncStatus and the workflowSettings symbol so the
compiler no longer requires the null-coalescing guard everywhere.

In `@packages/engine/src/__tests__/mission-verification.test.ts`:
- Around line 316-334: The test currently allows either "inconclusive" or "fail"
for timeout outcomes, which masks regressions; update the assertion in the
"inconclusive when the run times out (R9), still asserts source clean" test so
that verifyBehavioralAssertion's result (outcome.verdict) must be exactly
"inconclusive" (remove "fail" from the allowed set) — locate the test that
constructs TestExecutionVerificationCapability (using makeScriptedBackend with
outcome: "timeout") and change the expect that checks outcome.verdict to assert
a single-value equality to "inconclusive" while leaving the
materializer.cleanCalls assertion intact.

In `@packages/engine/src/mission-execution-loop.ts`:
- Around line 658-676: The code currently only short-circuits when
judgeResult.status === "error", so runs with judgeResult.status === "blocked"
get re-evaluated and may be downgraded to "fail" if any behavioral assertions
exist; update the early-exit logic to also preserve blocked verdicts by
returning judgeResult when judgeResult.status is "blocked" (i.e., change the
check in the initial short-circuit that references judgeResult to include
"blocked"), and apply the same fix to the analogous block handling around the
second occurrence (the other branch that re-aggregates assertions) so functions
like normalizeMissionAssertionType, the typeById map, and the hasBehavioral
gating do not override blocked runs.

In `@packages/engine/src/mission-verification.ts`:
- Around line 476-483: verifyBehavioralAssertion currently mutates process-wide
state by calling __setSandboxBackendForTests(isolating) and restoreBackend which
races under concurrent Promise.all; instead remove any calls to
__setSandboxBackendForTests in verifyBehavioralAssertion and pass the selected
backend object (obtained via backendFactory(selection.backendId) — the isolating
variable) directly into the execution path (e.g., add a backend parameter to
runVerificationCommand or to the lower-level resolveSandboxBackend call so
runVerificationCommand uses the provided backend rather than reading global
state). Update all call sites of runVerificationCommand (and the other affected
block referenced at the second occurrence) to forward the selected backend and
remove the temporary restoreBackend logic.
- Around line 504-553: The current logic treats any result with success ===
false as a test failure; change it to first distinguish
execution-level/infrastructure failures from real test failures by using the
richer outcome from runVerificationCommand (e.g., an returned enum or fields
like timedOut, aborted, exitCode, or executionError) rather than only success.
Update the branches that read baselineResult and implResult (inside the
validatedTestPath block and the whole-suite fallback) to: if either result
indicates an infra failure (timeout/abort/executionError), return inconclusive
via this.inconclusive(assertionId, "...") with an explanatory message; only
treat results as failing tests when the verifier specifically indicates a test
failure (e.g., exitCode/testStatus shows a failing test). Modify
runVerificationCommand or its returned shape and then use those new fields in
the checks around baselineResult, implResult, and the pass/fail logic so
timeouts/aborts do not count as reproduction or passes.

In `@packages/engine/vitest.config.ts`:
- Around line 87-91: The Vitest config excludes three test files
("src/__tests__/merger-file-scope-invariant.test.ts",
"src/__tests__/project-engine-manager.test.ts",
"src/__tests__/merger-ai-cleanup.test.ts") but no engine project (engine-core,
engine-default, engine-reliability, engine-slow) actually includes them, so they
never run; fix by either moving/renaming those tests to match an existing
project's include pattern (e.g., put them under a path matched by
engine-default's "src/**/*.test.ts" or engine-reliability/engine-slow patterns)
or update the appropriate project's include array to explicitly add those three
paths (or remove them from the exclude list if they should run in that project),
making sure the include/exclude rules for engine-core, engine-default,
engine-reliability, or engine-slow are consistent so the tests are discoverable.

In `@plugins/fusion-plugin-agent-browser/src/driver.ts`:
- Around line 243-253: The observe function currently swallows all errors from
page.waitForSelector and treats them as "absent"; change the catch to capture
the error (e.g., catch (err)) and distinguish timeout errors from others: if err
is a playwright.errors.TimeoutError return the absent observation as before,
otherwise return an inconclusive result (include the current page.url() and the
error message/details) or rethrow depending on caller expectations; update
references around observe, page.waitForSelector, obsOpts?.expectAbsent, and
DEFAULT_OP_TIMEOUT_MS to implement this narrow mapping.

---

Outside diff comments:
In `@packages/core/src/mission-store.ts`:
- Around line 2986-2990: The code in the method using getValidatorRun(runId)
checks existence but not ownership: after retrieving run via
getValidatorRun(runId) verify that run.featureId === sourceFeatureId before
proceeding with dedup/creation; if they differ, throw an Error (or return a
clear failure) to prevent attaching lineage to the wrong run and consuming the
wrong source feature's retry budget. Ensure you update the block around
getValidatorRun/run to perform this equality check and fail fast when
mismatched.

In `@packages/dashboard/src/routes/register-settings-sync-routes.ts`:
- Around line 322-350: The pull path currently validates but does not persist
global settings; after a successful central.applyRemoteSettings(...) and before
building applied/skipped lists, call the dashboard store to persist the pulled
global settings (e.g. invoke store.updateGlobalSettings(...) or the equivalent
method) using remoteSettings.global (or an empty object if missing), ensuring
this only runs when result.success is true so the local ~/.fusion/settings.json
is updated to reflect the remote last-write-wins state.
- Around line 227-234: The current syncedFields array (built from
globalSettings, projectSettings.project, and workflowSettings) flattens workflow
setting ids and loses which workflow they came from; update the collection logic
that builds syncedFields (and the similar appliedFields usage referenced around
the other block) to preserve workflow identity by emitting either qualified keys
like `${workflowId}.${settingId}` or small objects `{ workflowId, settingId }`
for entries coming from workflowSettings, while leaving global and project keys
unchanged; locate usages of workflowSettings, syncedFields, and appliedFields in
register-settings-sync-routes.ts and adjust the mapping for
Object.values(workflowSettings).flatMap(...) and the inbound appliedFields
builder so callers receive per-workflow qualified/structured entries.

In `@packages/engine/src/mission-execution-loop.ts`:
- Around line 590-596: The validation timeout Promise created as timeoutPromise
uses setTimeout but never clears the timer, so repeated validations accumulate
pending timers; modify the logic around timeoutPromise and validationPromise
(the call to runValidationSession and the use of VALIDATION_TIMEOUT_MS) to
capture the timer id from setTimeout, race the promises, and ensure
clearTimeout(timerId) is called once validationPromise settles (use then/finally
on the raced result or attach finally to validationPromise) so the timer is
always cleared whether the validation succeeds or errors.
- Around line 1014-1051: The current logic only synthesizes defaults when
parsed.assertions is empty, allowing partial responses to omit linked
assertions; update the processing in mission-execution-loop.ts so after
iterating parsed.assertions you build a set/map of assertion IDs seen (from
assertionId or id), then iterate the original assertions array and for any
assertion.id not present push a default result (use parsed.status to set passed
and suitable message) so every linked assertion is represented; reference
parsed.assertions, assertions, results, and parsed.status (and ensure downstream
verifyBehavioralAssertion(...) will receive these synthesized rows).
- Around line 1332-1352: The code currently always calls
this.missionAutopilot.notifyValidationComplete(featureId, "failed") after
createGeneratedFixFeature errors; change it so when the error message indicates
a retry budget exhaustion (the same condition that logs "retry_budget_exhausted"
and emits "validation:budget_exhausted") you call
this.missionAutopilot.notifyValidationComplete(featureId, "blocked") instead of
"failed". Locate the block that checks message.includes("retry budget
exhausted") || message.includes("exhausted its retry budget") (and the else
branch that logs fix_feature_creation_failed) and add the alternate
notifyValidationComplete call for the exhausted-budget branch, keeping the
existing logging/emission behavior for that branch.

---

Nitpick comments:
In `@packages/core/src/__tests__/db-migrate.test.ts`:
- Around line 718-719: Duplicate assertions calling db.getSchemaVersion()
against SCHEMA_VERSION appear back-to-back (e.g.,
expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION)); remove the redundant second
occurrence at each location so only a single assertion remains per check,
leaving the test logic unchanged; look for uses of db.getSchemaVersion and
SCHEMA_VERSION in the test cases (lines shown around the duplicate pairs) and
delete the extra expect(...) lines at each listed spot.

In `@packages/dashboard/app/components/settings/save-split.ts`:
- Around line 79-81: Add a brief explanatory comment above both places where the
code skips the "customProviders" key (the if (key === "customProviders")
continue; checks) stating why customProviders is intentionally excluded from the
global/project save-split flow (e.g., persisted separately for
security/credential or provider-specific reasons), mirroring the style of the
existing comments for persistAgentThinkingLog and githubTrackingDefaultRepo;
mention the isGlobalSettingsKey("customProviders") test behavior to clarify that
exclusion is deliberate and not a bug.

In `@packages/engine/vitest.config.ts`:
- Line 90: Remove the redundant exclude entry
"src/__tests__/merger-file-scope-invariant.test.ts" from the engine-core vitest
exclude list because engine-core already uses an explicit include allow-list
(the include array) and that test file is not present there; simply delete the
string from the exclude array in vitest.config.ts so the config is simpler and
less confusing (keep the exclude only if you want an explicit documentation note
that the test was intentionally removed).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 641b44d1-bd9c-4003-ab03-d16707c99f72

📥 Commits

Reviewing files that changed from the base of the PR and between f0d2415 and c4549de.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (56)
  • .changeset/loud-nodes-sync.md
  • .changeset/proud-validators-verify.md
  • CONCEPTS.md
  • docs/brainstorms/2026-06-11-validator-behavioral-verification-requirements.md
  • docs/missions-completion-contract.md
  • docs/missions.md
  • docs/plans/2026-06-11-001-feat-validator-behavioral-verification-plan.md
  • packages/core/src/__tests__/central-core.test.ts
  • packages/core/src/__tests__/db-migrate.test.ts
  • packages/core/src/__tests__/db.test.ts
  • packages/core/src/__tests__/goals-schema.test.ts
  • packages/core/src/__tests__/insight-store.test.ts
  • packages/core/src/__tests__/merge-request-record.test.ts
  • packages/core/src/__tests__/mission-store.test.ts
  • packages/core/src/__tests__/run-audit.test.ts
  • packages/core/src/__tests__/store-merge-queue.test.ts
  • packages/core/src/__tests__/task-documents.test.ts
  • packages/core/src/central-core.ts
  • packages/core/src/db.ts
  • packages/core/src/index.ts
  • packages/core/src/mission-store.ts
  • packages/core/src/mission-types.ts
  • packages/core/src/types.ts
  • packages/dashboard/app/__tests__/api-node.test.ts
  • packages/dashboard/app/__tests__/settings-save-split.test.ts
  • packages/dashboard/app/api-node.ts
  • packages/dashboard/app/components/__tests__/NodesView.test.tsx
  • packages/dashboard/app/components/settings/save-split.ts
  • packages/dashboard/app/hooks/__tests__/useNodeSettingsSync.test.ts
  • packages/dashboard/app/hooks/useNodeSettingsSync.ts
  • packages/dashboard/src/__tests__/routes-nodes-sync-contract.test.ts
  • packages/dashboard/src/__tests__/routes-nodes-sync.test.ts
  • packages/dashboard/src/routes/register-settings-memory-routes.ts
  • packages/dashboard/src/routes/register-settings-sync-inbound-routes.ts
  • packages/dashboard/src/routes/register-settings-sync-routes.ts
  • packages/engine/src/__tests__/merger-file-scope-invariant.test.ts
  • packages/engine/src/__tests__/mission-execution-loop.test.ts
  • packages/engine/src/__tests__/mission-verification-app-driving.test.ts
  • packages/engine/src/__tests__/mission-verification-app-harness.test.ts
  • packages/engine/src/__tests__/mission-verification.test.ts
  • packages/engine/src/__tests__/project-engine-manager.test.ts
  • packages/engine/src/__tests__/reliability-interactions/mission-validator-behavioral-posture.test.ts
  • packages/engine/src/__tests__/reliability-interactions/mission-verification-redrive-surface.test.ts
  • packages/engine/src/mission-execution-loop.ts
  • packages/engine/src/mission-verification-app-harness.ts
  • packages/engine/src/mission-verification.ts
  • packages/engine/vitest.config.ts
  • plugins/fusion-plugin-agent-browser/package.json
  • plugins/fusion-plugin-agent-browser/src/__tests__/driver-scope.test.ts
  • plugins/fusion-plugin-agent-browser/src/__tests__/driver.test.ts
  • plugins/fusion-plugin-agent-browser/src/__tests__/probe-browser.test.ts
  • plugins/fusion-plugin-agent-browser/src/driver.ts
  • plugins/fusion-plugin-agent-browser/src/index.ts
  • plugins/fusion-plugin-agent-browser/src/probe.ts
  • plugins/fusion-plugin-roadmap/src/store/__tests__/roadmap-store.test.ts
  • scripts/lib/test-quarantine.json

Comment thread packages/dashboard/app/api-node.ts
Comment thread packages/dashboard/app/hooks/useNodeSettingsSync.ts Outdated
Comment thread packages/engine/src/__tests__/mission-verification.test.ts
Comment thread packages/engine/src/mission-execution-loop.ts
Comment thread packages/engine/src/mission-verification.ts Outdated
Comment thread packages/engine/src/mission-verification.ts
Comment thread packages/engine/vitest.config.ts
Comment thread plugins/fusion-plugin-agent-browser/src/driver.ts
- mission-execution-loop: preserve judge "blocked" verdict (don't downgrade to fail), clear validation timeout, backfill omitted linked assertions, notify autopilot "blocked" on exhausted retry budget
- mission-verification: pass selected sandbox backend explicitly to fix concurrent global-override race (fail-closed isolation); distinguish infra outcomes (timeout/abort) from real test failures so they stay inconclusive
- mission-store: validate run-to-feature ownership before fix-feature dedup/creation
- settings-sync routes: persist pulled global settings via store; qualify workflow field keys with workflowId
- useNodeSettingsSync: drop redundant ?? {} guard (backend always emits diff.workflowSettings)
- driver(agent-browser): map only Playwright TimeoutError to "absent" in observe(); other faults -> inconclusive
- vitest.config: remove redundant engine-core exclude (quarantine lives in engine-default)
- tests: tighten timeout verdict assertion to inconclusive; remove duplicate schema-version assertions
- save-split: document why customProviders is excluded (persisted via dedicated provider routes)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gsxdsm

gsxdsm commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed CodeRabbit out-of-diff review findings (commit 1bb8cea9f). These were posted in the review body without inline threads, so replying here:

Validate run-to-feature ownership before dedup/creation. (packages/core/src/mission-store.ts)

Addressed: createGeneratedFixFeature now fails fast if run.featureId !== sourceFeatureId, so a mismatched call can't attach lineage to the wrong run or consume the wrong feature's retry budget.

Clear the validation timeout once the session finishes. (mission-execution-loop.ts)

Addressed: the VALIDATION_TIMEOUT_MS timer handle is captured and clearTimeout'd in a finally around the Promise.race, so timers no longer leak across validations.

Backfill omitted linked assertions before recomputing the result. (mission-execution-loop.ts)

Addressed: extractAssertionResults now backfills a default for every linked assertion missing from a partial judge response (not just when zero rows return), so behavioral assertions always reach verifyBehavioralAssertion.

Notify autopilot with blocked when the retry budget is exhausted. (mission-execution-loop.ts)

Addressed: handleValidationFail tracks a terminalStatus and notifies autopilot with "blocked" (not "failed") on the exhausted-budget branch, keeping autopilot in sync with the persisted blocked transition.

Persist pulled global settings through the dashboard store. (register-settings-sync-routes.ts)

Addressed: on a successful pull the route now calls store.updateGlobalSettings(remoteSettings.global) + cache invalidation (mirroring /settings/sync-receive), so ~/.fusion/settings.json actually reflects the pulled last-write-wins state.

Preserve workflowId in the surfaced field lists. (register-settings-sync-routes.ts)

Addressed: workflow-sourced entries in syncedFields/appliedFields now emit qualified ${workflowId}.${settingId} keys, so a shared declaration id synced by two workflows is no longer ambiguous. Global/project keys unchanged.

Redundant exclude entry in engine-core. (vitest.config.ts)

Addressed: removed the no-op merger-file-scope-invariant.test.ts exclude from engine-core (its include allow-list never matched it); the FN-6206 quarantine remains effective via engine-default.

Remove duplicated schema-version assertions. (db-migrate.test.ts)

Addressed: removed 10 back-to-back duplicate expect(db.getSchemaVersion()).toBe(SCHEMA_VERSION) assertions, one per cited site.

Document why customProviders is excluded from both patches. (save-split.ts)

Addressed: added comments explaining customProviders is persisted via dedicated provider routes (register-custom-provider-routes.ts) which mask API keys on read — routing it through save-split would write masked keys back and clobber real credentials.

Validation: engine/core/dashboard/agent-browser typecheck clean; engine (104), core (298), dashboard (518), agent-browser (19) targeted suites pass.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/engine/vitest.config.ts (1)

108-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

These excludes disable the files in every engine Vitest lane.

engine-core no longer includes these files, engine-default now excludes them here, and they do not match engine-reliability or engine-slow. That leaves merger-file-scope-invariant.test.ts, project-engine-manager.test.ts, and merger-ai-cleanup.test.ts unrunnable anywhere instead of merely evicting them from the merge gate. If the intent is merge-gate eviction, drop these per-file excludes and let a non-gate project still own them.

As per coding guidelines, "A flake INSIDE the merge gate is evicted, not skipped: remove its line from engine-core allow-list in packages/engine/vitest.config.ts."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/vitest.config.ts` around lines 108 - 110, The per-file
excludes in the Vitest lane configuration are removing these tests from every
engine project instead of only evicting them from the merge gate. Update the
test selection in vitest.config.ts around the engine-core/engine-default lane
config so merger-file-scope-invariant.test.ts, project-engine-manager.test.ts,
and merger-ai-cleanup.test.ts are not excluded everywhere; keep them runnable in
a non-gate lane and only remove them from engine-core’s allow-list if that is
the intended merge-gate eviction behavior.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/engine/src/verification-utils.ts`:
- Around line 431-437: Reorder the condition checks in the error handling block
to prioritize abort detection over timeout detection. Currently, the check for
ETIMEDOUT or killed-process conditions (which sets result.timedOut) executes
before the ABORT_ERR check (which sets result.aborted), causing abort cases that
have both killed and aborted flags to be incorrectly labeled as timeouts. Swap
the order of these two if/else if branches so that the condition checking for
errish.code === "ABORT_ERR" || errish.aborted executes first, followed by the
check for errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode ===
null). This ensures abort outcomes are correctly distinguished from timeout
outcomes for downstream routing decisions.

---

Duplicate comments:
In `@packages/engine/vitest.config.ts`:
- Around line 108-110: The per-file excludes in the Vitest lane configuration
are removing these tests from every engine project instead of only evicting them
from the merge gate. Update the test selection in vitest.config.ts around the
engine-core/engine-default lane config so merger-file-scope-invariant.test.ts,
project-engine-manager.test.ts, and merger-ai-cleanup.test.ts are not excluded
everywhere; keep them runnable in a non-gate lane and only remove them from
engine-core’s allow-list if that is the intended merge-gate eviction behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8e946b8c-c262-448a-bf9f-4cd927d0e039

📥 Commits

Reviewing files that changed from the base of the PR and between c4549de and 1bb8cea.

📒 Files selected for processing (14)
  • packages/core/src/__tests__/db-migrate.test.ts
  • packages/core/src/mission-store.ts
  • packages/dashboard/app/components/settings/save-split.ts
  • packages/dashboard/app/hooks/useNodeSettingsSync.ts
  • packages/dashboard/src/__tests__/routes-nodes-sync-contract.test.ts
  • packages/dashboard/src/__tests__/routes-nodes-sync.test.ts
  • packages/dashboard/src/routes/register-settings-sync-routes.ts
  • packages/engine/src/__tests__/mission-verification.test.ts
  • packages/engine/src/mission-execution-loop.ts
  • packages/engine/src/mission-verification.ts
  • packages/engine/src/verification-utils.ts
  • packages/engine/vitest.config.ts
  • plugins/fusion-plugin-agent-browser/src/__tests__/driver.test.ts
  • plugins/fusion-plugin-agent-browser/src/driver.ts
💤 Files with no reviewable changes (1)
  • packages/core/src/tests/db-migrate.test.ts
🚧 Files skipped from review as they are similar to previous changes (10)
  • packages/dashboard/app/hooks/useNodeSettingsSync.ts
  • packages/dashboard/app/components/settings/save-split.ts
  • packages/dashboard/src/tests/routes-nodes-sync-contract.test.ts
  • packages/engine/src/tests/mission-verification.test.ts
  • packages/core/src/mission-store.ts
  • plugins/fusion-plugin-agent-browser/src/tests/driver.test.ts
  • packages/dashboard/src/tests/routes-nodes-sync.test.ts
  • packages/dashboard/src/routes/register-settings-sync-routes.ts
  • packages/engine/src/mission-execution-loop.ts
  • packages/engine/src/mission-verification.ts

Comment on lines +431 to +437
if (!result.success && !maxBufferExceeded) {
const errish = err as { code?: number | string; killed?: boolean; aborted?: boolean };
if (errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null)) {
result.timedOut = true;
} else if (errish.code === "ABORT_ERR" || errish.aborted) {
result.aborted = true;
} else if (result.exitCode === null) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ABORT_ERR before the generic killed-process timeout heuristic.

toLegacyExecResult() marks in-flight aborts with { code: "ABORT_ERR", aborted: true, killed: true }. With the current branch order, those cases hit timedOut first and never set result.aborted, so user cancels get mislabeled downstream.

🔧 Minimal fix
-      if (errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null)) {
-        result.timedOut = true;
-      } else if (errish.code === "ABORT_ERR" || errish.aborted) {
+      if (errish.code === "ABORT_ERR" || errish.aborted) {
         result.aborted = true;
+      } else if (errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null)) {
+        result.timedOut = true;
       } else if (result.exitCode === null) {

Based on PR objectives, timeout and abort outcomes are supposed to stay distinct for inconclusive routing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (!result.success && !maxBufferExceeded) {
const errish = err as { code?: number | string; killed?: boolean; aborted?: boolean };
if (errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null)) {
result.timedOut = true;
} else if (errish.code === "ABORT_ERR" || errish.aborted) {
result.aborted = true;
} else if (result.exitCode === null) {
if (!result.success && !maxBufferExceeded) {
const errish = err as { code?: number | string; killed?: boolean; aborted?: boolean };
if (errish.code === "ABORT_ERR" || errish.aborted) {
result.aborted = true;
} else if (errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null)) {
result.timedOut = true;
} else if (result.exitCode === null) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/verification-utils.ts` around lines 431 - 437, Reorder
the condition checks in the error handling block to prioritize abort detection
over timeout detection. Currently, the check for ETIMEDOUT or killed-process
conditions (which sets result.timedOut) executes before the ABORT_ERR check
(which sets result.aborted), causing abort cases that have both killed and
aborted flags to be incorrectly labeled as timeouts. Swap the order of these two
if/else if branches so that the condition checking for errish.code ===
"ABORT_ERR" || errish.aborted executes first, followed by the check for
errish.code === "ETIMEDOUT" || (errish.killed && result.exitCode === null). This
ensures abort outcomes are correctly distinguished from timeout outcomes for
downstream routing decisions.

gsxdsm and others added 2 commits June 13, 2026 15:48
Resolve conflicts from concurrent main work:
- db.ts: renumber behavioral-verification migration 115 -> 118 after main's
  workflow-work-items (115), graphResumeRetryCount (116), autoMergeProvenance (117);
  SCHEMA_VERSION = 118. Update all schema-version test assertions to 118
  (keep SCHEMA_VERSION constant where used).
- vitest.config / test-quarantine.json: union both branches' quarantine sets
  (FN-6206 behavioral-verification files + main's merger-ai-cleanup/merger-ai/
  QuickEntryBox/soft-delete-blocker-residue/routes-settings entries).
- settings-sync routes: keep workflowId-qualified field keys + persisted global
  pull; drop duplicate workflowApplyResult from auto-merge interleave.
- Keep behavioral-verification changes in mission-execution-loop imports,
  useNodeSettingsSync, save-split, driver, and sync tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- mission-execution-loop: drop now-unused @fusion/core imports (TEST_MODE_RESOLVED,
  isTestModeActive, resolveTaskValidatorModel) left over from the merge; main
  refactored their consumers away. Keep normalizeMissionAssertionType.
- mission-verification-app-harness: add port-4040-allowlist marker; the "4040"
  references are documentation of the reserved-port guard contract, not a kill/bind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant