feat(watch): /fleet hybrid dispatch — 2.9x faster parallel execution#776
tamirdresher merged 11 commits into dev
Pull request overview
Adds a new --dispatch-mode option for squad watch --execute to support batching read-heavy issues into a single /fleet-style Copilot invocation (with a hybrid mode that keeps write-heavy work on the existing per-issue execution path).
Changes:
- Introduces `DispatchMode` (`task|fleet|hybrid`) in watch config and wires it from CLI → config loader → capability contexts.
- Adds `FleetDispatchCapability` to build and run a multi-track `/fleet` prompt for read-heavy issues.
- Updates `ExecuteCapability` to classify issues by title keywords and route work based on `dispatchMode`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/squad-cli/src/cli/commands/watch/index.ts | Passes dispatchMode through the per-capability config context. |
| packages/squad-cli/src/cli/commands/watch/config.ts | Adds DispatchMode, config field, merge logic, and JSON normalization. |
| packages/squad-cli/src/cli/commands/watch/capabilities/index.ts | Registers the new fleet-dispatch capability in the default registry. |
| packages/squad-cli/src/cli/commands/watch/capabilities/fleet-dispatch.ts | New capability that batches read-heavy issues and invokes Copilot with a /fleet prompt. |
| packages/squad-cli/src/cli/commands/watch/capabilities/execute.ts | Adds keyword classification + fleet/hybrid routing behavior for execution. |
| packages/squad-cli/src/cli-entry.ts | Parses --dispatch-mode and passes it into loadWatchConfig. |

    return {
      success: true,
      summary: dispatchMode === 'fleet'
        ? `fleet mode: all ${executable.length} issues deferred to fleet-dispatch`
        : `hybrid mode: no write issues (${readCount} read-only deferred to fleet)`,
      data: { executed: 0, failed: 0, deferredToFleet: executable.length - writeIssues.length },

In dispatchMode === 'fleet' this branch reports “all N issues deferred to fleet-dispatch”, but deferredToFleet is currently set to executable.length - writeIssues.length (read-only count). In fleet mode the deferred count should be executable.length (since both read+write are deferred).
Suggested change:

    - return {
    -   success: true,
    -   summary: dispatchMode === 'fleet'
    -     ? `fleet mode: all ${executable.length} issues deferred to fleet-dispatch`
    -     : `hybrid mode: no write issues (${readCount} read-only deferred to fleet)`,
    -   data: { executed: 0, failed: 0, deferredToFleet: executable.length - writeIssues.length },
    + const deferredToFleet = dispatchMode === 'fleet' ? executable.length : readCount;
    + return {
    +   success: true,
    +   summary: dispatchMode === 'fleet'
    +     ? `fleet mode: all ${executable.length} issues deferred to fleet-dispatch`
    +     : `hybrid mode: no write issues (${readCount} read-only deferred to fleet)`,
    +   data: { executed: 0, failed: 0, deferredToFleet },


    // In fleet or hybrid mode, split read vs write issues
    if (dispatchMode === 'fleet' || dispatchMode === 'hybrid') {
      const writeIssues = executable.filter(i => classifyIssue(i.title) === 'write');
      const batch = dispatchMode === 'fleet'
        ? [] // fleet mode: all issues go to fleet-dispatch capability
        : writeIssues.slice(0, maxConcurrent); // hybrid: only write issues here

In dispatchMode === 'fleet', ExecuteCapability does no work and relies on the separate fleet-dispatch capability to be enabled. If a user sets --dispatch-mode fleet but doesn’t also enable --fleet-dispatch (or config), this will result in no execution happening. Consider automatically enabling fleet-dispatch when dispatchMode is fleet|hybrid, or invoking fleet dispatch directly from here when in fleet mode.
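One way to act on this suggestion is to derive the enabled capability set from the dispatch mode, so that `fleet` and `hybrid` cannot run without `fleet-dispatch`. The following is a minimal sketch only: `resolveEnabledCapabilities` and the name-list shape are hypothetical and are not taken from the PR's actual config API.

```typescript
type DispatchMode = 'task' | 'fleet' | 'hybrid';

// Hypothetical helper (not part of the PR): returns the capability names that
// should run for a given dispatch mode. The real watch config may model this
// differently.
function resolveEnabledCapabilities(
  mode: DispatchMode,
  explicitlyEnabled: readonly string[],
): Set<string> {
  const enabled = new Set<string>(explicitlyEnabled);
  // fleet and hybrid both defer issues to fleet-dispatch, so it must be on.
  if (mode === 'fleet' || mode === 'hybrid') {
    enabled.add('fleet-dispatch');
  }
  return enabled;
}

// Example: --dispatch-mode fleet was passed, but fleet-dispatch was never enabled.
const caps = resolveEnabledCapabilities('fleet', ['execute']);
```

In this sketch `task` mode leaves the explicit selection untouched, which keeps the default path backward compatible.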

    // Cross-platform: shell-expand the file contents into the -p argument
    const isWindows = process.platform === 'win32';
    const cmd = isWindows
      ? `powershell -NoProfile -Command "copilot -p (Get-Content '${promptFile}' -Raw) --allow-all --no-ask-user --autopilot"`
      : `copilot -p "$(cat '${promptFile}')" --allow-all --no-ask-user --autopilot`;

    const result = execSync(cmd, {
      cwd,
      timeout: timeoutMs,
      encoding: 'utf-8',
      shell: true,
      stdio: ['pipe', 'pipe', 'pipe'],
    });

invokeFleet() builds a shell command that inlines the prompt file contents into a quoted -p argument ("$(cat ...)" / PowerShell Get-Content ... -Raw) and executes it with shell: true. Issue titles/bodies are untrusted input and can contain quotes/newlines that break quoting, leading to command injection or arbitrary argument injection. Prefer execFile/spawn with an args array (no shell) and pass the prompt as a single argument safely (or use a supported file/stdin input mode if Copilot CLI has one).
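A minimal sketch of the args-array approach, assuming the prompt text is already held in memory. `buildFleetInvocation` is a hypothetical helper, and the `/fleet` prompt text is illustrative; the flags are the ones from the snippet above. The resulting pair would be handed to Node's `execFileSync(file, args, options)`, which does not go through a shell by default.

```typescript
interface Invocation {
  file: string;
  args: string[];
}

// Hypothetical helper: builds the executable name and argument vector.
// The prompt is a single array element, so quotes, `$(...)`, or newlines in
// issue titles are passed through as data rather than parsed by a shell.
function buildFleetInvocation(prompt: string): Invocation {
  return {
    file: 'copilot',
    args: ['-p', prompt, '--allow-all', '--no-ask-user', '--autopilot'],
  };
}

// An issue title crafted to break out of the double quotes in the shell version.
const hostileTitle = 'Review docs"; rm -rf ~; echo "';
const invocation = buildFleetInvocation(`/fleet ${hostileTitle}`);

// With node:child_process this would then run as (not executed in this sketch):
//   execFileSync(invocation.file, invocation.args,
//                { cwd, timeout: timeoutMs, encoding: 'utf-8' });
```

The hostile title stays inside `args[1]` unchanged. Windows `.cmd` shims may need separate handling, since they generally cannot be launched without a shell.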

    export class FleetDispatchCapability implements WatchCapability {
      readonly name = 'fleet-dispatch';
      readonly description = 'Batch read-heavy issues into a parallel /fleet Copilot session';
      readonly configShape = 'boolean' as const;
      readonly requires = ['gh'];
      readonly phase = 'post-execute' as const;

      async preflight(_context: WatchContext): Promise<PreflightResult> {
        // Fleet dispatch requires the copilot CLI — quick sanity check
        try {
          execSync('copilot --version', { encoding: 'utf-8', stdio: 'pipe' });
          return { ok: true };
        } catch {
          return { ok: false, reason: 'copilot CLI not found — required for fleet dispatch' };
        }
      }

This capability's `requires` lists only `['gh']`, but `preflight`/`execute` invoke the standalone `copilot` binary (and on Windows also rely on `powershell`). This makes the preflight failure messaging and tooling metadata inaccurate. Update `requires` (and/or standardize on `gh copilot` if that's the intended CLI) so dependency reporting matches what's actually executed.

    const dispatchModeIdx = args.indexOf('--dispatch-mode');
    const dispatchMode = (dispatchModeIdx !== -1 && args[dispatchModeIdx + 1])
      ? args[dispatchModeIdx + 1] as 'fleet' | 'task' | 'hybrid'
      : undefined;

--dispatch-mode is cast to the union type without runtime validation. If a user passes an invalid value, it will silently flow into config and later behave like task mode (because it won’t match fleet|hybrid), which is hard to diagnose. Consider validating against {task,fleet,hybrid} and either erroring or warning + defaulting to task (and, if fleet|hybrid, auto-enable fleet-dispatch to match the documented usage).
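The runtime check described here could look roughly like the following. This is a sketch under assumptions: `parseDispatchMode` and `isDispatchMode` are hypothetical names, and it takes the "error on invalid values" option rather than warning and defaulting to `task`.

```typescript
const DISPATCH_MODES = ['task', 'fleet', 'hybrid'] as const;
type DispatchMode = (typeof DISPATCH_MODES)[number];

// Type guard: narrows an arbitrary string to the DispatchMode union at runtime.
function isDispatchMode(value: string): value is DispatchMode {
  return (DISPATCH_MODES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical parser: returns undefined when the flag is absent and throws
// when the flag is present with a missing or unknown value.
function parseDispatchMode(args: string[]): DispatchMode | undefined {
  const idx = args.indexOf('--dispatch-mode');
  if (idx === -1) return undefined;
  const raw: string | undefined = args[idx + 1];
  if (raw === undefined || !isDispatchMode(raw)) {
    throw new Error(
      `Invalid --dispatch-mode "${raw ?? ''}". Expected one of: ${DISPATCH_MODES.join(', ')}`,
    );
  }
  return raw;
}

const parsedMode = parseDispatchMode(['--execute', '--dispatch-mode', 'hybrid']);
```

Deriving the union type from the `DISPATCH_MODES` array keeps the compile-time type and the runtime check from drifting apart.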

    /** Keywords that indicate read-heavy / analysis work. */
    const READ_KEYWORDS = [
      'research', 'review', 'analyze', 'investigate', 'audit',
      'check', 'scan', 'assess', 'evaluate', 'fact-check',
      'document', 'report',
    ];

    /** Keywords that indicate write-heavy / implementation work. */
    const WRITE_KEYWORDS = [
      'fix', 'implement', 'create', 'build', 'refactor',
      'add', 'update', 'migrate', 'deploy', 'feature',
    ];

    /** Classify an issue as read-heavy or write-heavy by title keywords. */
    export function classifyIssue(title: string): 'read' | 'write' {
      const lower = title.toLowerCase();
      const isRead = READ_KEYWORDS.some(k => lower.includes(k));
      const isWrite = WRITE_KEYWORDS.some(k => lower.includes(k));
      if (isRead && !isWrite) return 'read';
      return 'write'; // default to write (safer — gets full agent session)
    }

New dispatch-mode behavior (classifyIssue + fleet/hybrid routing) isn’t covered by the existing watch execute tests. Adding focused unit tests for classifyIssue() and for the fleet/hybrid branching (e.g., fleet defers all, hybrid executes only write issues up to maxConcurrent) would help prevent regressions.
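As an illustration of the kind of focused checks meant here, the sketch below copies `classifyIssue()` from the snippet above and runs it over a small table of titles. The titles are made-up examples, not cases from the PR's test suite. The last one shows a side effect of substring matching: "address" contains `add`, so the title falls back to `write`.

```typescript
const READ_KEYWORDS = [
  'research', 'review', 'analyze', 'investigate', 'audit',
  'check', 'scan', 'assess', 'evaluate', 'fact-check',
  'document', 'report',
];

const WRITE_KEYWORDS = [
  'fix', 'implement', 'create', 'build', 'refactor',
  'add', 'update', 'migrate', 'deploy', 'feature',
];

// Same logic as the function under review.
function classifyIssue(title: string): 'read' | 'write' {
  const lower = title.toLowerCase();
  const isRead = READ_KEYWORDS.some(k => lower.includes(k));
  const isWrite = WRITE_KEYWORDS.some(k => lower.includes(k));
  if (isRead && !isWrite) return 'read';
  return 'write';
}

// Illustrative titles (hypothetical) paired with the expected classification.
const cases: Array<[string, 'read' | 'write']> = [
  ['Research auth options', 'read'],        // read keyword only
  ['REVIEW PR backlog', 'read'],            // matching is case-insensitive
  ['Fix login bug', 'write'],               // write keyword only
  ['Improve onboarding', 'write'],          // no keywords: default to write
  ['Audit and refactor config', 'write'],   // mixed keywords: write wins
  ['Investigate address parsing', 'write'], // "address" contains "add"
];

const mismatches = cases.filter(([title, expected]) => classifyIssue(title) !== expected);
```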
Fixed in commit e07e8f9. Added unit tests in `test/cli/watch-execute.test.ts` covering:
- `classifyIssue()`: read keywords (research, review, analyze, investigate, audit), write keywords (fix, implement, add, update, refactor), default-to-write, case-insensitive, mixed-keyword tie-breaking
- Fleet mode: all executable issues dispatched
- Hybrid mode: only read-heavy issues fleet-dispatched, write issues execute
- Hybrid mode: assigned issues excluded from both paths
All 16 watch-execute tests pass.
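The routing behavior these tests describe can be sketched as a pure function. This is an illustration, not the PR's code: `planDispatch`, `Issue`, and `DispatchPlan` are hypothetical names, `classify` stands in for `classifyIssue()`, and capping `task` mode at `maxConcurrent` is an assumption.

```typescript
type DispatchMode = 'task' | 'fleet' | 'hybrid';
type IssueKind = 'read' | 'write';

interface Issue {
  number: number;
  title: string;
}

interface DispatchPlan {
  taskBatch: Issue[];  // run on the existing per-issue path this round
  fleetBatch: Issue[]; // left for the fleet-dispatch capability
}

// Hypothetical helper: splits executable issues according to the dispatch mode.
function planDispatch(
  issues: Issue[],
  mode: DispatchMode,
  maxConcurrent: number,
  classify: (title: string) => IssueKind,
): DispatchPlan {
  if (mode === 'task') {
    // Assumption: task mode is capped at maxConcurrent like the hybrid write path.
    return { taskBatch: issues.slice(0, maxConcurrent), fleetBatch: [] };
  }
  if (mode === 'fleet') {
    return { taskBatch: [], fleetBatch: issues.slice() };
  }
  // hybrid: writes stay on the per-issue path, reads go to fleet
  const writes = issues.filter(i => classify(i.title) === 'write');
  const reads = issues.filter(i => classify(i.title) === 'read');
  return { taskBatch: writes.slice(0, maxConcurrent), fleetBatch: reads };
}

// Stand-in classifier for the example only; the PR uses keyword lists.
const classify = (title: string): IssueKind => (/^research/i.test(title) ? 'read' : 'write');

const board: Issue[] = [
  { number: 1, title: 'Research caching options' },
  { number: 2, title: 'Fix login bug' },
  { number: 3, title: 'Implement retry logic' },
];

const fleetPlan = planDispatch(board, 'fleet', 2, classify);
const hybridPlan = planDispatch(board, 'hybrid', 1, classify);
```

Here `fleetPlan` defers all three issues, while `hybridPlan` keeps one write issue in the task batch (the cap is 1) and sends the single read issue to fleet.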
…775)
Adds --dispatch-mode flag to squad watch --execute with three modes:
- task (default): existing Promise.all behavior, unchanged
- fleet: all issues dispatched via single /fleet prompt, true parallelism
- hybrid: read-heavy issues via /fleet, write-heavy via task tool + worktrees

Benchmark results (4 real issues):
- Fleet: 116s total (29s/issue avg)
- Sequential: 332s total (83s/issue avg)
- Fleet is 2.9x faster, ~25% cheaper on premium requests

Key finding: /fleet ignores custom agents (.github/agents/) and spawns generic explore agents. This makes fleet ideal for read-only analysis (triage, research, reviews) but NOT for charter-driven code changes.

New files:
- fleet-dispatch.ts: FleetDispatchCapability with /fleet prompt builder
- Updated execute.ts: issue classification (read vs write keywords)
- Updated config.ts: DispatchMode type + config field
- Updated cli-entry.ts: --dispatch-mode flag parsing

Closes #775
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove shell:true and stdio from execSync options (fixes overload mismatch)
- Add .changeset/fleet-dispatch-hybrid.md for changelog-gate CI check
- Build verified locally: tsc --noEmit passes clean

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. deferredToFleet count: use executable.length in fleet mode (was read-only count)
2. Warning log when fleet/hybrid defers but fleet-dispatch may not be enabled
3. Command injection fix: execFileSync with args array instead of shell command
4. requires: added 'copilot' alongside 'gh'
5. --dispatch-mode validation: error on invalid values, default to task
6. Unit tests: 11 tests for classifyIssue() covering all keyword categories

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
8893c70 to 7d57659
🛫 PR Readiness Check
| Status | Check | Details |
|---|---|---|
| ❌ | Single commit | 11 commits — consider squashing before review |
| ✅ | Not in draft | Ready for review |
| ✅ | Branch up to date | Up to date with dev |
| ❌ | Copilot review | No Copilot review yet — it may still be processing |
| ✅ | Changeset present | Changeset file found |
| ✅ | Scope clean | No .squad/ or docs/proposals/ files |
| ✅ | No merge conflicts | No merge conflicts |
| ❌ | Copilot threads resolved | 3 unresolved Copilot thread(s) — fix and resolve before merging |
| ❌ | CI passing | 15 check(s) still running |
This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.
diberry left a comment
LGTM
Review Summary
Verdict: Approve
What's Good
- Real performance win - 2.9x faster on a 4-issue board. Benchmarked on real issues. MCP startup amortized (1x vs 4x).
- Three-mode design - task (backward compatible default), fleet (all parallel), hybrid (reads->fleet, writes->task). Safe rollout.
- Issue classification is pragmatic - Keyword-based read/write split, defaults to write when ambiguous (safer). 11 tests.
- Documented the key finding - Fleet ignores custom agents and spawns generic explore agents. Honest engineering, explains why hybrid exists.
- Clean plugin architecture - FleetDispatchCapability slots into capability registry at post-execute phase. No special-casing.
- Good security - execFileSync with args array (no shell injection). Temp file cleanup in finally block.
Concerns (non-blocking)
- 4 commits - should squash before merge (readiness check expects 1)
- Hardcoded copilot binary - uses copilot.cmd/copilot instead of agentCmd config. Should use context.agentCmd or document why fleet always uses Copilot CLI
- Duplicate issue fetching - both ExecuteCapability and FleetDispatchCapability independently call listWorkItems. Future: pass issue list through context.data
- Keyword overlap fragile - "Audit and refactor" goes to write because refactor keyword. Acceptable for v1, future could weight by position
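
A rough sketch of what "weight by position" could mean: match whole words only and let the keyword that appears first in the title decide. This is one hypothetical reading of the suggestion, not code from the PR; `classifyByPosition` and `firstMatch` are invented names, and the keyword lists are shortened for the example.

```typescript
const READ_KEYWORDS = ['research', 'review', 'analyze', 'investigate', 'audit'];
const WRITE_KEYWORDS = ['fix', 'implement', 'add', 'update', 'refactor'];

// Index of the earliest whole-word keyword match in the title, or -1 if none.
function firstMatch(title: string, keywords: string[]): number {
  const positions = keywords
    .map(k => title.search(new RegExp(`\\b${k}\\b`, 'i')))
    .filter(p => p !== -1);
  return positions.length > 0 ? Math.min(...positions) : -1;
}

// Hypothetical alternative classifier: the leading keyword wins.
function classifyByPosition(title: string): 'read' | 'write' {
  const read = firstMatch(title, READ_KEYWORDS);
  const write = firstMatch(title, WRITE_KEYWORDS);
  if (read === -1) return 'write'; // no read signal: keep the safe default
  if (write === -1) return 'read';
  return read < write ? 'read' : 'write';
}

const auditFirst = classifyByPosition('Audit and refactor config');
const refactorFirst = classifyByPosition('Refactor then audit logs');
const wholeWord = classifyByPosition('Investigate address parsing');
```

Under this rule "Audit and refactor config" becomes `read`, while "Refactor then audit logs" stays `write`. The trade-off is that a title with a write component can now reach fleet, which does not follow charters, so the current default-to-write rule remains the more conservative choice.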
Non-blocking suggestions
- Squash to 1 commit
- Make fleet binary configurable instead of hardcoding copilot/copilot.cmd
- Pass issue list through context to avoid double API calls
- Future: consider LLM-based classification for ambiguous titles
- Add warning log when dispatchMode=fleet/hybrid but fleet-dispatch cap is not enabled
- Add comment documenting --dispatch-mode runtime validation
- Re-export classifyIssue from watch/index barrel
- Add classifyIssue unit tests for dispatch-mode categories
- Pass dispatchMode through to capability synthesized config

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🏗️ Architectural Review
Automated architectural review, informational only.
…es PS1 design)
The execute capability now passes ALL squad issues to the agent and lets the agent decide what to work on, matching the PS1 ralph-watch behavior. The prompt includes Task/WHY/Success/Escalation structure and references .squad/ralph-instructions.md for full instructions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eIssues
The PR #776 simplified findExecutableIssues to only filter by squad label (intentionally minimal), but this dropped two pre-existing filters that the test suite expects:
- exclude issues already assigned to a human
- exclude issues with status:blocked (or similar blocking labels)

These filters match the PS1 ralph-watch pre-filter logic and are the right behavior: the agent should only receive clearly actionable issues.

Fixes test: CLI: watch execute mode > findExecutableIssues > returns only issues ready for execution
🟡 Impact Analysis: PR #776
Risk tier: 🟡 MEDIUM
📊 Summary
🎯 Risk Factors
📦 Modules Affected
- root (2 files)
- squad-cli (7 files)
- squad-sdk (3 files)
- tests (4 files)
Addresses active review comment on PR #776: add unit tests for classifyIssue() (read vs write classification) and fleet/hybrid dispatch routing logic.

Tests cover:
- classifyIssue() read keywords (research, review, analyze, investigate, audit)
- classifyIssue() write keywords (fix, implement, add, update, refactor)
- Default to write when no keywords match
- Default to write when both read and write keywords appear
- Case-insensitivity
- Fleet mode: all executable issues dispatched
- Hybrid mode: only read-heavy issues fleet-dispatched, write issues execute
- Hybrid mode: assigned issues excluded from both paths

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Superseded by #830, combined into a single watch next-gen PR for easier review.
Closing: superseded by #830 (watch-next-gen combined PR)
/fleet Hybrid Dispatch for `squad watch --execute`
Adds `--dispatch-mode` flag with three modes:
- `task` (default): `Promise.all`, unchanged
- `fleet`: single `/fleet` prompt
- `hybrid`: reads via `/fleet`, writes via task tool

Usage

    squad watch --execute --dispatch-mode hybrid
    squad watch --execute --dispatch-mode fleet
    squad watch --execute                        # default: task (unchanged)

Benchmark (4 real issues)
How hybrid dispatch works
Fleet builds one prompt with N parallel tracks:
Key finding: Fleet ignores custom agents
Even with `@seven` or `@data` in fleet prompts, Copilot CLI spawns generic explore agents, NOT custom agents from `.github/agents/`. Charters are NOT followed. This is why hybrid mode uses fleet only for reads and task tool for writes.
Files changed
- `fleet-dispatch.ts`
- `execute.ts`
- `config.ts`: `DispatchMode` type + config field
- `cli-entry.ts`: `--dispatch-mode` flag parsing
- `capabilities/index.ts`
- `watch/index.ts`

Builds on
Closes #775