docs: coordinator compaction recovery and restraint rules (#934) #953
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| # Compaction Recovery | ||
|
|
||
| > Recovery mechanism for when conversation context is compacted. | ||
|
|
||
| The Coordinator writes a recovery checkpoint to `.squad/sessions/{session-id}.json` after each agent batch. When the Coordinator resumes and detects that prior messages are missing from its context, it reads the checkpoint to recover its place in the workflow without losing the plan. | ||
|
|
||
| ## Why Compaction Recovery Matters | ||
|
|
||
| Large conversations can exceed the Coordinator's context window. When this happens: | ||
| 1. Your LLM compacts prior messages (they're summarized or removed) | ||
| 2. The Coordinator loses its working memory of what happened | ||
| 3. **Without recovery:** Coordinator might replay earlier steps or lose the plan | ||
| 4. **With recovery:** Coordinator reads the session state checkpoint and continues exactly where it left off | ||
|
|
||
| ## Session State Checkpoint | ||
|
|
||
| ### What It Contains | ||
|
|
||
| A minimal checkpoint answering: **"What should the Coordinator do next?"** | ||
|
|
||
| Example: | ||
| ```markdown | ||
| # Session State — 2026-04-08T14:50:00Z | ||
|
|
||
| **Last Completed Step:** 8 (Results persisted to orchestration log) | ||
|
|
||
| **Next Action:** | ||
| - Spawn Scribe to summarize decisions made during this session | ||
|
|
||
| **Workflow Phase:** Post-agent-work | ||
|
|
||
| **Context:** Three agents completed: API Agent (refactored endpoints), Frontend Agent (updated components), Tests Agent (added coverage) | ||
| ``` | ||
|
|
||
| ### File Location | ||
|
|
||
| ``` | ||
| .squad/sessions/{session-id}.json | ||
| ``` | ||
|
|
||
| This file is **temporary** — it's overwritten after each agent batch and is listed in `.gitignore` to prevent accidental commits. | ||
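The overwrite-per-batch behavior could be sketched as follows. This is a minimal illustration, not the Coordinator's confirmed implementation: the `lastCompletedStep` and `nextAction` keys come from the recovery example in this document, while `write_checkpoint`, the `timestamp`, `workflowPhase`, and `context` keys, and the write-then-rename approach are assumptions.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_checkpoint(root, session_id, last_completed_step, next_action,
                     workflow_phase, context):
    """Overwrite the session checkpoint and return the path written.

    Key names other than lastCompletedStep/nextAction are assumptions.
    """
    sessions_dir = Path(root) / ".squad" / "sessions"
    sessions_dir.mkdir(parents=True, exist_ok=True)
    target = sessions_dir / f"{session_id}.json"
    payload = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "lastCompletedStep": last_completed_step,
        "nextAction": next_action,
        "workflowPhase": workflow_phase,
        "context": context,
    }
    # Write to a temp file in the same directory, then rename it over the
    # target, so a crash mid-write never leaves a half-written checkpoint.
    fd, tmp_name = tempfile.mkstemp(dir=str(sessions_dir), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    os.replace(tmp_name, target)
    return target
```

Calling `write_checkpoint` again with the same `session_id` replaces the previous file, which matches the "overwritten after each agent batch" behavior described above.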
|
|
||
| ## Compaction Recovery Flow | ||
|
|
||
| ### Detection | ||
| The Coordinator detects context compaction when: | ||
| - Prior conversation turns are missing from the context window | ||
| - Earlier agent results are no longer visible | ||
| - Memory gaps appear in the workflow timeline | ||
|
|
||
| ### Recovery Steps | ||
|
|
||
| 1. **Read Session State** | ||
| ``` | ||
| Open .squad/sessions/{session-id}.json | ||
| Parse "nextAction" field | ||
| ``` | ||
|
|
||
| 2. **Skip Replayed Steps** | ||
| - Do NOT re-run steps already recorded in the checkpoint | ||
| - Use the orchestration log to verify what happened | ||
| - Resume at the "nextAction" field | ||
|
|
||
| 3. **Continue Workflow** | ||
| - Execute the next action | ||
| - Update the checkpoint after each batch | ||
| - Proceed normally | ||
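The recovery steps above can be sketched roughly as below. The helper names `load_next_action` and `steps_to_run` are hypothetical, and the sketch assumes workflow steps are numbered integers and that the checkpoint is JSON with the `lastCompletedStep` and `nextAction` fields quoted in this document's recovery example.

```python
import json
from pathlib import Path


def load_next_action(root, session_id):
    """Return (last_completed_step, next_action), or None if no checkpoint exists."""
    path = Path(root) / ".squad" / "sessions" / f"{session_id}.json"
    if not path.exists():
        return None
    state = json.loads(path.read_text(encoding="utf-8"))
    return state["lastCompletedStep"], state["nextAction"]


def steps_to_run(all_steps, last_completed_step):
    """Drop steps already recorded as done; keep the remaining steps in order."""
    return [step for step in all_steps if step > last_completed_step]
```

With a checkpoint recording step 8 as complete, `steps_to_run(range(1, 11), 8)` leaves only steps 9 and 10, so nothing already recorded is replayed.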
|
|
||
| ### Example Recovery | ||
|
|
||
| **Before Compaction:** | ||
| ``` | ||
| Coordinator spawned: API Agent → Frontend Agent → Tests Agent | ||
| Results persisted to orchestration log | ||
| (context is now full) | ||
| ``` | ||
|
|
||
| **Context Compaction Occurs:** | ||
| ``` | ||
| Session context is pruned to fit within token limit | ||
| Earlier messages are removed or summarized | ||
| Coordinator's memory of agent work is gone | ||
| ``` | ||
|
|
||
| **Upon Context Resumption:** | ||
| ``` | ||
| Coordinator notices missing prior messages | ||
| Reads .squad/sessions/{session-id}.json | ||
| Finds: "lastCompletedStep": 8 | ||
| Finds: "nextAction": "Spawn Scribe to summarize decisions" | ||
| Coordinator skips steps 1-8, jumps directly to spawning Scribe | ||
| ``` | ||
|
|
||
| ## Observable Behavior | ||
|
|
||
| - Session state checkpoint is written after each agent batch (step 8 in post-work flow) | ||
| - Checkpoint persists through context compaction | ||
| - Upon resumption, Coordinator continues the post-work flow without replaying earlier steps | ||
| - User sees no interruption — recovery is transparent | ||
|
|
||
| ## Session State Format | ||
|
|
||
| The session state checkpoint includes: | ||
| - **Timestamp:** ISO 8601 format (when checkpoint was created) | ||
| - **Last Completed Step:** Number of the most recent workflow step | ||
| - **Next Action:** One bullet point describing the next action | ||
| - **Workflow Phase:** Current phase (e.g., "post-agent-work", "pre-spawn") | ||
| - **Context:** Brief summary of what happened (for debugging) | ||
|
|
||
| Example: | ||
| ```markdown | ||
| # Session State — 2026-04-08T15:12:30Z | ||
|
|
||
| **Last Completed Step:** 8 (Orchestration log updated) | ||
|
|
||
| **Next Action:** | ||
| - Review agent results and determine if any follow-up work is needed | ||
|
|
||
| **Workflow Phase:** Post-agent-work | ||
|
|
||
| **Context:** | ||
| Agents spawned: API Agent (completed successfully), Frontend Agent (completed with warnings) | ||
| API Agent modified: src/routes/users.ts, src/routes/__tests__/users.test.ts | ||
| Frontend Agent modified: src/components/UserProfile.tsx | ||
| Warnings: Frontend Agent flagged 2 deprecated React APIs | ||
| ``` | ||
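One way to enforce the fields listed above when loading a checkpoint is a small typed record with validation. This is a sketch under assumptions: `SessionState` and `parse_session_state` are hypothetical names, and the JSON keys `timestamp`, `workflowPhase`, and `context` are guesses (only `lastCompletedStep` and `nextAction` appear in this document).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SessionState:
    timestamp: datetime
    last_completed_step: int
    next_action: str
    workflow_phase: str
    context: str


def parse_session_state(raw):
    """Build a SessionState from a decoded checkpoint dict, rejecting bad data."""
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    timestamp = datetime.fromisoformat(raw["timestamp"].replace("Z", "+00:00"))
    step = raw["lastCompletedStep"]
    if not isinstance(step, int) or step < 0:
        raise ValueError(f"lastCompletedStep must be a non-negative integer, got {step!r}")
    next_action = raw["nextAction"].strip()
    if not next_action:
        raise ValueError("nextAction must not be empty")
    # Phase and context are treated as optional here because they are
    # described as debugging aids rather than inputs to recovery.
    return SessionState(timestamp, step, next_action,
                        raw.get("workflowPhase", ""), raw.get("context", ""))
```

Rejecting an empty `nextAction` up front means a damaged checkpoint fails loudly instead of leaving the Coordinator with nothing to resume.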
|
|
||
| ## Checkpoint Limitations | ||
|
|
||
| The session state checkpoint is **NOT**: | ||
| - Authoritative for architectural decisions (use `.squad/decisions.md` instead) | ||
| - Authoritative for work routing (use `.squad/routing.md` instead) | ||
| - A permanent archive (it's overwritten after each batch) | ||
| - A detailed work log (use the orchestration log for details) | ||
|
|
||
| It is **ONLY** a breadcrumb to help the Coordinator resume at the right place. | ||
|
|
||
| ## Related Concepts | ||
|
|
||
| - **Result Persistence** — Immediate archival of agent results to disk before context expires | ||
| - **Orchestration Log** — Timestamped records of every agent's work (`.squad/orchestration-log/`) | ||
|
|
||
| ## See Also | ||
|
|
||
| - [Result Persistence](./result-persistence.md) — How agent results are archived | ||
| - [Coordinator Restraint Rules](./coordinator-restraint.md) — How Coordinator avoids over-managing agents |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| # Coordinator Restraint Rules | ||
|
|
||
| > Principles that balance the Coordinator's proactive execution with respectful boundaries. | ||
|
|
||
| After dispatching agents, the Coordinator follows 6 restraint rules to avoid over-managing agents, duplicating their work, or spawning unsolicited follow-ups. | ||
|
|
||
| ## The 6 Rules | ||
|
|
||
| ### 1. No Context Re-explanation | ||
| **Rule:** Do NOT re-explain context agents already know. | ||
|
|
||
| **Why:** Agents read their charter, your decisions, and your routing rules before work starts. Repeating context wastes output space and signals mistrust. | ||
|
|
||
| **Observable Behavior:** | ||
| - Coordinator output focuses on what agents did, not reminding them what they know | ||
| - No sentences like "Remember, you're the API lead, so you need to..." | ||
|
|
||
| ### 2. No Intervention While Agents Run | ||
| **Rule:** Do NOT intervene while an agent is still running. | ||
|
|
||
| **Why:** Agents need uninterrupted focus. Jumping in mid-work breaks their reasoning and creates context conflicts. | ||
|
|
||
| **Observable Behavior:** | ||
| - If an agent is still in-progress, Coordinator waits for completion before responding | ||
| - No "I notice you're doing X, have you considered Y?" messages mid-run | ||
|
|
||
| ### 3. Present Output Directly | ||
| **Rule:** Do NOT summarize or rephrase agent output — present it directly. | ||
|
|
||
| **Why:** Coordinator narration adds noise and can misrepresent what agents intended to communicate. | ||
|
|
||
| **Observable Behavior:** | ||
| - Agent output appears in responses with minimal framing (one sentence max) | ||
| - No "To summarize what {agent} said:" preambles | ||
| - No "I think they meant..." interpretations | ||
|
|
||
| ### 4. No Unsolicited Analysis | ||
| **Rule:** Do NOT add analysis the user did not request. | ||
|
|
||
| **Why:** Users didn't ask for the Coordinator's opinion. Agents already provided their analysis; additional commentary is noise. | ||
|
|
||
| **Observable Behavior:** | ||
| - Coordinator refrains from verbose summaries or "what I think this means" | ||
| - Analysis only appears when user explicitly asks: "What do you make of this?" or "Analyze this result" | ||
|
|
||
| ### 5. No Follow-up Agents Unless Mandated | ||
| **Rule:** Do NOT spawn follow-up agents unless explicitly requested, mandated, or part of a declared dependency chain. | ||
|
|
||
| **Why:** Each session has a limited number of agent spawns. Over-zealous orchestration burns that budget and violates user agency. | ||
|
|
||
| **Observable Behavior:** | ||
| - No unsolicited chain reactions (e.g., "Now I'll spawn Scribe to document this") | ||
| - Follow-up agents only spawn if: | ||
| - User asks: "Please have Agent X review this" | ||
| - Routing rules mandate it: "On merge, always spawn Ralph" | ||
| - Agent declares dependencies: "Depends on: API agent" | ||
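The three conditions in rule 5 amount to a simple gate. The sketch below is illustrative only: the rules live in the Coordinator prompt, and `FollowUpRequest`, its fields, and `may_spawn_follow_up` are hypothetical names, not part of the project.

```python
from dataclasses import dataclass, field


@dataclass
class FollowUpRequest:
    agent: str
    requested_by_user: bool = False      # e.g. "Please have Agent X review this"
    mandated_by_routing: bool = False    # e.g. "On merge, always spawn Ralph"
    declared_dependencies: list = field(default_factory=list)  # e.g. ["API agent"]


def may_spawn_follow_up(request):
    """Allow a follow-up spawn only when one of the three conditions holds."""
    return (
        request.requested_by_user
        or request.mandated_by_routing
        or bool(request.declared_dependencies)
    )
```

A request with none of the three flags set, such as an unprompted "Now I'll spawn Scribe to document this", is rejected by default.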
|
|
||
| ### 6. Brief Coordinator Commentary | ||
| **Rule:** Keep coordinator commentary to 1-2 sentences maximum. | ||
|
|
||
| **Why:** Brevity respects user and agent time. Long preambles distract from agent results. | ||
|
|
||
| **Observable Behavior:** | ||
| - Coordinator framing is concise: "Agent completed the task" or "Here are the results:" | ||
| - No multi-sentence narratives about what happened or why | ||
|
|
||
| ## Enforcement | ||
|
|
||
| These rules are hardcoded into the Coordinator prompt and verified by: | ||
| - **Session state checkpoint:** Tracks Coordinator's last action; if a restraint violation is detected, session can be rolled back | ||
|
|
||
| ## See Also | ||
|
|
||
| - [Compaction Recovery](./compaction-recovery.md) — How Coordinator recovers when context is compacted | ||
| - [Result Persistence](./result-persistence.md) — How Coordinator preserves agent results before context expires | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| # Result Persistence | ||
|
|
||
| > Mandatory archival of agent results before context expires. | ||
|
|
||
| Agent results from `read_agent` expire within 2–3 minutes. The Scribe writes each result to the orchestration log immediately after it is read, so the result survives if the session ends or context is compacted. | ||
|
|
||
| ## Why Result Persistence Matters | ||
|
|
||
| When an agent completes work: | ||
| 1. You call `read_agent` to retrieve its results | ||
| 2. Coordinator displays output to you | ||
| 3. Session ends or context is compacted | ||
| 4. The `read_agent` response disappears from memory | ||
| 5. **Results are gone** unless persisted | ||
|
|
||
| Result persistence ensures the Coordinator archives every agent's work to disk before moving forward, so you can always look up what happened. | ||
|
|
||
| ## Result Persistence Flow | ||
|
|
||
| **Step 1: Immediate Write After `read_agent`** | ||
| After calling `read_agent`, before any other processing, the Scribe writes: | ||
| ``` | ||
| .squad/orchestration-log/{ISO8601-timestamp}-{agent-name}.md | ||
| ``` | ||
|
|
||
|
|
||
| Example filename: `2026-04-08T14-23-45Z-API-Agent.md` (hyphens instead of colons for Windows compatibility) | ||
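A filename in this shape can be produced as sketched below. `log_filename` is a hypothetical helper, and turning spaces in the agent name into hyphens is an assumption inferred from the `API-Agent` example.

```python
from datetime import datetime, timezone


def log_filename(agent_name, completed_at):
    """Build '{timestamp}-{agent-name}.md' with no colons in the timestamp.

    completed_at should be a timezone-aware datetime; it is converted to UTC.
    """
    stamp = completed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    # Spaces become hyphens to match the "API-Agent" example (assumed rule).
    safe_name = "-".join(agent_name.split())
    return f"{stamp}-{safe_name}.md"
```

For example, `log_filename("API Agent", datetime(2026, 4, 8, 14, 23, 45, tzinfo=timezone.utc))` returns `2026-04-08T14-23-45Z-API-Agent.md`.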
|
|
||
| **Step 2: Log File Format** | ||
| Each log file contains: | ||
| - Agent name and spawn ID | ||
| - ISO 8601 timestamp of completion | ||
| - Original task description | ||
| - Result summary (what the agent did) | ||
| - List of files modified (if any) | ||
| - Full response text | ||
|
|
||
|
|
||
| Example: | ||
| ```markdown | ||
| # Orchestration Log: 2026-04-08T14-23-45Z — API Agent | ||
|
|
||
| **Agent:** API Agent (spawn-id: api-agent-2026-04-08-142345) | ||
| **Task:** Refactor user endpoints to use dependency injection | ||
|
|
||
| **Status:** ✅ Complete | ||
|
|
||
| **Summary:** Refactored three endpoints (GET /users, POST /users, DELETE /users/:id) to accept dependency-injected logger. Added tests for each endpoint. All tests pass. | ||
|
|
||
| **Files Modified:** | ||
| - src/routes/users.ts | ||
| - src/routes/__tests__/users.test.ts | ||
|
|
||
| **Full Response:** | ||
| [Agent's complete output from read_agent] | ||
| ``` | ||
|
|
||
| **Step 3: Persistence Before Anything Else** | ||
| Result persistence happens BEFORE: | ||
| - Coordinator displays results to user | ||
| - Coordinator spawns follow-up agents | ||
| - Coordinator processes session state | ||
| - Any other action | ||
|
|
||
| This ensures results are safe even if the session crashes or context compacts mid-process. | ||
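The ordering constraint can be illustrated with a short sketch in which the write to disk completes before the display callback is ever invoked. `handle_agent_result` and its parameters are hypothetical, and the log body is abbreviated relative to the full format shown in Step 2.

```python
from pathlib import Path


def handle_agent_result(root, filename, agent, task, full_response, display):
    """Persist the result first, then hand it to the display callback."""
    log_dir = Path(root) / ".squad" / "orchestration-log"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / filename
    body = (
        f"**Agent:** {agent}\n"
        f"**Task:** {task}\n\n"
        f"**Full Response:**\n{full_response}\n"
    )
    log_path.write_text(body, encoding="utf-8")
    # Only after the write has succeeded does anything else happen.
    display(full_response)
    return log_path
```

If `display` raises or the session ends during it, the log file is already on disk.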
|
|
||
| ## Observable Behavior | ||
|
|
||
| - `.squad/orchestration-log/` contains timestamped markdown files for every agent that ran | ||
| - Each file includes agent name, task, result summary, and files modified | ||
| - If `read_agent` returns no response, Coordinator checks the filesystem for `history.md`, `decisions/`, and `output/` files written by the agent directly during its run | ||
| - Session never loses agent results | ||
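The filesystem fallback for an empty `read_agent` response could look like the sketch below. The document does not say where an agent writes `history.md`, `decisions/`, and `output/`, so the `agent_dir` argument is an assumption, as is the helper name `find_fallback_artifacts`.

```python
from pathlib import Path

# Names taken from the fallback described above; their parent directory is assumed.
FALLBACK_NAMES = ("history.md", "decisions", "output")


def find_fallback_artifacts(agent_dir):
    """Return the fallback paths that exist under agent_dir, in a fixed order."""
    base = Path(agent_dir)
    return [base / name for name in FALLBACK_NAMES if (base / name).exists()]
```

An empty list means the agent left nothing behind on disk either, which is worth surfacing rather than treating as success.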
|
|
||
| ## Orchestration Log Location | ||
|
|
||
| ``` | ||
| .squad/orchestration-log/ | ||
| ├── 2026-04-08T14-23-45Z-API-Agent.md | ||
| ├── 2026-04-08T14-35-12Z-Frontend-Agent.md | ||
| └── 2026-04-08T14-50-00Z-Scribe.md | ||
| ``` | ||
|
|
||
| File naming: `{ISO8601-timestamp}-{agent-name}.md` (timestamps use hyphens instead of colons for Windows compatibility) | ||
|
|
||
| ## Related Concepts | ||
|
|
||
| - **Compaction Recovery** — Session state checkpoint that helps Coordinator resume after context is compacted | ||
| - **Session State** — Lightweight checkpoint (`.squad/sessions/{session-id}.json`) that captures the next action needed | ||
|
|
||
| ## See Also | ||
|
|
||
| - [Compaction Recovery](./compaction-recovery.md) — How Coordinator recovers from context compaction | ||
| - [Coordinator Restraint Rules](./coordinator-restraint.md) — How Coordinator avoids over-managing agents | ||