docs: add minion runbook, ops coverage reports, and planning docs

seanphan · seanphan · commit 8521bfada95a · 2026-02-22T18:09:44.000-08:00
Add closed-loop SOP, definition of done, n8n quickwin results,
SDK delegation plan, and ops coverage reports.
diff --git a/.python-version b/.python-version
@@ -0,0 +1 @@
+3.12.1
diff --git a/docs/minion/closed_loop_sop.md b/docs/minion/closed_loop_sop.md
@@ -0,0 +1,81 @@
+# Minion Closed-Loop SOP
+
+This SOP defines an autonomous execution loop for minion workers and QA.
+All autonomous runs must follow this sequence until the Definition of Done (DoD) is met.
+
+## Phase Loop
+
+Plan -> Build -> Test -> Semantic Verify -> QA -> Loop/Retry -> Ship
+
+1. Plan
+   - Read task brief, source references, and current `docs/minion/definition_of_done.md`.
+   - Record assumptions, out-of-scope items, and risks.
+2. Build
+   - Apply focused changes only for the assigned task.
+   - Record files touched and command(s) planned.
+3. Test
+   - Run task-relevant checks (lint/tests/CLI checks as applicable).
+   - Capture command + exit code + key output.
+4. Semantic Verify
+   - Validate behavior against intended semantics, not only transport success.
+   - Confirm capability coverage, tool-backed execution, and expected outputs.
+5. QA
+   - Validate outputs against DoD and rejection rules.
+   - Verify release-readiness gate command is runnable and passing.
+6. Loop/Retry
+   - If any phase fails, classify and route via failure matrix.
+   - Apply remediation and re-run Plan -> ... -> QA.
+7. Ship
+   - Stop only when all stop conditions are satisfied.
+
+## Stop Conditions
+
+- Stop and hand over as **DONE** when:
+  - Workflow and/or agent checks satisfy `docs/minion/definition_of_done.md`.
+  - Semantic Verify is explicitly documented as `pass`.
+  - QA confirms release readiness and no hard blockers remain.
+- Stop and escalate as **BLOCKED** when:
+  - Max retry policy is exhausted.
+  - Required artifact is missing or unrecoverable in this loop.
+  - Security/compliance or dependency constraints cannot be resolved.
+
+## Max Retry Policy
+
+- Retry budget per task: 3 total full loop attempts.
+- Per attempt, rerun only failed phases after fixes.
+- After 3 failed full attempts, stop as BLOCKED and report blockers.
+
+## Artifact Requirements per Iteration
+
+Each loop attempt must produce:
+
+- `Plan`: task hypothesis and changed file list.
+- `Build`: code/docs diff and command plan.
+- `Test`: command list with pass/fail status.
+- `Semantic Verify`: explicit pass/fail against intended behavior.
+- `QA`: DoD check against `docs/minion/definition_of_done.md` with blockers.
+- `Release`: output of `bash scripts/release_readiness.sh` (or explicit failure reason).
+
+## Failure Classification and Remediation Routing
+
+- `Transport-only pass`
+  - Root cause: the command exited successfully, but intent/semantic checks failed.
+  - Route: rerun with semantic checks and user-behavior evidence.
+- `Semantic fail`
+  - Route: fix behavior mismatch and re-run Test + Semantic Verify.
+- `Test failure`
+  - Route: fix implementation defect, missing deps, or fixture issues; rerun failing tests.
+- `Tooling/infra failure`
+  - Route: capture error context, retry once, then escalate if persistent.
+- `Blocked/dependency`
+  - Route: mark BLOCKED immediately and escalate unresolved items.
+
+## Release Gate Requirements
+
+Before Ship, all runs must satisfy:
+
+1. `bash scripts/release_readiness.sh` completes successfully.
+2. DoD checks in this repo are met for affected scope.
+3. Semantic checks are explicitly marked pass (no placeholders).
+4. Artifacts for the iteration are present and internally consistent.
+5. Transport-only and silent semantic regressions are rejected.
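A minimal Python sketch of the Max Retry Policy and stop conditions described in the SOP above, offered as an illustration and not part of this commit. The `run_phase` callback, the `LoopResult` type, and the function name are hypothetical. As a simplification, each attempt here reruns the full phase sequence, whereas the SOP also allows rerunning only the failed phases within an attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Phase names follow the SOP; Ship happens only after every phase passes.
PHASES = ("Plan", "Build", "Test", "Semantic Verify", "QA")
MAX_ATTEMPTS = 3  # retry budget per task, per the SOP


@dataclass
class LoopResult:
    status: str                          # "DONE" or "BLOCKED"
    attempts: int                        # full loop attempts consumed
    failed_phase: Optional[str] = None   # first failing phase of the last attempt


def run_closed_loop(run_phase: Callable[[str, int], bool],
                    max_attempts: int = MAX_ATTEMPTS) -> LoopResult:
    """Run full loop attempts until every phase passes or the budget is spent.

    `run_phase(phase, attempt)` is a hypothetical callback that returns True
    when the given phase passes on the given attempt.
    """
    failed: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # Stop at the first failing phase; later phases are not executed.
        failed = next((p for p in PHASES if not run_phase(p, attempt)), None)
        if failed is None:
            return LoopResult("DONE", attempt)
    return LoopResult("BLOCKED", max_attempts, failed)
```

A caller would report `failed_phase` as the blocker when the result is `BLOCKED`.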
diff --git a/docs/minion/definition_of_done.md b/docs/minion/definition_of_done.md
@@ -0,0 +1,41 @@
+# Minion Definition of Done
+
+This file defines what workers and QA must enforce before calling a workflow/agent task complete.
+
+## Workflow Done
+
+All required:
+
+1. Shape checks pass
+   - `agenticflow workflow validate --body @workflow.json`
+2. Lifecycle checks pass
+   - Create/update returns success
+   - Get/read confirms persisted entity
+3. Runtime checks pass
+   - Run returns `workflow_run_id`
+   - Run status reaches terminal `success`
+4. Semantic checks pass
+   - Output satisfies source intent, not just generic model response
+   - Tool-backed intent requires tool-backed behavior
+5. Evidence provided
+   - Payload(s), run id, final status payload, and short pass/fail table
+
+## Agent Done
+
+All required:
+
+1. `agent create` succeeds
+2. `agent get` returns created agent
+3. `agent update` succeeds
+4. `agent stream` succeeds with at least one real prompt
+5. If tool use is expected, at least one test proves tool-backed behavior
+6. Evidence provided: payloads, ids, transcript snippets, pass/fail table
+
+## Rejection Rules
+
+Reject as not done if any apply:
+
+1. Only dry-run evidence is provided.
+2. Only transport success is shown (no semantic verification).
+3. Required capabilities from source template are silently dropped.
+4. Errors are reported without actionable remediation.
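The four rejection rules above can be expressed as a small checker. This is a sketch under stated assumptions: the `Evidence` fields are hypothetical, since the source does not define an evidence schema, and a real implementation would derive them from run artifacts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    # All field names are hypothetical; the source defines no evidence schema.
    has_live_run: bool = False            # False means only dry-run evidence exists
    semantic_verified: bool = False       # False means transport success only
    dropped_capabilities: List[str] = field(default_factory=list)
    errors_without_remediation: int = 0


def rejection_reasons(evidence: Evidence) -> List[str]:
    """Return one reason per violated rejection rule; empty means not rejected."""
    reasons: List[str] = []
    if not evidence.has_live_run:
        reasons.append("only dry-run evidence provided")
    if not evidence.semantic_verified:
        reasons.append("transport success only, no semantic verification")
    if evidence.dropped_capabilities:
        dropped = ", ".join(evidence.dropped_capabilities)
        reasons.append("capabilities silently dropped: " + dropped)
    if evidence.errors_without_remediation > 0:
        reasons.append("errors reported without actionable remediation")
    return reasons
```

An empty list only means no rejection rule fired; the Workflow Done and Agent Done checklists still apply.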
diff --git a/docs/minion/tasks/qa-release.md b/docs/minion/tasks/qa-release.md
@@ -0,0 +1,19 @@
+You are the QA release minion for agenticflow-cli.
+
+Mission:
+Run the CLI test suites and release gates, then return a strict PASS/FAIL verdict for release-readiness.
+
+Required commands (in this order):
+1) `PATH=/Users/sean/.nvm/versions/node/v22.18.0/bin:$PATH bash scripts/release_readiness.sh`
+2) `PYTHONPATH=. .venv/bin/python -m pytest -q tests/unit`
+3) CLI smoke checks:
+   - `PYTHONPATH=. .venv/bin/python scripts/agenticflow_cli.py --help`
+   - `PYTHONPATH=. .venv/bin/python scripts/agenticflow_cli.py code search --help`
+   - `PYTHONPATH=. .venv/bin/python scripts/agenticflow_cli.py code execute --help`
+   - `node ./bin/agenticflow.js --help`
+
+Output requirements:
+1) Provide PASS/FAIL.
+2) Provide exact commands run.
+3) If FAIL, provide top blockers with file/line if applicable.
+4) If PASS, confirm the package is ready for version bump + publish workflow run.
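One way to run an ordered command list and record the exact commands alongside a PASS/FAIL verdict is sketched below. The function name and result shape are hypothetical, and the demo command is a stand-in for the real gate commands listed above. Note that exit codes capture transport success only; semantic verification still has to happen separately.

```python
import subprocess
import sys
from typing import Dict, List, Sequence


def run_gate(commands: Sequence[Sequence[str]]) -> Dict[str, object]:
    """Run each command in order and collect the exact command and exit code."""
    results: List[Dict[str, object]] = []
    for argv in commands:
        completed = subprocess.run(list(argv), capture_output=True, text=True)
        results.append({
            "command": " ".join(argv),
            "exit_code": completed.returncode,
        })
    # PASS requires at least one command and a zero exit code from all of them.
    passed = bool(results) and all(r["exit_code"] == 0 for r in results)
    return {"verdict": "PASS" if passed else "FAIL", "results": results}


if __name__ == "__main__":
    # Stand-in command; substitute the gate commands from the task brief.
    demo = [[sys.executable, "-c", "print('ok')"]]
    print(run_gate(demo)["verdict"])
```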
diff --git a/docs/minion/tasks/qa.md b/docs/minion/tasks/qa.md
@@ -0,0 +1,16 @@
+You are the QA minion for agenticflow-cli.
+
+Mission:
+Validate remediation outputs against `docs/solid_plan.md` and `docs/minion/definition_of_done.md`.
+
+Required checks:
+1) Run `PATH=/Users/sean/.nvm/versions/node/v22.18.0/bin:$PATH bash scripts/release_readiness.sh`.
+2) Verify the latest closed-loop harness artifact no longer fails on empty-node validation and reaches the create/run/poll path.
+3) Verify latest runtime report contains runtime + semantic verdict with evidence.
+4) Verify ops coverage uses declared supported baseline and avoids placeholder UUID parsing failures for supported executed ops.
+5) Verify coverage report exists and classifies each attempted operation.
+
+Acceptance policy:
+- Reject transport-only success.
+- Reject missing semantic evidence.
+- Produce final verdict: PASS or FAIL with blockers and exact files/commands.
diff --git a/docs/minion/tasks/worker-1.md b/docs/minion/tasks/worker-1.md
@@ -0,0 +1,26 @@
+You are worker-1 (`runtime-loop-fix`) for agenticflow-cli.
+
+Mission:
+Fix closed-loop harness so real template 6270 produces a non-empty workflow payload and reaches create/run/poll stages with real key.
+
+Scope:
+1) Inspect live response shape from:
+   - `GET /v1/workflow_templates/6270`
+2) Fix `scripts/runtime_loop_harness.py` template extraction logic to handle current live schema robustly.
+3) Add defensive fallback when extracted nodes are empty:
+   - fail loud with explicit reason, or
+   - auto-select a safe minimal runnable node only if deterministic.
+4) Preserve structured artifact output format.
+5) Add/adjust tests if testable without live network.
+
+Required validation:
+1) Run:
+   - `set -a; source /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env; set +a`
+   - `PYTHONPATH=. .venv/bin/python scripts/runtime_loop_harness.py --template-id 6270`
+2) Provide artifact paths and verdict fields.
+
+Acceptance:
+1) Harness no longer fails at validate due to empty `nodes`.
+2) At least one attempt reaches create/run/poll stages.
+3) Report contains runtime + semantic verdict and clear evidence.
+4) Provide changed files and commands run.
diff --git a/docs/minion/tasks/worker-2.md b/docs/minion/tasks/worker-2.md
@@ -0,0 +1,28 @@
+You are worker-2 (`ops-coverage-fix`) for agenticflow-cli.
+
+Mission:
+Fix ops coverage harness so declared supported operations are evaluated realistically with real key, not mostly placeholder-validation failures.
+
+Scope:
+1) Review `scripts/ops_coverage_harness.py` declared operation set and execution policy.
+2) Implement fixture/bootstrap strategy for IDs:
+   - resolve real UUID workspace/project if possible,
+   - create/find temporary workflow/agent/thread/run when needed,
+   - avoid fake `*_demo` IDs for operations that require UUID path params.
+3) Separate operation support levels clearly:
+   - executed
+   - blocked-by-policy
+   - unsupported
+4) Ensure failures represent real auth/infra/semantic problems, not avoidable placeholder errors.
+5) Update `docs/ops_coverage_report.{json,md}` from a real-key run.
+
+Required validation:
+1) Run:
+   - `set -a; source /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env; set +a`
+   - `PYTHONPATH=. .venv/bin/python scripts/ops_coverage_harness.py --env-file /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env --report-json docs/ops_coverage_report.json --report-md docs/ops_coverage_report.md`
+2) Print summary totals and classification counts.
+
+Acceptance:
+1) No avoidable `uuid_parsing` failures remain for supported executed ops.
+2) Report classifications are actionable and exact by operation id.
+3) Provide changed files and commands run.
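A pre-flight check for placeholder IDs could look like the sketch below, which flags path parameters that would not parse as UUIDs before a request is sent. The function names and the example parameter names are hypothetical; this is not the logic in `scripts/ops_coverage_harness.py`, which is not shown in this diff.

```python
import uuid
from typing import Dict, List


def is_uuid(value: str) -> bool:
    """Return True when the string parses as a UUID per the standard library."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def placeholder_params(path_params: Dict[str, str]) -> List[str]:
    """List parameter names whose values are not UUIDs, e.g. fake `*_demo` IDs."""
    return [name for name, value in path_params.items() if not is_uuid(value)]
```

Operations with a non-empty result could be skipped or bootstrapped with real fixtures instead of being counted as `uuid_parsing` failures.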
diff --git a/docs/minion/tasks/worker-2b.md b/docs/minion/tasks/worker-2b.md
@@ -0,0 +1,19 @@
+You are worker-2b (`ops-harness-single-owner`) for agenticflow-cli.
+
+Mission:
+Finish `scripts/ops_coverage_harness.py` to a correct, executable state and produce final real-key coverage reports.
+
+Rules:
+1) You are the ONLY worker editing `scripts/ops_coverage_harness.py` in this run.
+2) Keep compatibility with existing report schema where possible.
+3) Ensure script compiles and runs.
+4) Ensure support-scope constants are internally consistent (no undefined names).
+5) Execute harness with real key env:
+   - `set -a; source /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env; set +a`
+   - `PYTHONPATH=. .venv/bin/python scripts/ops_coverage_harness.py --env-file /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env --report-json docs/ops_coverage_report.json --report-md docs/ops_coverage_report.md`
+6) Print summary totals and classification counts.
+
+Acceptance:
+- script runs without syntax/runtime crash
+- docs/ops_coverage_report.json + .md regenerated
+- final message includes changed files and commands
diff --git a/docs/minion/tasks/worker-3.md b/docs/minion/tasks/worker-3.md
@@ -0,0 +1,22 @@
+You are worker-3 (`qa-remediation`) for agenticflow-cli.
+
+Mission:
+Run release gate + harnesses after worker fixes and produce a strict PASS/FAIL verdict with blockers.
+
+Scope:
+1) Pull latest local changes in your working tree.
+2) Run release gate:
+   - `PATH=/Users/sean/.nvm/versions/node/v22.18.0/bin:$PATH bash scripts/release_readiness.sh`
+3) Run closed-loop harness with real key:
+   - `set -a; source /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env; set +a`
+   - `PYTHONPATH=. .venv/bin/python scripts/runtime_loop_harness.py --template-id 6270`
+4) Run ops coverage harness with real key:
+   - `PYTHONPATH=. .venv/bin/python scripts/ops_coverage_harness.py --env-file /Users/sean/WIP/Antigravity-Workspace/WorkflowChef-Web/.env --report-json docs/ops_coverage_report.json --report-md docs/ops_coverage_report.md`
+5) Write final verdict summary with exact file paths.
+
+Acceptance:
+1) Report includes:
+   - release gate result
+   - runtime/semantic verdict from latest runtime artifact
+   - ops totals + classification counts
+2) Final output is PASS only if all required gates in `docs/solid_plan.md` are satisfied.
diff --git a/docs/minion/tasks/worker-4.md b/docs/minion/tasks/worker-4.md
@@ -0,0 +1,25 @@
+You are worker-4 (`support-matrix`) for agenticflow-cli.
+
+Mission:
+Define and enforce a precise "supported operations baseline" so coverage reflects intentional support, not all raw public spec endpoints.
+
+Scope:
+1) Audit:
+   - `src/agenticflow_cli/operation_ids.py`
+   - `src/agenticflow_cli/public_ops_manifest.json`
+   - `scripts/ops_coverage_harness.py`
+2) Implement a support matrix model (docs + code) that clearly marks each op as:
+   - supported-executed
+   - supported-blocked-policy
+   - out-of-scope
+3) Ensure harness and release docs use this same baseline.
+4) Update documentation with explicit rationale for each class.
+
+Constraints:
+- Do not over-claim support for endpoints not wrapped or not safely executable by CLI.
+- Keep backward compatibility for existing wrapper commands.
+
+Acceptance:
+1) There is a single source of truth for supported coverage scope.
+2) Coverage report + docs align with that scope.
+3) Provide changed files and commands run.
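A single source of truth for the support matrix might be modeled as one mapping from operation ID to support class, as sketched here. The three class values come from the task brief above; the enum, the helper functions, and the operation IDs in the example are hypothetical and do not reflect the contents of `operation_ids.py` or `public_ops_manifest.json`.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class SupportLevel(str, Enum):
    SUPPORTED_EXECUTED = "supported-executed"
    SUPPORTED_BLOCKED_POLICY = "supported-blocked-policy"
    OUT_OF_SCOPE = "out-of-scope"


def classification_counts(matrix: Dict[str, SupportLevel]) -> Dict[str, int]:
    """Count operations per support class, including classes with zero ops."""
    counts = Counter(level.value for level in matrix.values())
    return {level.value: counts.get(level.value, 0) for level in SupportLevel}


def coverage_baseline(matrix: Dict[str, SupportLevel]) -> List[str]:
    """Operation IDs that belong to the supported coverage scope, sorted."""
    return sorted(
        op for op, level in matrix.items()
        if level is not SupportLevel.OUT_OF_SCOPE
    )


# Hypothetical operation IDs, for illustration only.
EXAMPLE_MATRIX = {
    "example_get_thing": SupportLevel.SUPPORTED_EXECUTED,
    "example_delete_thing": SupportLevel.SUPPORTED_BLOCKED_POLICY,
    "example_admin_thing": SupportLevel.OUT_OF_SCOPE,
}
```

Having the harness and the docs generator both read the same mapping is what keeps the coverage report and documentation aligned.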
diff --git a/docs/minion/tasks/worker-4b.md b/docs/minion/tasks/worker-4b.md
@@ -0,0 +1,23 @@
+You are worker-4b (`support-matrix-docs-only`) for agenticflow-cli.
+
+Mission:
+Update docs only for support matrix baseline; do NOT edit Python code.
+
+Scope:
+1) Update docs to match current intended support categories used by harness:
+   - executed
+   - blocked-by-policy
+   - unsupported/out-of-scope
+2) Ensure docs clearly separate "declared public API" vs "CLI-supported coverage baseline".
+3) Update at least:
+   - docs/cli_secured_ops_baseline.md
+   - docs/public_api_agent_capabilities.md (if needed)
+4) Include release guidance for interpreting blocked/unsupported rows.
+
+Hard constraints:
+- Do not modify files under scripts/ or src/
+- Do not run destructive git commands
+
+Acceptance:
+- docs compile/read cleanly
+- final response includes exact doc files changed and rationale
diff --git a/docs/minion_runbook.md b/docs/minion_runbook.md
@@ -0,0 +1,69 @@
+# Minion Runbook (tmux + Codex)
+
+This repository supports an unattended multi-pane coding workflow modeled after one-shot agent runs:
+
+- 1 orchestrator pane
+- 4 worker panes
+- 1 QA pane
+
+All workers run `codex exec` with `gpt-5.3-codex-spark` and produce machine-readable artifacts.
+
+## 1) Prepare tasks
+
+Create or edit:
+
+- `docs/minion/tasks/worker-1.md`
+- `docs/minion/tasks/worker-2.md`
+- `docs/minion/tasks/worker-3.md`
+- `docs/minion/tasks/worker-4.md`
+- `docs/minion/tasks/qa.md`
+- `docs/minion/definition_of_done.md`
+
+Each worker task should be atomic and acceptance-testable in one shot.
+All workers and QA must enforce `docs/minion/definition_of_done.md`.
+
+## 2) Start the session
+
+```bash
+bash scripts/minion_orchestrator.sh \
+  --session af-minions \
+  --repo "$(pwd)" \
+  --tasks-dir "$(pwd)/docs/minion/tasks" \
+  --workers 4 \
+  --model gpt-5.3-codex-spark
+```
+
+Attach:
+
+```bash
+tmux attach -t af-minions
+```
+
+## 3) Artifacts
+
+The orchestrator writes artifacts to:
+
+- `.minion-runs/<timestamp>/worker-N.events.jsonl`
+- `.minion-runs/<timestamp>/worker-N.final.txt`
+- `.minion-runs/<timestamp>/worker-N.meta.json`
+- `.minion-runs/<timestamp>/qa.events.jsonl`
+- `.minion-runs/<timestamp>/qa.final.txt`
+- `.minion-runs/<timestamp>/qa.meta.json`
+
+## 4) Merge policy
+
+Before merge:
+
+1. Review worker outputs and diffs.
+2. Run local release gate:
+   - `bash scripts/release_readiness.sh`
+3. Accept only changes that pass tests and readiness gates.
+4. Reject changes that pass transport checks but fail semantic acceptance.
+
+## 5) Dry-run orchestration
+
+```bash
+bash scripts/minion_orchestrator.sh --dry-run
+```
+
+This prints pane commands without creating a tmux session.
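The artifact layout from section 3 of the runbook lends itself to a completeness check before merge review. The sketch below assumes the file naming shown in the runbook; the function names are hypothetical and the orchestrator script itself is not part of this diff.

```python
from pathlib import Path
from typing import List

# Suffixes follow the artifact list in the runbook.
SUFFIXES = ("events.jsonl", "final.txt", "meta.json")


def expected_artifacts(run_dir: str, workers: int) -> List[Path]:
    """Build the expected artifact paths for N workers plus the QA pane."""
    names = [f"worker-{n}" for n in range(1, workers + 1)] + ["qa"]
    return [Path(run_dir) / f"{name}.{suffix}"
            for name in names for suffix in SUFFIXES]


def missing_artifacts(run_dir: str, workers: int) -> List[Path]:
    """Return expected artifact paths that do not exist as regular files."""
    return [p for p in expected_artifacts(run_dir, workers) if not p.is_file()]
```

For example, `missing_artifacts(".minion-runs/<timestamp>", 4)` with the real timestamp directory would return an empty list only when all worker and QA files are present.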
diff --git a/docs/n8n_quickwin_create_results.md b/docs/n8n_quickwin_create_results.md
diff --git a/docs/n8n_quickwin_translation.md b/docs/n8n_quickwin_translation.md
diff --git a/docs/ops_coverage_report.json b/docs/ops_coverage_report.json
diff --git a/docs/ops_coverage_report.md b/docs/ops_coverage_report.md
diff --git a/docs/sdk_delegation_plan.md b/docs/sdk_delegation_plan.md
diff --git a/docs/solid_plan.md b/docs/solid_plan.md