3 changes: 3 additions & 0 deletions docs/index.md
@@ -35,6 +35,9 @@ The split below is by question type, not by human-versus-agent audience.
- Need Decodex operator lane-control capability support, including inspect,
pause/resume, scan, interrupt, steer, retained retry/resume, manual attention, or
unsupported/deferred controls -> `docs/spec/lane-control.md`
- Need the post-control recovery sequence after lane interrupt, hard fallback, broad
steer, task replacement, or ambiguous retained evidence ->
`docs/runbook/lane-control-recovery.md`
- Need public static-site contracts, GitHub bundle schemas, signal-entry schemas, or
release-delta schemas -> `docs/spec/`
- Need runbooks, migrations, validation steps, troubleshooting, or operational
4 changes: 4 additions & 0 deletions docs/runbook/index.md
@@ -30,6 +30,10 @@ Question this index answers: "which sequence should I execute?"
`decodex.space` custom-domain setup for the static public site.
- [`linear-archive-hygiene.md`](./linear-archive-hygiene.md) for dry-run-first
archive hygiene of old terminal Linear issues by repo label.
- [`lane-control-recovery.md`](./lane-control-recovery.md) for deciding whether to
inspect, resume, scan, keep or remove queue labels, or route manual attention after
interrupt, hard fallback, broad steer, task replacement, or ambiguous recovery
evidence.
- [`local-github-signal-workflow.md`](./local-github-signal-workflow.md) for collecting
GitHub change bundles, running Codex editorial analysis, validating signal entries,
and publishing static site content.
155 changes: 155 additions & 0 deletions docs/runbook/lane-control-recovery.md
@@ -0,0 +1,155 @@
# Lane-Control Recovery

Goal: Give agents and operators a bounded recovery sequence after Decodex lane
interrupt, hard fallback, broad steer, task replacement, or ambiguous retained-lane
evidence.

Read this when: A lane-control request has returned, timed out, fallen back, changed
task content materially, or left unclear evidence about whether a retained lane should
resume, requeue, stop, or require human attention.

Inputs: Registered project id, issue identifier, run id, attempt number, current turn
id when available, control request result, `decodex status` or
`decodex status --json`, private evidence from `decodex evidence`, tracker state,
retained worktree state, and PR lineage when present.

Depends on: [`../spec/lane-control.md`](../spec/lane-control.md),
[`../spec/tracker-tools.md`](../spec/tracker-tools.md),
[`../reference/operator-control-plane.md`](../reference/operator-control-plane.md),
[`./recover-review-handoff.md`](./recover-review-handoff.md), the Decodex `automation`,
`manual-cli`, and `labels` skills, plus the registered project `project.toml` and
`WORKFLOW.md`.

Verification: The chosen path should cite the inspection evidence, the control outcome,
the retained worktree or PR lineage when relevant, and the supported Decodex command,
API, label skill, or issue-scoped tracker tool used for the next mutation.

## Recovery Principle

Lane control is not a shortcut around retained-lane lifecycle. `turn/steer` can carry
broad operator text, and `hard_interrupt_fallback` can stop a recorded child process
when explicitly requested, but recovery still has to preserve audit, lane identity,
workflow policy, and useful local work.

Do not directly kill hidden `_attempt` children, edit runtime DB rows, or mutate Linear
labels to simulate lane control. The normal paths are CLI/API lane controls, retained
retry/resume, explicit recovery commands, label-skill actions, issue-scoped tracker
tools, and manual attention. If an operator had to stop a process outside Decodex
controls for immediate host safety, treat the next state as ambiguous evidence until
the lane, private evidence, and worktree have been inspected.

## Inspect First

Run the smallest set of inspections that can prove the lane identity and current owner:

```sh
decodex lane inspect <ISSUE> --run-id <RUN_ID> --json
decodex status --json
decodex diagnose --json
decodex evidence <ISSUE> --run-id <RUN_ID> --attempt <N> --json
```

When the CLI is not the active surface, use the local HTTP API instead, and only
against the same trusted listener:

```sh
curl -sS 'http://127.0.0.1:8912/api/lane/inspect?projectId=<service-id>&issue=<ISSUE>&runId=<RUN_ID>'
```

Before mutating anything, confirm:

- project id and registered project path
- issue identifier, tracker state, and service-scoped labels
- branch, worktree, and whether the worktree is active, retained, queued-attention, or
cleanup-only
- run id, attempt, thread id, current turn id, and process/protocol liveness
- control outcome such as accepted, rejected, timed out, failed, or
`hard_interrupt_fallback`
- private evidence and public lifecycle signal
- PR URL, head branch, and head SHA when the lane has crossed review handoff

If these facts do not prove the requested lane, do not steer, interrupt, retry, resume,
or clean labels.
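
The checklist above works as a gate: every identity fact must be present and must match before any mutation. Here is a minimal sketch in shell, assuming the operator has already copied the observed values out of the inspection output by hand. The function name `lane_identity_proven` and its argument order are hypothetical and are not part of the Decodex CLI.

```sh
# Hypothetical helper, not a Decodex command.
# Usage: lane_identity_proven <expected_issue> <expected_run_id> <expected_attempt> \
#                             <observed_issue> <observed_run_id> <observed_attempt>
# Returns 0 only when all six values are non-empty and each observed value
# equals the expected one; a missing or mismatched fact returns 1.
lane_identity_proven() {
  [ "$#" -eq 6 ] || return 1
  for value in "$@"; do
    [ -n "$value" ] || return 1
  done
  [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" = "$6" ]
}

if lane_identity_proven XY-123 run-abc 2 XY-123 run-abc 2; then
  echo "identity proven: a supported control may proceed"
else
  echo "identity not proven: do not steer, interrupt, retry, resume, or clean labels"
fi
```

The sketch fails closed on purpose: an empty or missing value counts as a mismatch, which mirrors the rule that unproven identity blocks every mutation.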

## Decision Tree

| Evidence after inspection | Agent decision | Supported next action |
| --- | --- | --- |
| Active lane still matches the issue, branch, run id, attempt, and turn. | Let the runtime continue or wait for the control result. | No label change. Use the next CLI/API control only when the operator explicitly asks. |
| Soft interrupt was accepted and the runtime is still resolving the attempt. | Wait for status, protocol activity, or evidence to settle. | Re-inspect; do not requeue or force-kill. |
| Hard fallback reports `hard_interrupt_fallback`. | Treat it as an interrupted runtime event, not a graceful completion. | Inspect retained worktree and evidence; resume only if lineage is exact. |
| Retained worktree has useful local changes and lineage matches issue, branch, runtime evidence, and PR when present. | Resume or repair the same lane. | Use `decodex run <ISSUE>` when the registered workflow makes it eligible, or use the specific retained recovery runbook. |
| Review handoff marker is missing or stale but the retained PR lane appears recoverable. | Diagnose before rebind. | Run `decodex recover review-handoff diagnose <ISSUE>` and follow [`recover-review-handoff.md`](./recover-review-handoff.md). |
| Queue label or tracker state was changed and the scheduler should observe it before the next poll. | Request a refresh, not a retry. | `POST /api/linear-scan` with `projectId`, or no body for all enabled projects. |
| Queue label should be added, removed, or interpreted. | Use service-scoped label policy. | Follow the `labels` skill; do not guess `<service-id>` or clear `needs-attention` before fixing the blocker. |
| Broad steer materially changes the objective or acceptance contract. | Preserve audit and resolve lifecycle explicitly. | Update and requeue the same issue, create a new issue/lane, or route the owned run to manual attention. |
| Operator wants a different issue or replacement task. | Treat as task replacement, not steer. | Stop or pause through supported controls as needed, then create/update/requeue through the supported lifecycle. |
| Evidence is missing, contradictory, or would require guessing whether local work is safe to overwrite. | Stop automatic recovery. | Use manual attention with structured public blockers and keep private evidence local. |
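
The table above can also be sketched as a lookup from an evidence category to the supported next action. The category keywords below (`active_match`, `hard_fallback`, and so on) are illustrative labels for the table rows, not values that Decodex emits, and `next_action` is a hypothetical helper.

```sh
# Hypothetical mapping from a table row to its supported next action.
# Any category that is not recognized falls through to manual attention,
# matching the last row (missing or contradictory evidence).
next_action() {
  case "$1" in
    active_match)     echo "no label change; wait for the runtime or the control result" ;;
    soft_interrupt)   echo "re-inspect; do not requeue or force-kill" ;;
    hard_fallback)    echo "inspect retained worktree and evidence; resume only if lineage is exact" ;;
    retained_match)   echo "decodex run <ISSUE> when the registered workflow makes it eligible" ;;
    handoff_stale)    echo "decodex recover review-handoff diagnose <ISSUE>" ;;
    tracker_changed)  echo "POST /api/linear-scan" ;;
    label_change)     echo "follow the labels skill" ;;
    broad_steer)      echo "update and requeue, create a new issue/lane, or route manual attention" ;;
    task_replacement) echo "stop or pause through supported controls, then create/update/requeue" ;;
    *)                echo "manual attention with structured public blockers" ;;
  esac
}

next_action hard_fallback
next_action contradictory_evidence
```

Routing the default branch to manual attention keeps the sketch conservative: an unclassified state never triggers a retry or a label change.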

## Broad Steer Examples

Broad steer can be delivered by the runtime, but it does not erase lifecycle authority.

Example: an active lane is implementing "add lane-control guidance" and an operator
steers "ignore that and add dashboard retry buttons." The CLI/API may accept the steer
when the run id and expected turn id match. After the turn resolves, an agent must
inspect the diff and evidence. If the issue still has the old objective and the diff
now contains dashboard controls, do not hand off the PR as if the original issue were
satisfied. Preserve the steer audit and either create a replacement issue, update and
requeue the current issue through explicit lifecycle, or route manual attention.

Example: an operator steers "narrow this to docs only; do not touch Rust." If the issue
still accepts that scope and the resulting diff matches the same acceptance criteria,
the lane may continue after inspection. The agent should still cite the steer evidence
and ensure the review handoff summary does not imply that runtime behavior changed
when no such change was requested.

## Interrupt And Hard Fallback Examples

Example: `decodex lane interrupt XY-123 --run-id run-abc` reports a soft interrupt
request. Re-run `decodex lane inspect` or `decodex status --json`. If protocol
activity shows the same turn is still stopping, wait or inspect private evidence; do
not kill the child process from the side.

Example: `decodex lane interrupt XY-123 --run-id run-abc --force` reports
`hard_interrupt_fallback`. Inspect the retained worktree before retry. If the worktree
contains a partial patch that still belongs to `XY-123`, resume through
`decodex run XY-123` only when `WORKFLOW.md` eligibility, runtime evidence, branch,
and PR lineage still match. If the patch belongs to a replaced task or the issue state
is unclear, route manual attention.

## Label And Scan Rules

`POST /api/linear-scan` only asks the local listener to refresh Linear-backed intake and
status before the next scheduled poll. It does not start an attempt, retry a failed
lane, or change labels.
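
A refresh request might look like the sketch below. It rests on two assumptions that this runbook does not confirm: the listener address is reused from the inspect example, and `projectId` is sent as a JSON body of the form `{"projectId": "..."}`. The helper names `linear_scan_body` and `request_linear_scan` are hypothetical.

```sh
# Assumed listener address, taken from the inspect example; override if needed.
DECODEX_API="${DECODEX_API:-http://127.0.0.1:8912}"

# Print the request body for an optional project id.
# Prints nothing when no id is given (scan all enabled projects).
# The JSON shape is an assumption, not a documented schema.
linear_scan_body() {
  if [ -n "${1:-}" ]; then
    printf '{"projectId":"%s"}' "$1"
  fi
}

# Send the refresh request. This only asks for a scan; it does not start an
# attempt, retry a failed lane, or change labels.
request_linear_scan() {
  body="$(linear_scan_body "${1:-}")"
  if [ -n "$body" ]; then
    curl -sS -X POST "$DECODEX_API/api/linear-scan" \
      -H 'Content-Type: application/json' -d "$body"
  else
    curl -sS -X POST "$DECODEX_API/api/linear-scan"
  fi
}
```

Call `request_linear_scan <service-id>` for one project, or `request_linear_scan` with no argument for all enabled projects, and then re-inspect rather than assuming the lane state changed.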

Keep `decodex:queued:<service-id>` when the issue is still intended for automation and
the scheduler simply needs to observe a changed state. Remove it only through the
labels skill when the issue should no longer be an intake candidate. Keep
`decodex:needs-attention` until the recorded blocker is resolved; clearing it is not a
recovery shortcut.

During an owned automation run, agents use issue-scoped tracker tools for progress,
review handoff, manual attention, and terminal finalization. Outside the owned run,
operators use the documented CLI/API controls and label procedures.

## Manual Attention Route

Use manual attention when:

- lane identity cannot be proven from current evidence
- retained work may be overwritten or discarded without a human decision
- broad steer or task replacement changed the issue authority
- hard fallback stopped a process but retained worktree state is unclear
- Linear labels, active ownership, or tracker state conflict with runtime evidence
- PR lineage cannot be validated after review handoff

The valid owned-agent path is:

1. add the configured `decodex:needs-attention` label
2. call `issue_comment` with `kind = "manual_attention"` and structured public fields
3. call `issue_terminal_finalize(path = "manual_attention")`

Keep host-local paths, private payloads, raw steer text, process diagnostics, account
details, and secrets out of the public Linear fields.
5 changes: 4 additions & 1 deletion docs/spec/lane-control.md
@@ -6,7 +6,10 @@ Status: normative
Read this when: You are implementing, validating, or using CLI/API controls for active
or retained Decodex lanes.
Not this document: The full runtime state machine, the low-level app-server method
-schema, dashboard layout, or tracker-tool payload schema.
+schema, dashboard layout, tracker-tool payload schema, or the step-by-step recovery
+sequence after a control action. Use
+[`../runbook/lane-control-recovery.md`](../runbook/lane-control-recovery.md) for
+post-control recovery decisions.
Defines: The lane-control capability matrix, supported and deferred controls, audit
requirements, and policy boundary for inspect, pause/resume, scan, interrupt, steer,
retained retry/resume, and manual-attention controls.
32 changes: 32 additions & 0 deletions plugins/decodex/skills/automation/SKILL.md
@@ -22,6 +22,9 @@ Operate Decodex as the retained-lane control plane for automatic development.
- `docs/spec/lane-control.md` owns CLI/API-first lane-control capabilities, including
inspect, pause/resume, scan, interrupt, steer, retained resume/retry, manual
attention, and deferred controls.
- `docs/runbook/lane-control-recovery.md` owns the post-control decision trees for
agents after interrupt, hard fallback, broad steer, task replacement, or ambiguous
recovery evidence.
- `docs/spec/workflow-file.md` owns `WORKFLOW.md` schema and field semantics.
- `docs/reference/operator-control-plane.md` owns the current status/dashboard field map.

@@ -111,6 +114,8 @@ terminal automation signal.
## Lane Controls

Read `docs/spec/lane-control.md` before using or explaining operator controls.
Read `docs/runbook/lane-control-recovery.md` before retrying, resuming, relabeling, or
escalating after a control action or ambiguous recovery signal.

Rules for agents:

@@ -146,6 +151,33 @@ Rules for agents:
owned agent run, use issue-scoped tools for progress, review handoff, manual
attention, and terminal finalization. Outside the owned lane, use documented
CLI/API controls and the labels skill.
- Do not directly kill hidden `_attempt` children or edit runtime DB rows to force a
lane state. Use the supported interrupt, retained retry/resume, recovery, and
manual-attention paths. If an operator had to stop a process for immediate host
safety outside Decodex controls, treat the lane as evidence-ambiguous until
`status`, `diagnose`, `evidence`, and the retained worktree have been inspected.

Post-control decision tree for automation agents:

1. Inspect the current lane and private evidence before deciding whether the control
succeeded, failed, timed out, or fell back to `hard_interrupt_fallback`.
2. If the lane is still active and identity still matches the issue, branch, run id,
attempt, and current turn, let the runtime continue or wait for the control result;
do not requeue or clear labels.
3. If the lane is interrupted, failed, or retained with useful local work, resume only
when the retained worktree, branch, issue, runtime evidence, and PR lineage still
prove the same lane. Use runtime lifecycle entrypoints such as `decodex run
<ISSUE>`; do not restart from a guessed branch.
4. If a queued or relabeled issue should be observed sooner, request a Linear scan with
`POST /api/linear-scan`. Keep or remove queue labels only through the labels skill
or the supported tracker-tool path for the owned issue.
5. If a broad steer materially changes the requested objective, acceptance criteria, or
issue authority, preserve the local control audit and resolve lifecycle explicitly:
update and requeue the issue, create a new lane, or route the owned run to manual
attention. Do not silently hand off a PR whose diff no longer matches the issue.
6. If evidence cannot prove whether to resume, retry, requeue, or discard retained
work, stop automatic recovery and use manual attention with structured public
blockers.

## Boundaries

26 changes: 26 additions & 0 deletions plugins/decodex/skills/manual-cli/SKILL.md
@@ -15,6 +15,8 @@ runtime-owned retained-lane lifecycle.
- `README.md` for the current CLI shape.
- `Makefile.toml` before running repo-native checks.
- `docs/spec/lane-control.md` before using CLI/API lane controls.
- `docs/runbook/lane-control-recovery.md` before deciding what to do after interrupt,
hard fallback, broad steer, task replacement, or ambiguous recovery evidence.
- `docs/reference/operator-control-plane.md` when interpreting `status` or dashboard
fields.
- `docs/runbook/linear-archive-hygiene.md` before archiving old terminal Linear issues.
@@ -83,6 +85,30 @@ CLI/API lane controls:
- Do not use active-lane UI controls, direct runtime DB edits, raw
`thread/inject_items`, or tracker-state mutations as substitutes for the lane-control
contract.
- Do not kill hidden `_attempt` children to simulate interrupt. Use
`decodex lane interrupt ... --force` or the API `"force": true` only when explicit
operator intent allows hard fallback. If an emergency host-safety stop happens
outside Decodex controls, inspect local evidence and route recovery explicitly before
retrying or cleaning labels.

Post-control CLI recovery:

1. Inspect again with `decodex lane inspect <ISSUE>`, `decodex status --json`, and
`decodex evidence <ISSUE>` when a control request returns, times out, or reports
`hard_interrupt_fallback`.
2. If identity still matches an active lane, wait for the runtime-owned attempt or use
the next supported control. Do not remove `decodex:active:<service-id>` by hand.
3. If the lane is retained and lineage is exact, use the registered workflow path such
as `decodex run <ISSUE>` for retry/resume. If status reports a retained review
handoff mismatch, use `docs/runbook/recover-review-handoff.md`.
4. If the operator changed labels or issue state and wants the scheduler to notice
before the next poll, request `POST /api/linear-scan`; this is a refresh request,
not a retry command.
5. If the new operator text replaces the task or changes acceptance materially, do not
hide that as steer. Resolve the old lane explicitly, then update/requeue the same
issue or create a new issue for the replacement work.
6. If the evidence is ambiguous or useful retained work would be overwritten, route to
manual attention instead of direct Linear label mutation.

Manual commit and landing are separate narrow workflows:
