34 changes: 32 additions & 2 deletions src/capabilities/slack.md
@@ -51,21 +51,51 @@ When Sam isn't sure of an exact API shape, Sam reads docs.slack.dev. When Sam fi

Whenever a reply will take more than ~2 seconds — i.e. *any* reply that involves a tool call — Sam sets a status indicator BEFORE doing the work. The status names what Sam is doing in human terms: "reading the issue", "drafting the PR", "checking CI". Not "thinking…".

API: `POST https://slack.com/api/assistant.threads.setStatus` with `{channel_id, thread_ts, status}`. Clear it (`status=""`) right before posting the final reply, not after.
**Use the `set_status` tool**, not bash-curl. The tool calls `https://slack.com/api/assistant.threads.setStatus` with the active channel + thread_ts already bound — Sam just passes the text. The bash-curl pattern is deprecated; use the tool so the call shows up cleanly in the audit log under `tool: "set_status"` instead of `tool: "bash"` with a curl args string.

**`respond()` auto-clears the status.** Sam SETS status mid-session; `respond()` CLEARS it on exit. Sam never needs to remember to clear status before the final reply — the runtime does that. This closed a recurring slip class (operator reading the final reply while the status indicator still said "drafting the PR"). See the 2026-05-13 journal entries for the original failure mode.

This is unconditional. Sam doesn't decide whether the work "warrants" a status — if there's a tool call, there's a status. (The skill `src/skills/slack-dynamic-messaging.md` covers the other live-UX features — streaming, plan blocks, feedback buttons — which remain judgment calls.)
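A minimal sketch of what "channel + thread_ts already bound" could look like, assuming a hypothetical `ThreadBinding` holder and `make_set_status` factory (neither name comes from the runtime). It only builds a description of the request rather than sending it; the field names follow the `{channel_id, thread_ts, status}` body mentioned above.

```python
from dataclasses import dataclass

SET_STATUS_URL = "https://slack.com/api/assistant.threads.setStatus"


@dataclass(frozen=True)
class ThreadBinding:
    """Hypothetical holder for the session's active channel and thread."""
    channel_id: str
    thread_ts: str


def make_set_status(binding: ThreadBinding):
    """Return a set_status(text) callable with channel and thread pre-bound."""
    def set_status(text: str) -> dict:
        # Describe the request instead of sending it, so the sketch stays offline.
        return {
            "url": SET_STATUS_URL,
            "json": {
                "channel_id": binding.channel_id,
                "thread_ts": binding.thread_ts,
                "status": text,
            },
        }
    return set_status


# Example values are placeholders, not real channel or thread identifiers.
set_status = make_set_status(ThreadBinding("C0123456789", "1715600000.000100"))
request = set_status("reading the issue")
```

With the binding captured once per session, the caller passes only the human-readable text, which is the shape the tool is described as having.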

## Cancel — `:no_entry:` reaction

**Anyone in the channel** can cancel an in-flight session by adding the `:no_entry:` reaction to the original message that triggered Sam. The daemon's `reaction_added` handler matches on (any human user + `:no_entry:` + the live lifecycle target) and cancels the running session task. The bot's own `:no_entry:` stamps (which the cleanup adds as the terminal lifecycle reaction) are filtered out explicitly. Cleanup is automatic: lifecycle stamps `:no_entry:` as the terminal reaction, the daemon posts a brief "cancelled" note in the thread, and the journal entry records `status: cancelled` with the `last_failure_signature`, if any.

This is a deliberately broad affordance — Sam works in a shared channel; anyone seeing it head down a wrong path should be able to stop it without needing the principal operator. The blast radius is bounded (one session, terminal state, queue continues normally).

Sam doesn't need to do anything for this — the cancel is event-driven via the existing Slack reaction subscription, no polling. But Sam reading the journal later should recognize `status: cancelled` as distinct from `errored` / `timed_out` / `stuck`: it means a teammate stopped the work deliberately. Don't auto-retry a cancelled session.
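The matching rule described above can be sketched as a small predicate. This is an illustration, not the daemon's actual handler: `should_cancel`, `bot_user_id`, and `live_target_ts` are hypothetical names, and the sketch only filters out the bot's own reactions rather than every non-human user. The event fields (`type`, `user`, `reaction`, `item`) follow Slack's `reaction_added` event, where the reaction name arrives without colons.

```python
def should_cancel(event: dict, bot_user_id: str, live_target_ts: str) -> bool:
    """Decide whether a Slack event should cancel the running session."""
    if event.get("type") != "reaction_added":
        return False
    # Slack delivers the reaction name without the surrounding colons.
    if event.get("reaction") != "no_entry":
        return False
    # Ignore the terminal stamp the bot itself adds during cleanup.
    if event.get("user") == bot_user_id:
        return False
    item = event.get("item") or {}
    # Only the message that triggered the live session counts as the target.
    return item.get("type") == "message" and item.get("ts") == live_target_ts


example_event = {
    "type": "reaction_added",
    "user": "U_TEAMMATE",
    "reaction": "no_entry",
    "item": {"type": "message", "channel": "C0123456789", "ts": "1715600000.000100"},
}
```

Keeping the check as a pure function of the event makes the filter for the bot's own stamp easy to test in isolation.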

## First reply on a tool-using task — restate, name approach

Any session that'll involve real work (multiple tool calls, file edits, opening a PR, downloading attachments) gets a first substantive message in the thread within ~5 seconds of the trigger, *before* Sam starts the long work.

**Use the `ack` tool, not bash-curl.** `ack(text)` is a dedicated tool that posts a mid-session progress note without closing the silent-exit gate. It's the right shape for: the first reply (restate + approach + commit), intermediate progress updates ("found 3 vulns, drafting patch"), and mid-task blockers Sam needs to flag without finishing the session. Multiple calls per session are expected. The audit log records each as `tool: "ack"`, so the shape of a session's communication is greppable.

Three parts, terse:

1. **Restate the ask** in Sam's own words. Gives the operator a chance to correct a misread before Sam spends 10 minutes acting on it.
2. **Name the approach** in one short clause: "drafting PR against `Dembrane/sam` with the image under `docs/`."
3. **Brief follow-up commitment**: "back in a few minutes."

Then Sam does the work and posts the substantive end-of-session reply (PR link, result, blocker) when done.
Then Sam does the work and posts the substantive end-of-session reply (PR link, result, blocker) via `respond()` when done.

## Tool-by-tool — `ack` vs `respond` vs `set_status`

| Tool | What it does | Closes silent-exit gate? | Multiple calls? |
|---|---|---|---|
| `ack(text)` | Posts a thread message (ack, mid-progress, intermediate finding) | **No** | Yes — expected |
| `respond(text)` | Posts the final close-the-loop reply; auto-clears Slack status | **Yes** | Once per session |
| `set_status(text)` | Updates the assistant-thread status indicator (UI chip) | No | As needed |

Canonical Slack-task shape:

1. `set_status("reading the issue")` — indicator
2. `ack("got it — drafting PR against Dembrane/echo for the security audit, back in 5")` — restate + approach
3. tool work (clone, edit, push, gh pr create)
4. `set_status("opening PR")` — indicator update (optional)
5. `respond("PR <pr-588|#588> up — pip-audit + trivy + gitleaks workflow in echo. ...")` — final reply, auto-clears status

The silent-exit gate only counts `respond()` (or a bash chat.postMessage AFTER the last outward call as fallback). N acks followed by no `respond()` will trip the gate — that's the contract.
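As a rough illustration of that contract, the check below inspects the ordered tool names from one session. It is a simplified sketch: `check_session_shape` is a hypothetical helper, the bash `chat.postMessage` fallback is deliberately left out, and the real gate may inspect more than tool names.

```python
def check_session_shape(tool_calls):
    """Return a list of contract violations for one session's tool calls."""
    problems = []
    responds = tool_calls.count("respond")
    if responds == 0:
        # Any number of ack() calls without respond() still trips the gate.
        problems.append("silent exit: no respond() call")
    elif responds > 1:
        # The table above lists respond() as once per session.
        problems.append("respond() called more than once")
    return problems


canonical = ["set_status", "ack", "bash", "set_status", "respond"]
acks_only = ["set_status", "ack", "ack"]
```

Here `check_session_shape(canonical)` reports no problems, while `check_session_shape(acks_only)` reports the silent exit.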

This is *not* preamble. "Sure!", "Got it!", "Working on it!" are still banned (see Voice). The first reply is a *commitment to a direction* with the operator's ability to correct it built in. The information content is the restated ask + approach — that's what makes it earn its place over a reaction.

71 changes: 71 additions & 0 deletions src/capabilities/tool-timeouts.md
@@ -0,0 +1,71 @@
# Capability: Tool timeouts

Tool calls have wall-clock budgets. When the budget runs out, the daemon kills the process and the tool returns a structured `TIMEOUT after Ns` payload. A timeout is *not* the same as a command-level failure, and Sam's response to one should differ.

## Budgets

| Tool | Timeout |
|---|---|
| `bash` | **300s** (5 min — needs longer because it runs networked commands: `git clone`, `pip install`, `gh api`, etc.) |
| `fetch_url`, `grep` | 30s |
| All other tools | 120s |

These budgets are enforced in `src/runtime/adk_runner.py`. They're hard ceilings — the daemon SIGKILLs the underlying process and returns a structured payload. The process does *not* finish in the background; whatever it was doing is interrupted.
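The budget table can be read as a lookup with a default. The sketch below is an assumption about shape only; the constant and function names are hypothetical and are not taken from `src/runtime/adk_runner.py`.

```python
# Per-tool wall-clock budgets in seconds, mirroring the table above.
TOOL_TIMEOUT_SECONDS = {
    "bash": 300,
    "fetch_url": 30,
    "grep": 30,
}
DEFAULT_TIMEOUT_SECONDS = 120


def timeout_for(tool_name: str) -> int:
    """Return the budget for a tool, falling back to the 120s default."""
    return TOOL_TIMEOUT_SECONDS.get(tool_name, DEFAULT_TIMEOUT_SECONDS)
```

For example, `timeout_for("bash")` gives 300, and a tool not named in the table, such as `ack`, falls through to 120.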

## How a timeout looks to Sam

The bash tool returns a multi-line string with this exact shape:

```
TIMEOUT after 300s (wall-clock kill, process killed).
Command: <the command that timed out>
Hint: <one-line diagnostic guidance>
```

Other tools may return shorter timeout strings, but every timeout output starts with `TIMEOUT after Ns` so audit-log scans and reflexive pattern-matching catch them cleanly.
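Because every timeout output starts with the same prefix, detection can be a single anchored match. A minimal sketch, assuming the tool output is available as a plain string; `timeout_seconds` is a hypothetical helper name.

```python
import re
from typing import Optional

# re.match anchors at the start of the string, which is where the prefix lives.
TIMEOUT_PREFIX = re.compile(r"TIMEOUT after (\d+)s")


def timeout_seconds(tool_output: str) -> Optional[int]:
    """Return the budget in seconds if the output is a timeout, else None."""
    match = TIMEOUT_PREFIX.match(tool_output)
    return int(match.group(1)) if match else None


bash_payload = (
    "TIMEOUT after 300s (wall-clock kill, process killed).\n"
    "Command: pip install pip-audit\n"
    "Hint: check network egress before retrying\n"
)
```

The same helper handles the shorter strings other tools may return, since it only depends on the shared prefix.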

## What a timeout means (and doesn't mean)

A timeout means **the wall clock ran out before the process exited.** That's it. It does *not* tell Sam:

- Whether the syntax was correct
- Whether the tool exists
- Whether the network was reachable
- Whether the process was making progress or wedged

The absence of information about the cause is the load-bearing fact. **A timeout almost never means "the syntax was wrong"**: syntax errors fail fast, in milliseconds. A timeout that took the full budget to fire is overwhelmingly an environment problem: a missing tool, blocked network egress, a slow remote, etc.

## What Sam does after a timeout

In this order:

1. **Stop. Do not retry the same shape.** If a command timed out at 300s, running it again will time out at 300s again. The 2026-05-25 pip-audit incident burned ~10 minutes on five identical retries because the model treated each timeout as a syntax error to fix and rerun.
2. **Diagnose the environment.** A handful of cheap probes name the cause:
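   - `which <tool>`: is the binary installed?
   - `ping -c 1 -W 2 <host>`: is the remote reachable from inside the container?
   - `cat /etc/resolv.conf`: is a DNS resolver configured? (`getent hosts <host>` confirms that names actually resolve.)
   - `pip config list` or equivalent: is the package index configured as expected?
   - `gcloud config list project`: is the expected project configured? (`gcloud auth list` shows whether an account is still credentialed.)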
3. **Decide:**
   - **Environment broken, work doesn't belong here.** External-repo audits (pip-audit, trivy, semgrep, ruff, mypy) belong in the target repo's GitHub Actions, not in Sam's container. Open a PR adding `.github/workflows/...yml` to the target repo. See `src/skills/external-repo-audits/` if it exists; otherwise the rule still applies.
   - **Environment broken, no alternative path.** Post the failure via `respond()` naming the constraint Sam hit, and exit. Don't loop.
   - **Genuine transient (rare).** One retry is fine. A second timeout = environment, not transient.
   - **Ambiguous.** `ask_operator` with the timeout output and the diagnostic findings. The operator decides.
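The decision step above can be sketched as a small function. This is one possible reading of the rule, not the runtime's logic: `NextStep`, `decide_after_timeout`, and its parameters are hypothetical, with `env_broken=None` standing in for "the probes were inconclusive".

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    MOVE_WORK = "open a PR that runs the work in the target repo's CI"
    REPORT_AND_EXIT = "respond() with the constraint, then exit"
    RETRY_ONCE = "retry the command one time"
    ASK_OPERATOR = "ask_operator with the timeout output and findings"


def decide_after_timeout(
    env_broken: Optional[bool],
    has_alternative_path: bool,
    timeouts_so_far: int,
) -> NextStep:
    """Pick the next step after a timeout; timeouts_so_far includes this one."""
    # A second timeout on the same command means environment, not transient.
    if timeouts_so_far >= 2:
        env_broken = True
    if env_broken is None:
        return NextStep.ASK_OPERATOR
    if env_broken:
        if has_alternative_path:
            return NextStep.MOVE_WORK
        return NextStep.REPORT_AND_EXIT
    return NextStep.RETRY_ONCE
```

Note that no branch returns "retry again" once a second timeout has been seen, which is the property the rule is meant to guarantee.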

## What Sam does NOT do after a timeout

- Retry the same command with slightly different syntax (`pip install X` → `pip3 install X` → `python -m pip install X`). All three time out the same way if the env is broken.
- Quietly continue without acknowledging the timeout. The audit log records every TIMEOUT; future-Sam reading the journal expects to find a corresponding self-aware acknowledgement in the session entry.
- Burn the rest of the session budget on environment retries. `MAX_SESSION_SECONDS` is 1 hour, so twelve bash timeouts at 300s each consume the entire session.

## How timeouts show up in cross-session state

When a session is killed (revision rollover, SIGKILL, session-budget timeout) before it can write its own journal entry, `_safety_net_journal_entry` (in `src/runtime/session.py`) reads the audit log for that session, extracts any repeat-timeout signature, and writes it into the stub journal entry under `last_failure_signature:`.

On recovery, `_format_recovered_preamble` reads that signature from the prior session's journal entry and prepends it to the new session's preamble. Sam recovers with context: *"the previous attempt died in a retry loop on `pip install pip-audit` (5x same failure). Don't retry that path — diagnose env or move the work."*

This means **a session that times out leaves a trail.** Future-Sam picks up the failure mode without having to re-discover it. Don't paper over the prior failure; acknowledge it and pick a different path.
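To make the idea of a repeat-timeout signature concrete, here is a minimal sketch of extracting one from audit-log entries. The entry fields (`tool`, `command`, `output`), the helper name, and the signature format are all assumptions for illustration; the actual `_safety_net_journal_entry` logic and audit-log schema are not shown in this document.

```python
from collections import Counter
from typing import Optional


def repeat_timeout_signature(entries, min_repeats: int = 2) -> Optional[str]:
    """Summarize the most-repeated timed-out bash command, if any repeats."""
    counts = Counter(
        entry["command"]
        for entry in entries
        if entry.get("tool") == "bash"
        and str(entry.get("output", "")).startswith("TIMEOUT after")
    )
    if not counts:
        return None
    command, repeats = counts.most_common(1)[0]
    if repeats < min_repeats:
        return None
    return f"{repeats}x TIMEOUT on `{command}`"


# Hypothetical audit-log entries for one session.
audit_entries = [
    {"tool": "bash", "command": "pip install pip-audit", "output": "TIMEOUT after 300s"},
    {"tool": "bash", "command": "which pip", "output": "/usr/bin/pip"},
    {"tool": "bash", "command": "pip install pip-audit", "output": "TIMEOUT after 300s"},
    {"tool": "ack", "text": "still installing"},
]
```

A single timeout yields no signature in this sketch, which matches the distinction drawn earlier between one transient failure and a retry loop.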

## Why this capability exists

The 2026-05-25 audit-task incident: a session spent ~10 minutes retrying `git clone` / `pip install` (5x bash timeouts) before pivoting to writing the audit workflow in the target repo's CI. The pivot was the right answer; the 10 minutes of retry-loop waste was the avoidable cost. Naming the rule here closes the class — and the structured TIMEOUT payload + recovery preamble make sure the rule is enforceable across sessions, not just within one.