feat(runtime): respond() tool + Slack ground-truth alert suppression by spashii · Pull Request #83 · Dembrane/sam

spashii · 2026-05-25T14:27:27Z

What this solves

The recurring three-message cascade the operator has seen N times on Slack mentions:

msg 1   substantive reply                       ← what operator wanted
msg 2   "apologies for the duplicate notify"    ← false-positive retry
msg 3   "<@op> something's wrong with me"       ← daemon alert

Both layers were inferring "did Sam close the loop" from heuristics — regex over bash command strings (cascade layer 1) and the same inference re-applied to the retry session (cascade layer 2). Three known holes fired the cascade twice on 2026-05-25 in thread 1779688501.139669 (sessions 9843c66e6870 / 6afc1f60a477 / a0ccae594a61 / 71e3df76652b) and a third time on the bug-fix session 716d41decda1.

How

Two commits, each independently revertable.

`respond(text)` tool — `4a3542a`

A structured close-the-loop tool on the main agent. Its call IS the gate signal — the classifier reads respond_called, not the trace. Cleanup work after the call (rm /tmp/x, tail /data/sessions.jsonl, journal writes) is harmless because ordering and substring matching no longer drive control flow.
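A minimal sketch of the idea, not the code in `adk_runner.py`. The `SessionState` dataclass, the `make_respond_tool` factory and the injected `post` callable are assumptions made for illustration. It shows that the flag is set by the call itself, so whatever the agent runs afterwards cannot change the outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SessionState:
    """Hypothetical per-session record that a classifier could read later."""
    respond_called: bool = False
    tool_trace: List[str] = field(default_factory=list)


def make_respond_tool(state: SessionState, post: Callable[[str], None]) -> Callable[[str], str]:
    """Build a respond(text) tool bound to one session (illustrative only)."""

    def respond(text: str) -> str:
        post(text)                   # deliver the reply; an exception leaves the flag unset
        state.respond_called = True  # the call itself is the gate signal
        state.tool_trace.append("respond")
        return "posted"

    return respond


# Usage: cleanup that runs after the call leaves the flag untouched.
sent: List[str] = []
state = SessionState()
respond = make_respond_tool(state, sent.append)
respond("Deployed; details in thread.")
state.tool_trace.append("bash: rm /tmp/x")  # later cleanup, irrelevant to the gate
```

Because the gate is a boolean on the session rather than a pattern over the trace, the order and content of later tool calls no longer matter.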

Auto-remediates the two mechanical mrkdwn rules: `### Heading` → `*Heading*`, standalone `---` → blank line. Warns (does not rewrite) on ALL CAPS labels and `Sure!/Got it!` preambles: those are polish, so they are flagged but not enforced.
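The two rewrites and the two warnings could look roughly like the sketch below. This is an assumption-laden illustration rather than the real `_clean_for_slack_mrkdwn`: the function name, the return shape, and the exact patterns (in particular what counts as an "ALL CAPS label") are guesses.

```python
import re
from typing import List, Tuple

_HEADING = re.compile(r"^#{3}\s+(.+?)\s*$")     # "### Heading" (exactly three hashes)
_RULE = re.compile(r"^\s*---\s*$")              # a line that is only "---"
_CAPS_LABEL = re.compile(r"^[A-Z][A-Z ]{2,}:")  # assumed shape, e.g. "NEXT STEPS:"
_PREAMBLE = re.compile(r"^(Sure|Got it)!")      # chatty opener


def clean_for_slack_mrkdwn(text: str) -> Tuple[str, List[str]]:
    """Return (cleaned_text, warnings). Rewrites are mechanical; warnings never rewrite."""
    out: List[str] = []
    warnings: List[str] = []
    for line in text.split("\n"):
        heading = _HEADING.match(line)
        if heading:
            out.append(f"*{heading.group(1)}*")  # markdown heading -> mrkdwn bold
        elif _RULE.match(line):
            out.append("")                       # horizontal rule -> blank line
        else:
            if _CAPS_LABEL.match(line):
                warnings.append(f"all-caps label: {line!r}")
            out.append(line)
    if _PREAMBLE.match(text.lstrip()):
        warnings.append("preamble opener")
    return "\n".join(out), warnings


cleaned, warns = clean_for_slack_mrkdwn("### Status\nAll good\n---\nDone")
noisy, noisy_warns = clean_for_slack_mrkdwn("Sure! here\nNEXT STEPS: ship")
```

In this sketch the warned text comes back unchanged, which keeps voice decisions in the prompt files and limits the renderer to the two mechanical fixes.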

Bash to chat.postMessage still works as a fallback for experimental endpoints. The legacy regex classifier remains as the bash-path gate; respond is canonical, not exclusive. Daily-maintenance flags recurring bash patterns that hit daemon/runtime limitations as tier-3 promotion candidates.
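A simplified sketch of "canonical, not exclusive", under stated assumptions: the `Trace` dataclass and `closed_loop` function are hypothetical, and the legacy gate is reduced to a literal substring check (the real classifier also weighs ordering and an outward/inward classification, which is omitted here).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed stand-in for the legacy gate: a literal match on bash command strings.
_POST_RE = re.compile(r"chat\.postMessage")


@dataclass
class Trace:
    """Hypothetical summary of one session's tool calls."""
    respond_called: bool = False
    bash_commands: List[str] = field(default_factory=list)


def closed_loop(trace: Trace) -> bool:
    """respond() wins outright; otherwise fall back to the bash-path regex."""
    if trace.respond_called:
        return True
    return any(_POST_RE.search(cmd) for cmd in trace.bash_commands)


# respond() followed by cleanup: closed.
via_respond = closed_loop(Trace(True, ["rm -f /tmp/x", "tail /data/sessions.jsonl"]))
# Direct bash post: still satisfies the fallback.
via_bash = closed_loop(
    Trace(bash_commands=["curl -X POST https://slack.com/api/chat.postMessage -d @body.json"])
)
# Hypothetical helper script without respond(): still missed (the documented limitation).
via_helper = closed_loop(Trace(bash_commands=["./post_to_slack.sh body.json"]))
```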

Slack ground-truth alert suppression — `84c9f84`

Before posting OPERATOR_ALERT_TEMPLATE, query conversations.replies on the originating thread. If the bot posted (via any path — including ones the regex missed), suppress the alert. The catch for genuine silent failures stays: when Slack confirms zero bot posts, the alert fires as today.

Only affects the alert decision. The silent-exit retry itself still uses the regex classifier — that preserves the ACK-then-work-no-reply catch (sessions like 624e27ec that opened PRs but never posted the result; Slack alone can't distinguish ACK from wrap-up).
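The split between the two decisions can be made explicit in a small sketch. Function names and return strings are hypothetical, and the flow (first session, then retry, then alert) is inferred from the description above rather than copied from the daemon.

```python
from typing import Optional


def after_first_session(classifier_closed_loop: bool) -> str:
    """Retry decision: driven by the trace classifier only.

    Slack is deliberately not consulted here, because an early ACK in the
    thread would look like a reply and hide a missing wrap-up.
    """
    return "done" if classifier_closed_loop else "retry"


def after_retry_session(classifier_closed_loop: bool, bot_posted: Optional[bool]) -> str:
    """Alert decision: Slack ground truth can suppress, never trigger."""
    if classifier_closed_loop:
        return "done"
    if bot_posted is True:
        return "suppress_alert"  # operator already heard from the bot
    return "alert"               # confirmed silence, or unknown (API failure)
```

Keeping the Slack check out of `after_first_session` is what preserves the ACK-then-work-no-reply catch: a thread containing only an ACK still leads to a retry.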

Cascade behavior, before vs after

                              BEFORE          AFTER
common case (clean post)     1 message       1 message
bash hits regex hole         3 messages      2 messages
both layers hit a hole       3 messages      2 messages
genuine silent failure       3 messages      2 messages
                                             (the alert is the 2nd,
                                              which is correct here)

Worst case compresses from 3 → 2 permanently.

What does NOT change

- Voice rules in slack.md / identity.md: Sam's voice stays in Sam's prose. Only the two mechanical drift cases (`###` headings, `---` rules) move to renderer behavior.
- PR-comment followup: tracked in Linear separately.
- Streaming: dropped from scope (separate question).
- ask_operator: unchanged.

Tests

11 new tests in tests/runtime/test_silent_exit.py:

- 4 invariants for respond_called=True paths (bash chaos around the call is harmless).
- 2 fallback semantics (bash chat.postMessage still satisfies; helper-script-without-respond still misses, a documented limitation).
- 5 for the alert-suppression helper + branch.

All 181 tests pass locally.

Files

src/capabilities/slack.md              +18 -5    respond section, fallback rule, tier-3 link
src/skills/daily-maintenance/skill.md  +2  -0    tier-3 promotion signal note
src/runtime/adk_runner.py              +170     _clean_for_slack_mrkdwn + _make_respond_tool + registration
src/runtime/prompts.py                 +24 -13   RETRY/SILENT_EXIT direct to `respond`
src/runtime/session.py                 +32 -1    respond case in classifier, respond_called or fallback
src/runtime/daemon.py                  +56 -2    _bot_posted_in_thread_since + suppression
tests/runtime/test_silent_exit.py      +335 -1   11 new tests, 4 helpers

Closes / supersedes

Supersedes the closed #80 (helper-script regex patch) and #82 (prose rule asking the LLM to avoid the brittle classifier). #81 (sentence-case labels) stays merged unchanged.

The silent-exit classifier inferred "Sam closed the loop" by regex on bash command strings + timing against an outward/inward classification. Three holes fired the cascade twice on 2026-05-25 and a third time on the bug-fix session itself:

- bash-heredoc journal writes (regex doesn't check the bash tool name)
- helper-script posts (literal `chat.postMessage` substring is missing)
- cleanup bash after the post (`rm -f`, `tail` count as outward)

Each turned a normal reply into three messages: substantive → silent-exit retry → OPERATOR_ALERT "something's wrong with me".

Fix: `respond(text)` tool, main agent only. Its call IS the gate signal: the classifier reads the call, not the trace. Cleanup work after is harmless.

Auto-remediates `### Heading` → `*Heading*` and standalone `---` → blank line. Warns (doesn't rewrite) on ALL CAPS labels and `Sure!/Got it!` preambles; polish is OK, don't lock.

Bash to chat.postMessage still works as a fallback for experimental endpoints. The legacy regex classifier remains as the bash-path gate; `respond` is canonical, not exclusive. Daily-maintenance now flags recurring bash patterns that hit daemon/runtime limitations as tier-3 promotion candidates (the same audit shape that surfaced this case).

Tests: 4 in test_silent_exit.py pin the operator-invariant (closed_loop=True with respond + bash chaos around it); 2 pin the fallback semantics (bash chat.postMessage still satisfies; helper-script-without-respond still misses, a documented limitation).

@op

… posted

The three-message cascade today fires whenever both the first session and the retry session trip a classifier hole, even though Sam did post via bash. The daemon was inferring "did Sam close the loop" from its own tool-call trace; the third message ("<@op> something's wrong with me") was a false alarm.

Replace the inference with a Slack ground-truth check for the alert decision only:

1. Before posting OPERATOR_ALERT_TEMPLATE, call conversations.replies on the originating thread with `oldest=first.started_at_wall`.
2. If any message with `user=self.bot_user_id` exists, suppress the alert (operator already heard from Sam).
3. If zero bot posts exist, the alert fires as today; the catch for genuine silent failures stays intact.

Helper: `_bot_posted_in_thread_since(channel, thread_ts, since_wall)` mirrors `_bot_replied_downstream` but is purpose-built for the alert path (stricter `user == bot_user_id` filter; returns Optional[bool] so API failures fall through to today's behavior).

This is only about whether to fire the alert. The silent-exit retry itself still uses the regex classifier; that preserves the ACK-then-work-no-reply catch (sessions like 624e27ec that opened PRs but never posted the result; Slack alone can't distinguish ACK from wrap-up).

Together with the `respond` tool (previous commit), worst-case cascade compresses from 3 messages to 2, and 2 only when Sam used bash both times AND the prompt-level nudge to use `respond` didn't take.

Tests: 5 in test_silent_exit.py covering the helper (true/false/None on API failure) and the suppression branch (suppressed when bot posted, fires when genuinely silent).

spashii added 2 commits May 25, 2026 16:08

spashii merged commit bd360f1 into main May 25, 2026
2 checks passed

spashii deleted the sam/respond-tool branch May 25, 2026 14:30

spashii merged 2 commits into main from sam/respond-tool
