feat(runtime): respond() tool + Slack ground-truth alert suppression #83
Merged
The silent-exit classifier inferred "Sam closed the loop" by regex on bash command strings + timing against an outward/inward classification. Three holes fired the cascade twice on 2026-05-25 and a third time on the bug-fix session itself:

- bash-heredoc journal writes (regex doesn't check the bash tool name)
- helper-script posts (literal `chat.postMessage` substring is missing)
- cleanup bash after the post (`rm -f`, `tail` count as outward)

Each turned a normal reply into three messages: substantive → silent-exit retry → OPERATOR_ALERT "something's wrong with me".

Fix: `respond(text)` tool, main agent only. Its call IS the gate signal: the classifier reads the call, not the trace. Cleanup work after the call is harmless.

Auto-remediates `### Heading` → `*Heading*` and standalone `---` → blank line. Warns (doesn't rewrite) on ALL CAPS labels and `Sure!`/`Got it!` preambles; polish is OK, don't lock.

Bash to `chat.postMessage` still works as a fallback for experimental endpoints. The legacy regex classifier remains as the bash-path gate; `respond` is canonical, not exclusive.

Daily-maintenance now flags recurring bash patterns that hit daemon/runtime limitations as tier-3 promotion candidates (the same audit shape that surfaced this case).

Tests: 4 in test_silent_exit.py pin the operator invariant (closed_loop=True with respond + bash chaos around it); 2 pin the fallback semantics (bash chat.postMessage still satisfies; helper-script-without-respond still misses, a documented limitation).
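The two auto-remediations and the two warnings can be sketched as a single text pass. This is a minimal sketch and not the code in this commit. The function name `remediate_mrkdwn` and the regexes are assumptions. So is the definition of an "ALL CAPS label", taken here to mean an all-caps word or phrase followed by a colon at the start of a line.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; the PR only names the rules, not how they are matched.
_HEADING = re.compile(r"^#{3}[ \t]+(.+?)[ \t]*$", re.MULTILINE)  # exactly "### X"
_RULE = re.compile(r"^[ \t]*---[ \t]*$", re.MULTILINE)           # standalone ---
_CAPS_LABEL = re.compile(r"^[ \t]*([A-Z][A-Z0-9 ]{2,}):", re.MULTILINE)
_PREAMBLE = re.compile(r"^\s*(Sure!|Got it!)")


def remediate_mrkdwn(text: str) -> Tuple[str, List[str]]:
    """Rewrite the two mechanical drift cases; only warn on style issues."""
    fixed = _HEADING.sub(r"*\1*", text)  # ### Heading -> *Heading*
    fixed = _RULE.sub("", fixed)         # the --- line becomes an empty line
    warnings: List[str] = []
    for match in _CAPS_LABEL.finditer(fixed):
        warnings.append(f"ALL CAPS label: {match.group(1)}")
    if _PREAMBLE.match(fixed):
        warnings.append("preamble: consider dropping 'Sure!'/'Got it!'")
    return fixed, warnings
```

Returning warnings instead of rewriting keeps the tool from locking Sam's prose. Only the two mechanical cases change the text. For example, `remediate_mrkdwn("### Status\nAll good\n---\nDone")` yields `"*Status*\nAll good\n\nDone"` with no warnings.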
… posted
The three-message cascade today fires whenever both the first session
and the retry session trip a classifier hole — even though Sam did
post via bash. The daemon was inferring "did Sam close the loop" from
its own tool-call trace; the third message ("<@op> something's wrong
with me") was a false alarm.
Replace the inference with a Slack ground-truth check for the alert
decision only:
1. Before posting OPERATOR_ALERT_TEMPLATE, call conversations.replies
on the originating thread with `oldest=first.started_at_wall`.
2. If any message with `user=self.bot_user_id` exists, suppress the
alert (operator already heard from Sam).
3. If zero bot posts exist, the alert fires as today — the catch for
genuine silent failures stays intact.
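Steps 2 and 3 reduce to a tri-state decision. The sketch below is minimal and assumes the Slack check returns one of three values. `True` means the bot posted, so the alert is suppressed. `False` means zero bot posts, so the alert fires. `None` means the check itself failed, which keeps today's behavior and also fires. `maybe_alert_operator` and `post_alert` are hypothetical names.

```python
from typing import Callable, Optional


def maybe_alert_operator(
    bot_posted: Optional[bool],
    post_alert: Callable[[], None],
) -> bool:
    """Return True if the operator alert was sent.

    bot_posted is the Slack ground-truth result:
      True  -> Sam already posted in the thread; suppress the alert.
      False -> zero bot posts; fire the alert as before.
      None  -> the Slack check failed; fall through to the old behavior (fire).
    """
    if bot_posted is True:
        return False
    post_alert()
    return True
```

Only an explicit `True` suppresses the alert. An API failure therefore can never hide a genuine silent failure.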
Helper: `_bot_posted_in_thread_since(channel, thread_ts, since_wall)`
mirrors `_bot_replied_downstream` but is purpose-built for the alert
path (stricter `user == bot_user_id` filter; returns Optional[bool]
so API failures fall through to today's behavior).
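The helper can be sketched as a standalone function against the Slack Web API's `conversations.replies` method. The sketch assumes a client object exposing `conversations_replies(channel=..., ts=..., oldest=...)` and returning a dict-like response with `ok` and `messages`, the shape `slack_sdk`'s `WebClient` uses. Several details are assumptions, not the PR's code: the function name, the explicit `client` and `bot_user_id` parameters, and reading only the first page of results (no cursor pagination).

```python
from typing import Any, Mapping, Optional, Protocol


class RepliesClient(Protocol):
    """Assumed client shape; this sketch does not import slack_sdk."""

    def conversations_replies(
        self, *, channel: str, ts: str, oldest: str
    ) -> Mapping[str, Any]: ...


def bot_posted_in_thread_since(
    client: RepliesClient,
    bot_user_id: str,
    channel: str,
    thread_ts: str,
    since_wall: float,
) -> Optional[bool]:
    """True if the bot posted in the thread at or after since_wall,
    False if it did not, None if the Slack check itself failed."""
    try:
        resp = client.conversations_replies(
            channel=channel, ts=thread_ts, oldest=f"{since_wall:.6f}"
        )
    except Exception:
        return None  # API failure: caller keeps today's behavior
    if not resp.get("ok", False):
        return None
    for msg in resp.get("messages", []):
        # Strict filter: only messages authored by the bot user itself.
        # Re-check ts client-side in case the parent is returned regardless of oldest.
        if msg.get("user") == bot_user_id and float(msg.get("ts", "0")) >= since_wall:
            return True
    return False
```

Any exception, or an `ok: false` response, maps to `None`, so the caller falls back to firing the alert.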
This is only about whether to fire the alert. The silent-exit retry
itself still uses the regex classifier — that preserves the ACK-then-
work-no-reply catch (sessions like 624e27ec that opened PRs but never
posted the result; Slack alone can't distinguish ACK from wrap-up).
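After both commits, the retry gate might combine the two signals as follows. The `respond` call is checked first and settles the question. Only sessions without it fall back to the bash-path regex. This is a hypothetical sketch: `SessionTrace`, `closed_loop`, `needs_silent_exit_retry`, and the single-substring regex are simplified stand-ins. The real classifier also weighs timing and the outward/inward classification.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Simplified stand-in for the legacy bash-path classifier: a bash command
# that literally mentions chat.postMessage counts as "posted".
_LEGACY_POST = re.compile(r"chat\.postMessage")


@dataclass
class SessionTrace:
    respond_called: bool = False
    bash_commands: List[str] = field(default_factory=list)


def closed_loop(trace: SessionTrace) -> bool:
    # Canonical signal: the respond() call itself; later bash work is ignored.
    if trace.respond_called:
        return True
    # Fallback: legacy substring match over the session's bash commands.
    return any(_LEGACY_POST.search(cmd) for cmd in trace.bash_commands)


def needs_silent_exit_retry(trace: SessionTrace) -> bool:
    return not closed_loop(trace)
```

A session that calls `respond` and then runs `rm -f /tmp/x` is still closed. A `curl` to `chat.postMessage` still satisfies the fallback. A helper script such as `python3 scripts/post_reply.py` that posts without `respond` still looks silent. That last case is the documented limitation.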
Together with the `respond` tool (previous commit), worst-case cascade
compresses from 3 messages to 2 — and 2 only when Sam used bash both
times AND the prompt-level nudge to use `respond` didn't take.
Tests: 5 in test_silent_exit.py covering the helper (true/false/None
on API failure) and the suppression branch (suppressed when bot posted,
fires when genuinely silent).
What this solves
The recurring three-message cascade the operator has seen N times on Slack mentions: substantive reply → silent-exit retry → OPERATOR_ALERT ("something's wrong with me").
Both layers were inferring "did Sam close the loop" from heuristics: regex over bash command strings (cascade layer 1) and the same inference re-applied to the retry session (cascade layer 2). Three known holes fired the cascade twice on 2026-05-25 in thread `1779688501.139669` (sessions `9843c66e6870`/`6afc1f60a477`/`a0ccae594a61`/`71e3df76652b`) and a third time on the bug-fix session `716d41decda1`.

How
Two commits, each independently revertible.
1. `respond(text)` tool (`4a3542a`)

   A structured close-the-loop tool on the main agent. Its call IS the gate signal: the classifier reads `respond_called`, not the trace. Cleanup work after the call (`rm /tmp/x`, `tail /data/sessions.jsonl`, journal writes) is harmless because ordering and substring matching no longer drive control flow.

   Auto-remediates the two mechanical mrkdwn rules: `### Heading` → `*Heading*`, standalone `---` → blank line. Warns (does not rewrite) on ALL CAPS labels and `Sure!`/`Got it!` preambles; polish is OK, don't lock.

   Bash to `chat.postMessage` still works as a fallback for experimental endpoints. The legacy regex classifier remains as the bash-path gate; `respond` is canonical, not exclusive. Daily-maintenance flags recurring bash patterns that hit daemon/runtime limitations as tier-3 promotion candidates.

2. Slack ground-truth alert suppression (`84c9f84`)

   Before posting `OPERATOR_ALERT_TEMPLATE`, query `conversations.replies` on the originating thread. If the bot posted (via any path, including ones the regex missed), suppress the alert. The catch for genuine silent failures stays: when Slack confirms zero bot posts, the alert fires as today.

   Only affects the alert decision. The silent-exit retry itself still uses the regex classifier, which preserves the ACK-then-work-no-reply catch (sessions like `624e27ec` that opened PRs but never posted the result; Slack alone can't distinguish ACK from wrap-up).

Cascade behavior, before vs after
Worst case compresses from 3 → 2 permanently.
What does NOT change
- `slack.md` / `identity.md`: Sam's voice stays in Sam's prose. Only the two mechanical drift cases (`###` headings, `---` rules) move to renderer behavior.
- `ask_operator`: unchanged.

Tests
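11 new tests in `tests/runtime/test_silent_exit.py`:

- 4 pin the operator invariant on `respond_called=True` paths (bash chaos around the call is harmless).
- 2 pin the fallback semantics (bash `chat.postMessage` still satisfies; helper-script-without-`respond` still misses, a documented limitation).
- 5 cover `_bot_posted_in_thread_since` (True/False/None on API failure) and the suppression branch (suppressed when the bot posted, fires when genuinely silent).

All 181 tests pass locally.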
Files
Closes / supersedes
Supersedes the closed #80 (helper-script regex patch) and #82 (prose rule asking the LLM to avoid the brittle classifier). #81 (sentence-case labels) stays merged unchanged.