From 4a3542a7ed3e86544d5006b8c0ed791c6c9df314 Mon Sep 17 00:00:00 2001
From: Sameer Pashikanti <sameer@dembrane.com>
Date: Mon, 25 May 2026 16:08:34 +0200
Subject: [PATCH 1/2] feat(runtime): respond() tool replaces regex-on-bash
 silent-exit gate
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The silent-exit classifier inferred "Sam closed the loop" by regex on
bash command strings + timing against an outward/inward classification.
Three holes fired the cascade twice on 2026-05-25 and a third time on
the bug-fix session itself:

- bash-heredoc journal writes (regex doesn't check the bash tool name)
- helper-script posts (literal `chat.postMessage` substring is missing)
- cleanup bash after the post (`rm -f`, `tail` count as outward)

Each turned a normal reply into three messages: substantive →
silent-exit retry → OPERATOR_ALERT "something's wrong with me".

Fix: `respond(text)` tool, main agent only. Its call IS the gate
signal — classifier reads the call, not the trace. Cleanup work after
is harmless. Auto-remediates `### Heading` → `*Heading*` and standalone
`---` → blank line. Warns (doesn't rewrite) on ALL CAPS labels and
`Sure!/Got it!` preambles — polish is OK, don't lock.

Bash to chat.postMessage still works as a fallback for experimental
endpoints. The legacy regex classifier remains as the bash-path gate;
`respond` is canonical, not exclusive. Daily-maintenance now flags
recurring bash patterns that hit daemon/runtime limitations as tier-3
promotion candidates (the same audit shape that surfaced this case).

Tests: 4 in test_silent_exit.py pin the operator-invariant
(closed_loop=True with respond + bash chaos around it); 2 pin the
fallback semantics (bash chat.postMessage still satisfies; helper-
script-without-respond still misses — documented limitation).
---
 src/capabilities/slack.md             |  18 ++-
 src/runtime/adk_runner.py             | 170 +++++++++++++++++++++++
 src/runtime/prompts.py                |  36 +++--
 src/runtime/session.py                |  32 ++++-
 src/skills/daily-maintenance/skill.md |   2 +
 tests/runtime/test_silent_exit.py     | 191 ++++++++++++++++++++++++++
 6 files changed, 426 insertions(+), 23 deletions(-)

diff --git a/src/capabilities/slack.md b/src/capabilities/slack.md
index 1b6d7c1..50a3d3c 100644
--- a/src/capabilities/slack.md
+++ b/src/capabilities/slack.md
@@ -120,13 +120,23 @@ If Sam doesn't have substance, Sam stays quiet. Silence is not rude; noise is.
 
 ### One exception — Slack-triggered sessions MUST close the loop
 
-When the session was triggered by a Slack message (not a scheduled wake-up, not a retry), Sam **must** close the loop — meaning a `chat.postMessage` / `chat.update` to the originating thread must come AFTER the last substantive outward-facing tool call in the session. Not just *any* post: an ACK at the start followed by work + silence is NOT a closed loop. The operator gets a quick "got it" and then hears nothing about the result.
+When the session was triggered by a Slack message (not a scheduled wake-up, not a retry), Sam **must** close the loop — the operator should hear the result of the work, not just an ACK at the start.
 
-"Substantive outward-facing" = anything the operator would expect a report for: `gh pr create / edit / merge`, `consult_opus`, `worker` / `parallel_workers`, `edit_file` / `write_file` outside `/data/journal/`, `fetch_url`. Internal reads (`read_file`, `grep`, `glob_files`), journal writes, and Slack housekeeping (`setStatus`, `reactions.add`, `conversations.replies`) don't count — they don't need a corresponding report.
+The canonical way to close the loop is the **`respond(text: str)`** tool. `respond` posts to the originating thread via `chat.postMessage` and is the structured signal that satisfies the silent-exit gate. Calling it once anywhere in the session is enough — ordering doesn't matter, and cleanup work after the call (rm /tmp/x, tail /data/sessions.jsonl, journal writes) is harmless.
 
-The daemon enforces this structurally. After a clean exit, if the session was Slack-triggered and `closed_loop` is False (no post, OR the last post came before more substantive work), the daemon spawns a one-shot retry whose only job is to read the previous session's audit-log slice and post the summary. Exiting silent doesn't save anything — it adds a retry round-trip and surfaces in the logs as a silent-exit recovery, which the operator can see.
+```
+respond(text="*Multi-channel refinements (PR #79)*\n- eliminated hardcoding: ...\n- mention bypass anywhere: ...")
+```
 
-The honest move is: post the summary as the *final* outward action of the session — after the last `gh pr create`, after `consult_opus` returns, after the last source edit. Lead with consequence, cite PR/issue IDs at the end of lines. Don't let the artifacts do the talking — Slack is where the operator lives.
+The text is auto-cleaned for Slack mrkdwn: `### Heading` becomes `*Heading*`, standalone `---` lines become blank lines. ALL CAPS section labels and `Sure!/Got it!` preambles trigger warnings in the logs but aren't rewritten — write in your normal voice.
+
+**Bash-curl to `chat.postMessage` / `chat.update` still works.** Sam can experiment with new Slack endpoints via bash; `respond` is canonical, not exclusive. The fallback rule still applies: if Sam closes the loop via bash, a `chat.postMessage` / `chat.update` must come AFTER the last substantive outward-facing tool call (worker/parallel_workers, fetch_url, consult_opus, non-Slack bash, write/edit_file outside `/data/journal/`). Internal reads and journal writes don't count.
+
+When Sam reaches for a bash pattern that `respond` (or any other existing tool) can't handle, that's a tier-3-promotion signal — daily-maintenance already audits bash command shapes (`src/skills/daily-maintenance/skill.md`, "Aggregation rule") and proposes extractions when the pattern recurs across sessions and isn't already covered. Bash patterns that hit a *daemon/runtime* limitation (silent-exit gate, image pipeline, audit redaction) are the strongest tier-3 candidates and should be surfaced explicitly in the daily broadcast.
+
+After a clean exit, if neither path satisfied the gate, the daemon spawns a one-shot retry whose only job is to read the previous session's audit-log slice and call `respond` with the summary. Exiting silent doesn't save anything — it adds a retry round-trip and surfaces in the logs.
+
+The honest move is: call `respond` as the final substantive action of the session — after the last `gh pr create`, after `consult_opus` returns, after the last source edit. Lead with consequence, cite PR/issue IDs at the end of lines.
 
 ## Threading
 
diff --git a/src/runtime/adk_runner.py b/src/runtime/adk_runner.py
index 54fd17e..3c0f570 100644
--- a/src/runtime/adk_runner.py
+++ b/src/runtime/adk_runner.py
@@ -854,6 +854,161 @@ async def ask_operator(question: str) -> str:
     return ask_operator
 
 
+def _clean_for_slack_mrkdwn(text: str) -> tuple[str, list[str]]:
+    """Auto-remediate the mechanical mrkdwn rules + collect soft warnings.
+
+    Slack doesn't render `###` headings or `---` horizontal rules; both
+    show up as raw text and look like a bug to the operator. We rewrite
+    those deterministically — the LLM keeps the semantic intent (heading
+    or section break) without having to remember the exact mrkdwn syntax.
+
+    Soft drift cases (ALL CAPS labels, "Sure!"/"Got it!" preambles) are
+    collected as warnings but NOT rewritten — Sam's prose may have a
+    reason, and the operator can grep the journal/audit for these later
+    to decide whether to promote to a hard rule. The user's stance is
+    "polish is okay, don't lock" — over-correction destroys voice.
+
+    Returns (cleaned_text, warnings). Warnings are logged at the call
+    site; they're never returned to Slack.
+    """
+    import re
+
+    warnings: list[str] = []
+    out_lines: list[str] = []
+    for raw in text.split("\n"):
+        line = raw
+
+        # Markdown headings → bold line. `### Heading` → `*Heading*`.
+        # Bold preserves the semantic anchor without rendering as raw
+        # text in Slack.
+        m = re.match(r"^(#{1,6})\s+(.+?)\s*$", line)
+        if m:
+            line = f"*{m.group(2)}*"
+
+        # Horizontal rules → blank line. Empty separators read fine in
+        # Slack; `---` reads as raw text and signals "I forgot Slack
+        # mrkdwn syntax".
+        if re.match(r"^\s*(---+|\*\*\*+|___+)\s*$", line):
+            line = ""
+
+        # Soft warnings — not auto-rewritten. Caller logs.
+        if re.search(r"^\s*\*[A-Z][A-Z0-9 _#()/-]{2,}\*\s*$", line):
+            warnings.append(f"ALL CAPS label detected: {line.strip()!r}")
+        for opener in ("Sure!", "Got it!", "Absolutely!", "Of course!"):
+            if line.lstrip().startswith(opener):
+                warnings.append(f"banned preamble {opener!r} at line start")
+                break
+
+        out_lines.append(line)
+
+    return "\n".join(out_lines), warnings
+
+
+def _make_respond_tool(
+    *,
+    channel: Optional[str],
+    thread_ts: Optional[str],
+    event_ts: Optional[str],
+):
+    """Return a coroutine that posts Sam's close-the-loop reply to the
+    originating Slack thread and signals the session's gate as satisfied.
+
+    `respond` is the structured contract that replaces the regex-on-bash
+    silent-exit gate. The classifier in `session.py:_classify_tool_use`
+    treats a `respond` tool call as definitive "Sam closed the loop" —
+    no timing inference, no substring matching, no ordering against
+    outward bash. Cleanup work after `respond` (rm /tmp/x, tail
+    /data/sessions.jsonl) does not trip the gate because the gate reads
+    the call, not the trace.
+
+    Only the MAIN agent gets this tool. Workers/pro_executor/mentor
+    return to their caller, not the operator — same scoping as
+    `ask_operator`.
+
+    Channel/thread_ts come from the IncomingMessage and are bound at
+    closure-time so the LLM signature is just `respond(text: str)`.
+    For scheduled / synthetic / retry sessions without an originating
+    thread, the tool errors gracefully and the LLM falls back to
+    bash-curl (or the daily-maintenance silent-exit allowance).
+
+    Bash to chat.postMessage still works. `respond` is canonical, not
+    exclusive — the operator wants standardization without restriction.
+    The bash fallback path keeps the legacy regex classifier as a
+    best-effort gate; sessions that close via bash get retries only when
+    the bash pattern hits a classifier hole (heredoc, helper script,
+    cleanup-after-post).
+    """
+    async def respond(text: str) -> str:
+        """Post Sam's close-the-loop reply to the originating Slack thread.
+
+        Call this exactly once per Slack-triggered session as Sam's final
+        substantive reply — the wrap-up that the operator should see.
+        ACK / status / mid-session updates remain free-form (bash curl
+        or whatever); `respond` is specifically the "I'm done, here's
+        the result" message that satisfies the silent-exit gate.
+
+        The text is light-cleaned before posting: `### Heading` becomes
+        `*Heading*`, standalone `---` lines become blank. ALL CAPS labels
+        and `Sure!/Got it!` preambles are logged as warnings but kept
+        as-is. Write in your normal voice — terse, sentence-case OK,
+        lowercase OK, lead with the consequence. The Slack formatting
+        rules in `src/capabilities/slack.md` still apply; this tool just
+        handles the two mechanical rules that drift most often.
+
+        text: the Slack message body in mrkdwn. Use `*bold*` for emphasis,
+            `_italic_`, `` `code` ``, `- ` for bullets (flat only — Slack
+            doesn't render nested lists), `<url|label>` for links.
+
+        Returns a sentinel string the LLM can ignore — the work is done.
+        """
+        import aiohttp
+        from .config import SLACK_BOT_TOKEN
+
+        if not channel:
+            return (
+                "(respond unavailable: no originating channel — this "
+                "session has no Slack thread to post into. If this is a "
+                "scheduled/cron session that has nothing worth saying, "
+                "just exit; silence is allowed here. Otherwise post via "
+                "bash-curl to chat.postMessage and accept that the "
+                "silent-exit gate falls back to the regex classifier.)"
+            )
+        if not SLACK_BOT_TOKEN:
+            return "(respond unavailable: SLACK_BOT_TOKEN not set. Fall back to bash-curl.)"
+
+        cleaned, warnings = _clean_for_slack_mrkdwn(text)
+        for w in warnings:
+            log.warning("respond: voice warning — %s", w)
+
+        target_thread_ts = thread_ts or event_ts
+        post_body = {"channel": channel, "text": cleaned}
+        if target_thread_ts:
+            post_body["thread_ts"] = target_thread_ts
+        headers = {
+            "Authorization": f"Bearer {SLACK_BOT_TOKEN}",
+            "Content-Type": "application/json; charset=utf-8",
+        }
+
+        try:
+            async with aiohttp.ClientSession() as http:
+                async with http.post(
+                    "https://slack.com/api/chat.postMessage",
+                    headers=headers,
+                    json=post_body,
+                ) as resp:
+                    post_json = await resp.json()
+            if not post_json.get("ok"):
+                err = post_json.get("error", "unknown")
+                return f"(respond failed at chat.postMessage: {err}. Try bash-curl as fallback.)"
+        except Exception as exc:
+            log.exception("respond HTTP call failed")
+            return f"(respond HTTP error: {type(exc).__name__}: {exc}. Try bash-curl as fallback.)"
+
+        return f"Posted to {channel} (ts={post_json.get('ts')}). Loop closed."
+
+    return respond
+
+
 # ─── ADK runner ───────────────────────────────────────────────────────────────
 
 
@@ -971,6 +1126,20 @@ async def run(self, request: AgentRunRequest) -> AgentRunResult:
             event_ts=request.slack_event_ts,
         )
 
+        # `respond` — the structured close-the-loop tool. Posts via
+        # chat.postMessage to the originating thread; its mere call
+        # satisfies the silent-exit gate (no regex on bash, no ordering
+        # against outward calls). Cleanup work after `respond` is
+        # harmless. Only the main agent gets it — workers must not
+        # post to Slack. Bash-curl to chat.postMessage still works as
+        # a fallback for new/experimental Slack endpoints; `respond`
+        # is canonical, not exclusive.
+        respond_tool = _make_respond_tool(
+            channel=request.slack_channel,
+            thread_ts=request.slack_thread_ts,
+            event_ts=request.slack_event_ts,
+        )
+
         # `consult_opus` — dispatch the mentor (Opus, read-only) to review
         # accumulated context. Sam DOES NOT decide to invoke autonomously —
         # only the `daily-maintenance` skill (review step) and the
@@ -997,6 +1166,7 @@ async def run(self, request: AgentRunRequest) -> AgentRunResult:
                 AgentTool(agent=pro_executor_agent),
                 FunctionTool(func=consult_opus_tool),
                 FunctionTool(func=ask_operator_tool),
+                FunctionTool(func=respond_tool),
             ],
         )
 
diff --git a/src/runtime/prompts.py b/src/runtime/prompts.py
index 7d91ae3..3c0ac64 100644
--- a/src/runtime/prompts.py
+++ b/src/runtime/prompts.py
@@ -27,18 +27,21 @@
     "Do not retry the original task. Your one job in this session is:\n"
     "\n"
     "1. Read the failure context below.\n"
-    "2. Post ONE reply in the original Slack thread (channel={channel}, "
-    "thread_ts={thread_target}) in your normal Slack voice. Name what failed "
-    "in human terms, name the likely cause, and suggest a concrete fix. "
-    "Don't dump stderr verbatim — read it, summarise it.\n"
+    "2. Call `respond(text=...)` ONCE to post your reply to the original "
+    "Slack thread (channel={channel}, thread_ts={thread_target}). Use your "
+    "normal Slack voice. Name what failed in human terms, name the likely "
+    "cause, and suggest a concrete fix. Don't dump stderr verbatim — read "
+    "it, summarise it.\n"
     "3. Then stop. This is a one-shot. The daemon will not retry again."
 )
 
 RETRY_SESSION_OUTRO = (
-    "Remember: post ONCE, in the original thread, in your Slack voice. "
-    "Be honest about the failure. Suggest a fix if you can see one. "
-    "Prefer the synthetic errors above over stderr when identifying the cause. "
-    "Then exit."
+    "Remember: call `respond` ONCE — that closes the loop. Be honest about "
+    "the failure. Suggest a fix if you can see one. Prefer the synthetic "
+    "errors above over stderr when identifying the cause. Then exit. Do "
+    "NOT do bash cleanup afterwards that touches the network or non-/data "
+    "paths — `respond` already satisfies the gate, but a final discipline "
+    "of 'one call, then exit' keeps the audit log clean."
 )
 
 SILENT_EXIT_INTRO = (
@@ -61,12 +64,17 @@
     "not infer from journal text alone, since the previous session is the "
     "exact failure mode that motivated this retry: journal entries claimed "
     "things that never actually happened.\n"
-    "2. **Post ONE reply in the original Slack thread** (channel="
-    "{channel}, thread_ts={thread_target}) summarizing what was done. "
+    "2. **Call `respond(text=...)` ONCE** to post the summary in the "
+    "original Slack thread (channel={channel}, thread_ts={thread_target}). "
     "Consequence-first: what the operator should do with the information, "
     "not a chronological log. PR / issue IDs go at the end of the relevant "
     "lines as references, not as headlines. Match the formatting rules in "
-    "`src/capabilities/slack.md` (bold sentence-case labels, flat lists).\n"
+    "`src/capabilities/slack.md` (bold sentence-case labels, flat lists). "
+    "`respond` is the structured close-the-loop tool — the gate reads its "
+    "call directly, so cleanup work after is harmless. Do NOT post via "
+    "`bash python3 /tmp/x.py` or other helper-script indirection; that's "
+    "the pattern that caused the previous session's silent-exit in the "
+    "first place.\n"
     "3. **Then stop.** This is a one-shot. The daemon will not retry again.\n"
     "\n"
     "What you do NOT do:\n"
@@ -82,9 +90,9 @@
 )
 
 SILENT_EXIT_OUTRO = (
-    "Remember: post ONCE, in the original thread, in your Slack voice. "
-    "Cite PR/issue IDs as references at the end of lines, not as the "
-    "headline. Consequence-first. Then exit."
+    "Remember: call `respond` ONCE — that closes the loop. Cite PR/issue "
+    "IDs as references at the end of lines, not as the headline. "
+    "Consequence-first. Then exit. No cleanup bash afterwards."
 )
 
 CONTINUATION_INTRO = (
diff --git a/src/runtime/session.py b/src/runtime/session.py
index 555e7ba..a74ba03 100644
--- a/src/runtime/session.py
+++ b/src/runtime/session.py
@@ -655,7 +655,7 @@ class SamSession:
 
     DEFAULT_ALLOWED_TOOLS: list[str] = [
         "bash", "read_file", "write_file", "edit_file", "grep", "glob_files",
-        "fetch_url", "worker", "parallel_workers",
+        "fetch_url", "worker", "parallel_workers", "respond",
     ]
 
     def __init__(
@@ -900,6 +900,15 @@ def _classify_tool_use(
                 # for this turn.
                 is_post = True
                 ask_operator_called = True
+            elif name == "respond":
+                # `respond` is the structured close-the-loop tool. The tool
+                # itself posts via chat.postMessage; the call is the
+                # canonical "Sam closed the loop" signal. Counts as a post,
+                # not as outward work — the post IS the work. Cleanup bash
+                # AFTER `respond` (rm /tmp/x, tail /data/sessions.jsonl) no
+                # longer trips the gate because we read the flag below, not
+                # the timing.
+                is_post = True
             elif name == "bash":
                 command = input_dict.get("command") or ""
                 if "chat.postMessage" in command or "chat.update" in command:
@@ -919,10 +928,23 @@ def _classify_tool_use(
                 last_outward_idx = i
             if is_post:
                 last_post_idx = i
-        # Sam closed the loop iff a post happened AND it came after the
-        # last outward-facing call (or no outward-facing work happened
-        # at all — Sam just answered a question without tools).
-        closed_loop = last_post_idx >= 0 and last_post_idx > last_outward_idx
+        # Sam closed the loop iff:
+        #   1. The structured `respond` tool was called at any point. This
+        #      is the canonical path — its mere presence in the record list
+        #      satisfies the gate regardless of any bash chaos around it.
+        #      Cleanup work (rm /tmp/x, tail /data/*) after `respond` is
+        #      fine; ordering doesn't matter because the tool IS the
+        #      contract, not its position.
+        #   2. Fallback for bash-curl posts (Sam experimenting with a new
+        #      Slack endpoint, or just preferring bash): the *timing* rule
+        #      still applies — a `chat.postMessage` / `chat.update` came
+        #      AFTER the last substantive outward-facing tool call. This
+        #      preserves the ACK-then-work-no-reply catch from before
+        #      `respond` existed.
+        respond_called = any(r.name == "respond" for r in records)
+        closed_loop = respond_called or (
+            last_post_idx >= 0 and last_post_idx > last_outward_idx
+        )
         return worker_used, web_used, bash_used, edited_files, closed_loop, ask_operator_called
 
     def _safety_net_journal_entry(self, result: SessionResult) -> None:
diff --git a/src/skills/daily-maintenance/skill.md b/src/skills/daily-maintenance/skill.md
index 13e051b..a001e10 100644
--- a/src/skills/daily-maintenance/skill.md
+++ b/src/skills/daily-maintenance/skill.md
@@ -111,6 +111,8 @@ Do not use a simple raw-frequency threshold (like "repeats 5+ times"). A recurri
 
 **Worked example.** The `cd /data/repos` → `/tmp` pivot (gcsfuse-chmod failures) appeared 17× / 8× across N sessions; no capability documents it; Sam wasted substantial session time retrying and pivoting → propose adding a "cloning Sam's repo" note to `src/capabilities/github.md`.
 
+**Tier-3 promotion signals.** When a recurring bash pattern hits a daemon or runtime *limitation* — i.e. the bash exists because the runtime can't express the pattern as a structured tool — that's a stronger signal than a doc gap. Examples already observed: `bash python3 /tmp/post.py` to wrap Slack posts (silent-exit classifier hole, resolved by the `respond` tool), `bash python3 -c 'open("/data/journal/...")'` for journal writes (also a classifier hole). When you spot this shape in the audit log, surface it explicitly in §5 — `Tier 3 candidate: <shape> — bash workaround for <runtime limitation>` — rather than opening a Tier 1 doc PR for a problem prose can't solve. Tier 3 PRs come from the operator; Sam's job is to spot and name them.
+
 If nothing is worth a code change today, open no self-maintenance PRs.
 
 ## 3. Consult the Opus mentor
diff --git a/tests/runtime/test_silent_exit.py b/tests/runtime/test_silent_exit.py
index 292e34a..a52f6d9 100644
--- a/tests/runtime/test_silent_exit.py
+++ b/tests/runtime/test_silent_exit.py
@@ -70,6 +70,59 @@ def _opus() -> ToolUseRecord:
     return ToolUseRecord(name="consult_opus", input={"request": "review"})
 
 
+def _bash_heredoc_journal(date: str = "2026-05-25") -> ToolUseRecord:
+    """Sam writing a journal entry via `bash python3 -c ...` heredoc.
+
+    Today (2026-05-25) Sam's session `9843c66e6870` wrote its entry to
+    `/data/journal/2026-05-25.md` via a Python heredoc inside a bash
+    call. The classifier's journal-path detection (`_is_journal_path`)
+    is only consulted when the tool name is `write_file` / `edit_file`,
+    so the bash version is mis-classified as outward, tripping the gate.
+    This is hole #1 in today's false-positive cascade.
+    """
+    return _bash(
+        f'python3 -c \'open("/data/journal/{date}.md", "a").write("entry")\''
+    )
+
+
+def _bash_helper_script_post() -> ToolUseRecord:
+    """Sam posting via `python3 /tmp/post_reply.py` — the helper-script post.
+
+    Session `a0ccae594a61` (2026-05-25) wrote a `/tmp/post_reply.py` that
+    internally called `chat.update` and invoked it via bash. The literal
+    bash command string contains neither `chat.postMessage` nor
+    `chat.update`, so the classifier sees it as plain outward bash, not
+    a post. This is hole #2 in today's cascade.
+    """
+    return _bash("python3 /tmp/post_reply.py")
+
+
+def _bash_inspection_tail(path: str = "/data/sessions.jsonl") -> ToolUseRecord:
+    """Sam reading a /data file via bash tail/cat — read-only inspection.
+
+    The operator doesn't expect a report for a `tail` / `cat` of audit
+    logs or the session ledger — these are inward bookkeeping like
+    `read_file` / `grep`. But the classifier today only inspects the
+    bash command for Slack URLs; everything else is outward. Cleanup
+    `tail` after the final post helps trigger the false-positive gate
+    when Sam verifies state post-reply.
+    """
+    return _bash(f"tail -n 10 {path}")
+
+
+def _bash_helper_script_cleanup() -> ToolUseRecord:
+    """Sam removing a one-shot helper script after using it."""
+    return _bash("rm -f /tmp/post_reply.py")
+
+
+def _respond(text: str = "the answer is X") -> ToolUseRecord:
+    """The structured close-the-loop tool call. Its presence in the
+    records list is the canonical "Sam closed the loop" signal — the
+    classifier reads the call, not the ordering or any regex around it.
+    """
+    return ToolUseRecord(name="respond", input={"text": text})
+
+
 def _closed_loop(records):
     return SamSession._classify_tool_use(records)[4]
 
@@ -357,3 +410,141 @@ def test_silent_exit_message_dispatches_only_on_explicit_flag():
     body = msg.to_initial_user_message()
     assert "previous Sam session attempting to respond to a Slack message FAILED" in body
     assert "exited cleanly" not in body
+
+
+# ─── 2026-05-25 cascade — operator-invariant tests pinned to `respond` ──────
+#
+# Origin: the cascade the operator saw twice in thread 1779688501.139669
+# on 2026-05-25 (sessions 9843c66e6870, a0ccae594a61, 716d41decda1),
+# plus the third "<@op> something's wrong with me — i tried to respond
+# and exit 0, then tried to explain what failed and that also exit 0"
+# OPERATOR_ALERT_TEMPLATE message fired on each retry.
+#
+# Three classifier holes were involved:
+#   #1 bash-heredoc journal writes (`python3 -c 'open("/data/journal/..."`)
+#   #2 helper-script posts          (`python3 /tmp/post_reply.py`)
+#   #3 cleanup bash after the post  (`rm -f /tmp/x`, `tail /data/sessions.jsonl`)
+# Each tripped the regex-and-timing classifier into closed_loop=False
+# even though the operator HAD received the reply.
+#
+# The structural fix is the `respond` tool: its call is the canonical
+# "Sam closed the loop" signal. The classifier reads the call, not the
+# tool-call trace. Cleanup, helper scripts, and bash journal writes
+# around it are harmless because ordering and substring matching no
+# longer drive control flow.
+#
+# Each test below records Sam doing the day's failure-pattern AND
+# calling `respond`. The assertion: closed_loop=True regardless of
+# what bash chaos surrounds the structured call. This is the
+# operator-visible invariant.
+
+
+def test_respond_after_bash_heredoc_journal_closes_loop():
+    """Hole #1 — Sam writes the journal via bash heredoc, then calls
+    `respond`. Today's regex classifier would have flipped closed_loop
+    to False (bash heredoc is "outward" by the regex, and the bash post
+    that preceded the journal write was the last "post"). With
+    `respond`, ordering stops mattering: the structured call is the
+    signal.
+
+    Origin: session 9843c66e6870 (2026-05-25, hole #1 in the cascade).
+    """
+    records = [
+        _gh("gh pr create --title foo"),
+        _bash_heredoc_journal(),  # journal write via bash — regex would call this outward
+        _respond("PR opened — see #N"),  # structured close-the-loop
+    ]
+    assert _closed_loop(records) is True
+
+
+def test_respond_replaces_helper_script_post():
+    """Hole #2 — instead of writing `/tmp/post_reply.py` and running it
+    via `python3 /tmp/post_reply.py` (which the regex never sees as a
+    post), Sam calls `respond` directly. The helper-script indirection
+    disappears entirely.
+
+    Origin: session a0ccae594a61 (2026-05-25, hole #2 in the cascade).
+    The `respond` tool *is* the structured replacement for that pattern.
+    """
+    records = [
+        _gh("gh pr create"),
+        _respond("PR opened"),
+    ]
+    assert _closed_loop(records) is True
+
+
+def test_respond_tolerates_cleanup_bash_afterwards():
+    """Hole #3 — Sam calls `respond`, then does cleanup work (rm /tmp/x,
+    tail /data/sessions.jsonl to verify ledger state). Today's classifier
+    flipped closed_loop to False here because every non-Slack bash call
+    counted as outward, pushing `last_outward_idx` past `last_post_idx`.
+    The new gate reads the `respond` call directly — cleanup is harmless.
+
+    Origin: session 716d41decda1 (2026-05-25, Sam's own bug-fix session
+    that ironically fired the cascade AGAIN via `rm -f /tmp/post_reply_final.py`
+    after a `bash python3 /tmp/post_reply_final.py # chat.postMessage` post.
+    """
+    records = [
+        _gh("gh pr create"),
+        _respond("PR opened"),
+        _bash_helper_script_cleanup(),  # rm -f /tmp/post_reply.py
+        _bash_inspection_tail(),  # tail /data/sessions.jsonl
+    ]
+    assert _closed_loop(records) is True
+
+
+def test_retry_session_with_respond_does_close_loop():
+    """End-to-end shape of a silent-exit retry session. The retry's only
+    job is to read the previous session's audit log and post a summary
+    via `respond`. Cleanup bash + bash journal writes around it must
+    not trip the gate, or the daemon fires OPERATOR_ALERT_TEMPLATE for
+    a non-failure — the third "something's wrong with me" message in
+    today's cascade.
+
+    Origin: retry sessions 6afc1f60a477 and 71e3df76652b (2026-05-25,
+    hole #3 — the cascade compounder).
+    """
+    records = [
+        _respond("apologies for the duplicate notify — the previous session..."),
+        _bash_helper_script_cleanup(),
+        _bash_heredoc_journal(),
+        _bash_inspection_tail(),
+    ]
+    assert _closed_loop(records) is True
+
+
+# ─── Bash fallback — `respond` is canonical, not exclusive ────────────────────
+#
+# The operator's "don't lock" stance: bash chat.postMessage / chat.update
+# still works as a fallback for experimental endpoints. The legacy
+# regex classifier remains as the bash-path gate. These tests pin the
+# fallback semantics — bash to Slack is allowed, but its known holes
+# still bite if Sam relies on the bash path.
+
+
+def test_bash_post_via_curl_still_closes_loop():
+    """Sam chooses bash chat.postMessage (no `respond` call). The legacy
+    regex classifier still reads the post — closed_loop=True via the
+    fallback path. `respond` is canonical, not exclusive.
+    """
+    records = [
+        _gh("gh pr create"),
+        _post("opened PR — see #N"),  # bash curl chat.postMessage
+    ]
+    assert _closed_loop(records) is True
+
+
+def test_bash_helper_script_without_respond_does_NOT_close_loop():
+    """Documenting the known limitation: if Sam uses a helper script
+    (`python3 /tmp/post.py`) to post AND does not call `respond`, the
+    legacy regex classifier still misses the post. This was hole #2.
+    The fix is for Sam to use `respond`; the legacy bash path is
+    best-effort and has known holes — daily-maintenance audits these
+    patterns and proposes tier-3 promotion. Pinned here so we notice
+    if the legacy fallback path silently changes shape.
+    """
+    records = [
+        _gh("gh pr create"),
+        _bash_helper_script_post(),  # post hidden inside script — regex misses
+    ]
+    assert _closed_loop(records) is False

From 84c9f844426293f1283fe43b023d6dcfcc557dfd Mon Sep 17 00:00:00 2001
From: Sameer Pashikanti <sameer@dembrane.com>
Date: Mon, 25 May 2026 16:22:09 +0200
Subject: [PATCH 2/2] feat(daemon): suppress OPERATOR_ALERT when Slack confirms
 bot already posted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The three-message cascade today fires whenever both the first session
and the retry session trip a classifier hole — even though Sam did
post via bash. The daemon was inferring "did Sam close the loop" from
its own tool-call trace; the third message ("<@op> something's wrong
with me") was a false alarm.

Replace the inference with a Slack ground-truth check for the alert
decision only:

1. Before posting OPERATOR_ALERT_TEMPLATE, call conversations.replies
   on the originating thread with `oldest=first.started_at_wall`.
2. If any message with `user=self.bot_user_id` exists, suppress the
   alert (operator already heard from Sam).
3. If zero bot posts exist, the alert fires as today — the catch for
   genuine silent failures stays intact.

Helper: `_bot_posted_in_thread_since(channel, thread_ts, since_wall)`
mirrors `_bot_replied_downstream` but is purpose-built for the alert
path (stricter `user == bot_user_id` filter; returns Optional[bool]
so API failures fall through to today's behavior).

This is only about whether to fire the alert. The silent-exit retry
itself still uses the regex classifier — that preserves the ACK-then-
work-no-reply catch (sessions like 624e27ec that opened PRs but never
posted the result; Slack alone can't distinguish ACK from wrap-up).

Together with the `respond` tool (previous commit), worst-case cascade
compresses from 3 messages to 2 — and 2 only when Sam used bash both
times AND the prompt-level nudge to use `respond` didn't take.

Tests: 5 in test_silent_exit.py covering the helper (true/false/None
on API failure) and the suppression branch (suppressed when bot posted,
fires when genuinely silent).
---
 src/runtime/daemon.py             |  62 +++++++++++-
 tests/runtime/test_silent_exit.py | 150 ++++++++++++++++++++++++++++++
 2 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/src/runtime/daemon.py b/src/runtime/daemon.py
index 7153fa0..de54571 100644
--- a/src/runtime/daemon.py
+++ b/src/runtime/daemon.py
@@ -1481,6 +1481,46 @@ async def _worker(self) -> None:
                 if not message.scheduled:
                     save_cursor(message.event_ts)
 
+    async def _bot_posted_in_thread_since(
+        self,
+        *,
+        channel: str,
+        thread_ts: str,
+        since_wall: float,
+    ) -> Optional[bool]:
+        """Ground-truth check: did this bot post in `thread_ts` since `since_wall`?
+
+        Returns True / False if the Slack API answered, or None if the
+        query itself failed (in which case the caller should default to
+        the conservative behavior — alert as today).
+
+        Used by `_post_operator_alert` to suppress the "something's wrong
+        with me" message when the classifier inferred no closed loop but
+        Slack actually shows Sam posted via some other path (bash curl,
+        helper script the regex missed, etc.). The pre-`respond`-tool
+        cascade fired here three times on 2026-05-25 with Sam having
+        actually posted via bash; the regex called it silent, the alert
+        fired, the operator saw a false alarm.
+        """
+        if not self.bot_user_id:
+            # Can't filter to "this bot's" posts without knowing our own id.
+            return None
+        try:
+            resp = await self.app.client.conversations_replies(
+                channel=channel,
+                ts=thread_ts,
+                oldest=f"{since_wall:.6f}",
+                limit=200,
+            )
+        except Exception:
+            log.exception("conversations.replies failed during alert-suppression check")
+            return None
+        messages = resp.get("messages") or []
+        for m in messages:
+            if m.get("user") == self.bot_user_id:
+                return True
+        return False
+
     async def _post_operator_alert(
         self,
         original: IncomingMessage,
@@ -1491,7 +1531,27 @@ async def _post_operator_alert(
 
         No claude subprocess — this has to work even when claude itself is
         the thing that's broken. Keep the text short and concrete.
+
+        Ground-truth guard: before posting, query Slack for bot posts in
+        the originating thread since the first session started. If the
+        bot DID post (via bash curl, helper script, or any path the
+        classifier missed), suppress the alert — the operator already
+        heard from Sam. Only fire when Slack confirms zero bot posts.
         """
+        target_thread_ts = original.thread_ts or original.event_ts
+        bot_posted = await self._bot_posted_in_thread_since(
+            channel=original.channel,
+            thread_ts=target_thread_ts,
+            since_wall=first.started_at_wall,
+        )
+        if bot_posted is True:
+            log.info(
+                "operator-alert suppressed: bot posted in %s thread since %.3f — "
+                "classifier said silent, Slack disagrees",
+                original.channel, first.started_at_wall,
+            )
+            return
+
         mention = f"<@{SAM_OPERATOR_USER_ID}> " if SAM_OPERATOR_USER_ID else ""
         first_status = (
             "stuck" if first.stuck
@@ -1513,7 +1573,7 @@ async def _post_operator_alert(
         try:
             await self.app.client.chat_postMessage(
                 channel=original.channel,
-                thread_ts=original.thread_ts or original.event_ts,
+                thread_ts=target_thread_ts,
                 text=redact_secrets(text),
             )
         except Exception:
diff --git a/tests/runtime/test_silent_exit.py b/tests/runtime/test_silent_exit.py
index a52f6d9..61ad89a 100644
--- a/tests/runtime/test_silent_exit.py
+++ b/tests/runtime/test_silent_exit.py
@@ -548,3 +548,153 @@ def test_bash_helper_script_without_respond_does_NOT_close_loop():
         _bash_helper_script_post(),  # post hidden inside script — regex misses
     ]
     assert _closed_loop(records) is False
+
+
+# ─── OPERATOR_ALERT suppression — Slack ground-truth check ────────────────────
+#
+# The third "<@op> something's wrong with me" message in today's cascade
+# fired because both the first session AND the retry tripped the regex
+# classifier despite Sam HAVING posted (via bash heredoc / helper script).
+# Daemon's `_post_operator_alert` now queries Slack directly for bot
+# posts in the originating thread since the first session started; if
+# any post exists, the alert is suppressed (operator already heard from
+# Sam). The alert still fires for genuine silent failures (no posts at
+# all in the thread).
+#
+# These tests mock the Daemon's Slack client and exercise the new
+# `_bot_posted_in_thread_since` helper + the suppression branch in
+# `_post_operator_alert`. Pattern mirrors `_mock_daemon_for_recovery`
+# in test_recovery.py.
+
+
+def _mock_daemon_for_alert():
+    """Construct a Daemon without __init__ + wire just what the alert
+    path reads: bot_user_id, app.client. Methods on the class still
+    work because the instance has the class's MRO."""
+    from unittest.mock import AsyncMock, MagicMock
+    from src.runtime.daemon import Daemon
+    d = Daemon.__new__(Daemon)
+    d.bot_user_id = "U0BOT"
+    d.app = MagicMock()
+    d.app.client = AsyncMock()
+    return d
+
+
+def _make_failed_result(closed_loop: bool, *, started_at_wall: float = 100.0) -> SessionResult:
+    """A clean-exit but loop-open SessionResult — the shape that triggers
+    retry-then-alert today. closed_loop is the *classifier's* answer."""
+    return SessionResult(
+        session_id="abc123def456",
+        started_at=0.0,
+        started_at_wall=started_at_wall,
+        ended_at=1.0,
+        exit_code=0,
+        last_output_at=1.0,
+        stuck=False,
+        timed_out=False,
+        closed_loop=closed_loop,
+    )
+
+
+async def test_bot_posted_in_thread_since_true_when_bot_post_exists():
+    """Slack has at least one bot post since `since_wall` → returns True.
+    Used to suppress the operator alert."""
+    from unittest.mock import AsyncMock
+    d = _mock_daemon_for_alert()
+    d.app.client.conversations_replies = AsyncMock(return_value={
+        "messages": [
+            {"ts": "100.000000", "user": "U06USER", "text": "the question"},
+            {"ts": "150.000000", "user": "U0BOT", "text": "Sam's reply via bash"},
+        ],
+    })
+    result = await d._bot_posted_in_thread_since(
+        channel="C1", thread_ts="100.000000", since_wall=99.0,
+    )
+    assert result is True
+
+
+async def test_bot_posted_in_thread_since_false_when_only_user_messages():
+    """No bot posts in the thread → returns False. The alert path will fire."""
+    from unittest.mock import AsyncMock
+    d = _mock_daemon_for_alert()
+    d.app.client.conversations_replies = AsyncMock(return_value={
+        "messages": [
+            {"ts": "100.000000", "user": "U06USER", "text": "the question"},
+        ],
+    })
+    result = await d._bot_posted_in_thread_since(
+        channel="C1", thread_ts="100.000000", since_wall=99.0,
+    )
+    assert result is False
+
+
+async def test_bot_posted_in_thread_since_none_on_api_failure():
+    """API error → returns None so the caller defaults to today's behavior
+    (alert fires). Conservative: better a false alarm than a missed
+    genuine silent failure."""
+    from unittest.mock import AsyncMock
+    d = _mock_daemon_for_alert()
+    d.app.client.conversations_replies = AsyncMock(side_effect=RuntimeError("api boom"))
+    result = await d._bot_posted_in_thread_since(
+        channel="C1", thread_ts="100.000000", since_wall=99.0,
+    )
+    assert result is None
+
+
+async def test_operator_alert_suppressed_when_bot_already_posted():
+    """The headline behavior: classifier said silent, Slack disagrees
+    (bot DID post via some path the regex missed), alert is suppressed.
+
+    This is the structural fix for today's three-message cascade. The
+    first session posted (bash). The retry posted too (bash, also missed
+    by regex). Both reported `closed_loop=False`. Without this guard the
+    daemon would post "something's wrong with me" as a false alarm —
+    the third message in the operator-visible cascade.
+    """
+    from unittest.mock import AsyncMock
+    d = _mock_daemon_for_alert()
+    d.app.client.conversations_replies = AsyncMock(return_value={
+        "messages": [
+            {"ts": "100.000000", "user": "U06USER"},
+            {"ts": "150.000000", "user": "U0BOT", "text": "actual reply"},
+        ],
+    })
+    d.app.client.chat_postMessage = AsyncMock()
+
+    original = IncomingMessage(
+        channel="C1", user="U06USER", text="hi",
+        thread_ts=None, event_ts="100.000000",
+    )
+    first = _make_failed_result(closed_loop=False, started_at_wall=99.0)
+    retry = _make_failed_result(closed_loop=False, started_at_wall=120.0)
+
+    await d._post_operator_alert(original, first, retry)
+    # Alert post must NOT have been called — operator already heard from Sam.
+    d.app.client.chat_postMessage.assert_not_called()
+
+
+async def test_operator_alert_fires_when_genuinely_silent():
+    """Slack confirms zero bot posts → alert fires. This is the catch
+    we want to keep: Sam crashed before saying anything, operator is
+    waiting, the alert is the only signal."""
+    from unittest.mock import AsyncMock
+    d = _mock_daemon_for_alert()
+    d.app.client.conversations_replies = AsyncMock(return_value={
+        "messages": [
+            {"ts": "100.000000", "user": "U06USER"},  # only the user's question
+        ],
+    })
+    d.app.client.chat_postMessage = AsyncMock()
+
+    original = IncomingMessage(
+        channel="C1", user="U06USER", text="hi",
+        thread_ts=None, event_ts="100.000000",
+    )
+    first = _make_failed_result(closed_loop=False, started_at_wall=99.0)
+    retry = _make_failed_result(closed_loop=False, started_at_wall=120.0)
+
+    await d._post_operator_alert(original, first, retry)
+    # Alert post MUST have been called — Sam was genuinely silent.
+    d.app.client.chat_postMessage.assert_called_once()
+    call_kwargs = d.app.client.chat_postMessage.call_args.kwargs
+    assert "something's wrong with me" in call_kwargs.get("text", "")