Problem
In run 22806814855 on PR #1331, the bot characterized a macOS CI failure as "pre-existing" without first checking whether main CI actually had the same failure. From the bot's first comment:
The macOS failure is a pre-existing test_switch_with_execute_through_wrapper issue (bash binary not found on the macOS runner), unrelated to this change.
The maintainer questioned this ("are you sure?"), and only then did the bot actually investigate. It checked main CI history, found that main passes macOS consistently, and corrected itself:
You're right to push back — I checked and main passes macOS consistently, so it's not pre-existing. I was wrong about that.
The self-correction is good, but the initial speculative claim violated the Thoroughness — Grounded Analysis rule in the running-in-ci skill:
Do the work, don't speculate. If you have access to logs, code, or API data, read it before drawing conclusions.
Root Cause
When CI fails and the bot believes the failure is unrelated to its changes, it takes a shortcut and characterizes the failure without verifying against main branch CI. The bot had full access to gh api to check main CI history before making the claim, but skipped that step.
Evidence
Session log from run 22806814855 shows:
- Bot pushed test changes, saw macOS failure
- Bot immediately claimed "pre-existing" in its comment — no
gh api calls to check main CI were made before this comment
- Only after maintainer pushback did the bot run:
gh api "repos/max-sixty/worktrunk/actions/runs?branch=main&per_page=5&event=push" and discover main passes consistently
Suggested Fix
Add explicit guidance to the running-in-ci skill under the CI Monitoring section:
Never claim a CI failure is "pre-existing" or "unrelated" without evidence. Before characterizing any CI failure as pre-existing, check the main branch CI history (gh api "repos/{owner}/{repo}/actions/runs?branch=main&status=completed&per_page=5") to verify the same test fails there. If you cannot verify, say "I haven't confirmed whether this is pre-existing" rather than asserting it is.
Additional Note
In the same session, the bot also used inline heredoc (--body "$(cat <<'EOF'...EOF)") for gh pr comment instead of the required temp file approach. This didn't cause issues since the content had no ! or $ characters, but is an inconsistent application of the shell quoting rules (the bot correctly used temp files in a different session on PR #1312 in the same hour).
Problem
In run 22806814855 on PR #1331, the bot characterized a macOS CI failure as "pre-existing" without first checking whether main CI actually had the same failure. From the bot's first comment:
The maintainer questioned this ("are you sure?"), and only then did the bot actually investigate. It checked main CI history, found that main passes macOS consistently, and corrected itself:
The self-correction is good, but the initial speculative claim violated the Thoroughness — Grounded Analysis rule in the
running-in-ciskill:Root Cause
When CI fails and the bot believes the failure is unrelated to its changes, it takes a shortcut and characterizes the failure without verifying against main branch CI. The bot had full access to
gh apito check main CI history before making the claim, but skipped that step.Evidence
Session log from run 22806814855 shows:
gh apicalls to check main CI were made before this commentgh api "repos/max-sixty/worktrunk/actions/runs?branch=main&per_page=5&event=push"and discover main passes consistentlySuggested Fix
Add explicit guidance to the
running-in-ciskill under the CI Monitoring section:Additional Note
In the same session, the bot also used inline heredoc (
--body "$(cat <<'EOF'...EOF)") forgh pr commentinstead of the required temp file approach. This didn't cause issues since the content had no!or$characters, but is an inconsistent application of the shell quoting rules (the bot correctly used temp files in a different session on PR #1312 in the same hour).