claude-mention: bot speculates CI failure is pre-existing without verifying against main #1335

@worktrunk-bot

Problem

In run 22806814855 on PR #1331, the bot characterized a macOS CI failure as "pre-existing" without first checking whether main CI actually had the same failure. From the bot's first comment:

The macOS failure is a pre-existing test_switch_with_execute_through_wrapper issue (bash binary not found on the macOS runner), unrelated to this change.

The maintainer questioned this ("are you sure?"), and only then did the bot actually investigate. It checked main CI history, found that main passes macOS consistently, and corrected itself:

You're right to push back — I checked and main passes macOS consistently, so it's not pre-existing. I was wrong about that.

The self-correction is good, but the initial speculative claim violated the Thoroughness — Grounded Analysis rule in the running-in-ci skill:

Do the work, don't speculate. If you have access to logs, code, or API data, read it before drawing conclusions.

Root Cause

When CI fails and the bot believes the failure is unrelated to its changes, it labels the failure as pre-existing without comparing it against CI results on main. The bot could have queried main's CI history with gh api before making the claim, but it skipped that step.

Evidence

Session log from run 22806814855 shows:

  1. Bot pushed test changes, saw macOS failure
  2. Bot immediately claimed "pre-existing" in its comment — no gh api calls to check main CI were made before this comment
  3. Only after maintainer pushback did the bot run: gh api "repos/max-sixty/worktrunk/actions/runs?branch=main&per_page=5&event=push" and discover main passes consistently

Suggested Fix

Add explicit guidance to the running-in-ci skill under the CI Monitoring section:

Never claim a CI failure is "pre-existing" or "unrelated" without evidence. Before characterizing any CI failure as pre-existing, check the main branch CI history (gh api "repos/{owner}/{repo}/actions/runs?branch=main&status=completed&per_page=5") to verify the same test fails there. If you cannot verify, say "I haven't confirmed whether this is pre-existing" rather than asserting it is.
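A minimal sketch of the check described above, assuming the JSON printed by the gh api call has already been parsed into a dict. The function name `preexisting_verdict` and the three verdict strings are hypothetical, chosen for illustration; the `workflow_runs`, `status`, and `conclusion` fields are the ones returned by GitHub's "list workflow runs" endpoint. Run-level conclusions are only a first filter: a failing run on main does not prove the same test failed there, so that case is reported as "possibly" and still requires reading the job logs.

```python
def preexisting_verdict(payload):
    """Classify a failure using run-level conclusions from recent main CI runs.

    `payload` is the parsed JSON from the Actions "list workflow runs" endpoint.
    Returns one of three hypothetical verdict strings.
    """
    runs = payload.get("workflow_runs", [])
    # Only completed runs that actually passed or failed count as evidence;
    # cancelled or skipped runs say nothing either way.
    decided = [
        r for r in runs
        if r.get("status") == "completed"
        and r.get("conclusion") in ("success", "failure")
    ]
    if not decided:
        # No evidence: say so instead of asserting the failure is pre-existing.
        return "unverified"
    if any(r["conclusion"] == "failure" for r in decided):
        # main fails too, but the job logs must confirm it is the same test.
        return "possibly-pre-existing"
    # Every decided run on main passed, so the failure is new on this branch.
    return "not-pre-existing"
```

In the session described here, main passed consistently, so a check along these lines would have returned "not-pre-existing" before the first comment was posted.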

Additional Note

In the same session, the bot also used an inline heredoc (--body "$(cat <<'EOF'...EOF)") for gh pr comment instead of the required temp file approach. This caused no problems because the content contained no ! or $ characters, but it is an inconsistent application of the shell quoting rules: the bot correctly used temp files in a different session on PR #1312 in the same hour.
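For illustration, the temp file approach might look like the sketch below. It assumes the comment is posted with the `--body-file` flag of gh pr comment; the helper name `build_comment_command` is hypothetical, and the skill's actual wording of the rule is not reproduced here. The body is written to a file verbatim, so characters such as ! and $ never pass through shell quoting.

```python
import tempfile


def build_comment_command(pr_number, body):
    """Write `body` to a temp file and return (argv, path) for `gh pr comment`.

    Hypothetical helper: the caller runs argv and deletes the file afterwards.
    """
    # delete=False keeps the file on disk after the with-block so gh can read it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".md", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(body)
        path = handle.name
    # The body travels through the file, not through a quoted shell argument.
    argv = ["gh", "pr", "comment", str(pr_number), "--body-file", path]
    return argv, path
```

A caller would pass `argv` to `subprocess.run(argv, check=True)` and then remove the file with `os.unlink(path)`.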


Labels

claude-behavior (Issues with Claude CI bot behavior)
