
fix(skill): strengthen details-tag and local-test-verification guidance #1447

Merged
max-sixty merged 3 commits into main from hourly/review-22969807526 on Mar 13, 2026

@worktrunk-bot
Collaborator

Summary

Two recurring behavioral issues were found in an hourly review of 14 Claude CI runs (12 successful, 2 cancelled) from the past hour:

1. Long comments posted without <details> tags (3 sessions)

The running-in-ci skill says to use collapsible sections for detailed analysis, but the guidance was too vague: it only mentioned "file-by-file breakdowns, code snippets." As a result, the bot consistently posted multi-section technical analyses (headers, tables, surveys) without collapsing them.

Evidence:

Fix: Added a concrete threshold (~15 lines, multiple headers/tables) and made clear that technical analyses always need collapsible sections.

2. Local test failures characterized as "pre-existing" without verification (3rd recurrence)

This is a recurrence of #1335 and #1384. The grounded-analysis guidance exists at lines 316-320 of the skill, but the bot doesn't apply it when running local tests before pushing — only when monitoring CI after pushing.

Evidence:

  • Run 22967660638 (creating PR #1445, "feat(switch): make --no-cd print-only in picker mode"): the bot ran local cargo test, saw 61 failures in the ci_status and switch_pr test categories, and said "These look like CI-related test failures which are likely pre-existing or environment-dependent" without running gh api to check main-branch CI. It then pushed and created the PR anyway.

Fix: Added an explicit "Verifying local test failures before pushing" subsection in the CI Monitoring section, placed right at the decision point where this pattern occurs.
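The verification step can be sketched in Python. This is a minimal illustration, not the skill's actual text: it assumes the main-branch runs were fetched with something like `gh api 'repos/{owner}/{repo}/actions/runs?branch=main&per_page=10'` (the REST "list workflow runs" endpoint; `gh api` fills in `{owner}` and `{repo}` from the current checkout). The helper names and messages are hypothetical.

```python
import json
from typing import Optional


def latest_main_conclusion(runs_json: str, workflow: Optional[str] = None) -> Optional[str]:
    """Conclusion ("success" or "failure") of the newest decisive main run, or None.

    Assumes runs_json is the raw output of the (hypothetical) gh api call above.
    Cancelled, skipped, and in-progress runs are ignored.
    """
    runs = json.loads(runs_json).get("workflow_runs", [])
    decisive = [
        r for r in runs
        if r.get("conclusion") in ("success", "failure")
        and (workflow is None or r.get("name") == workflow)
    ]
    if not decisive:
        return None
    # ISO 8601 UTC timestamps ("2026-03-11T10:00:00Z") sort correctly as strings.
    newest = max(decisive, key=lambda r: r["created_at"])
    return newest["conclusion"]


def describe_local_failures(failure_count: int, runs_json: str,
                            workflow: Optional[str] = None) -> str:
    """Only call local failures 'pre-existing' when main CI evidence supports it."""
    if failure_count == 0:
        return "local tests pass"
    conclusion = latest_main_conclusion(runs_json, workflow)
    if conclusion == "failure":
        return ("main CI is also failing: confirm the same tests fail there "
                "before calling these failures pre-existing")
    if conclusion == "success":
        return ("main CI is green: do not call these failures pre-existing; "
                "investigate before pushing")
    return "main CI status unknown: do not call these failures pre-existing"
```

In the #1445 case, a green main branch would have made the "likely pre-existing" claim untenable before the push, which is the decision point the new subsection targets.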

Other sessions reviewed (no issues found)

| Run | Type | PR/Issue | Behavior |
|---|---|---|---|
| 22968844823 | review | #1004 | Correctly approved after CI fix commit |
| 22968816869 | review | #1446 | Clean approval of small change |
| 22968147534 | review | #1445 | Correctly stayed silent on self-authored PR |
| 22967879945 | review | #1442 | Correctly stayed silent after tests addressed |
| 22967793926 | review | #1443 | Correctly stayed silent on self-authored PR |
| 22967450550 | review | #1004 | Correctly dismissed approval after CI failure |
| 22968542106 | mention | #1438 | Clean wrap-up comment |
| 22967695706 | mention | #1443 | Clean inline reply + code fix |
| 22967564028 | mention | #1442 | Excellent diagnosis of codecov blind spot |

Positive patterns: Shell quoting was correct across all 12 sessions (temp files with single-quoted heredocs). Self-authored PR handling was correct (silent, no self-approval). Review thread replies used the correct API endpoint.
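The shell-quoting pattern credited above writes each comment body to a temp file through a single-quoted heredoc (`<<'EOF'`), so the shell never expands `$`, backticks, or quotes in the text. A rough Python analogue might look like this; it swaps the heredoc for a direct file write and assumes the file is then handed to `gh pr comment --body-file`. The helper names are hypothetical, and this is not the bot's actual code.

```python
import os
import tempfile
from typing import List


def write_comment_body(body: str) -> str:
    """Write the body verbatim to a temp file and return its path.

    No shell ever sees the text, so `$HOME`, backticks and quotes survive as-is.
    """
    fd, path = tempfile.mkstemp(suffix=".md")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    return path


def pr_comment_argv(pr_number: int, body_path: str) -> List[str]:
    """Argument list for subprocess.run(argv); without shell=True there is no quoting layer."""
    return ["gh", "pr", "comment", str(pr_number), "--body-file", body_path]
```

Running `subprocess.run(pr_comment_argv(1445, path), check=True)` passes the path as a single argument, which gives the same guarantee the single-quoted heredoc gives in shell: the comment text reaches `gh` unmodified.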

Test plan

  • Verify the skill file is well-formed and the new sections read clearly
  • Monitor next hourly review for improvement in <details> tag usage

🤖 Generated with Claude Code

…ance

Two recurring issues found in hourly review of runs from the past hour:

1. Long comments posted without <details> tags (3 sessions). The existing
   guidance was too vague — "file-by-file breakdowns, code snippets" didn't
   cover technical analyses with headers, tables, and surveys. Added a concrete
   threshold (~15 lines, multiple headers/tables) and explicit examples.

2. Local test failures characterized as "pre-existing" without verification
   (run 22967660638). This is the 3rd recurrence of the pattern from #1335
   and #1384. The grounded-analysis guidance exists but isn't encountered at
   the decision point (local tests before pushing). Added an inline reminder
   in the CI Monitoring section with the verification command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@worktrunk-bot added the claude-behavior label (Issues with Claude CI bot behavior) on Mar 11, 2026
@max-sixty
Owner

  1. I think both of those comments are fine: someone asked for a survey, and it replied with a full response. We should be putting supporting details into <details> blocks, but that doesn't mean no response can be long.

  2. yes, good change

…ance

Soften the <details> tag rule: add a "when NOT to use" section clarifying
that long replies are fine when the content is the primary answer (e.g.,
a requested survey), not just supporting detail around a short conclusion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@worktrunk-bot
Collaborator Author

Good call on #1 — those were direct answers to questions, not boilerplate around a short conclusion. I've added a "when NOT to use <details> tags" section clarifying that long replies are fine when the content is the answer (e.g., a requested survey). Kept the existing threshold guidance for cases where length is supporting detail.

Pushed in b7adfa8.

@max-sixty
Owner

just make one cohesive paragraph of when to use details tags

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@worktrunk-bot
Collaborator Author

Consolidated into a single paragraph — use <details> for supporting detail around a short conclusion, but keep the full content inline when the length is the answer. Pushed in 7d6a8bd.
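The consolidated rule can be sketched as a small comment renderer. This is a minimal illustration, assuming the ~15-line / multiple-headers-or-tables threshold from the first commit still applies to supporting detail (the final wording of the skill paragraph is not reproduced here). The function names, the `detail_is_the_answer` flag, and the line-based header and table detection are all hypothetical heuristics.

```python
from typing import List

# Assumed threshold, taken from the first commit's "~15 lines" guidance.
SUPPORTING_DETAIL_LINE_LIMIT = 15


def _nonblank_lines(text: str) -> List[str]:
    return [line for line in text.splitlines() if line.strip()]


def needs_details(detail: str) -> bool:
    """Rough heuristic: long, multi-header, or tabular detail gets collapsed."""
    lines = _nonblank_lines(detail)
    # Crude: any line starting with '#' counts, including comments inside code fences.
    headers = sum(1 for line in lines if line.lstrip().startswith("#"))
    has_table = any(line.lstrip().startswith("|") for line in lines)
    return len(lines) > SUPPORTING_DETAIL_LINE_LIMIT or headers >= 2 or has_table


def render_comment(conclusion: str, detail: str, detail_is_the_answer: bool) -> str:
    """Short conclusion first; collapse only supporting detail that needs it."""
    if not detail.strip():
        return conclusion
    if detail_is_the_answer or not needs_details(detail):
        # Either the content is the answer (e.g. a requested survey) or it is short: keep it inline.
        return f"{conclusion}\n\n{detail}"
    # Blank lines around the body so GitHub renders Markdown inside <details>.
    return (
        f"{conclusion}\n\n"
        "<details><summary>Details</summary>\n\n"
        f"{detail}\n\n"
        "</details>"
    )
```

The `detail_is_the_answer` flag carries the owner's point from this thread: a requested survey stays inline however long it is, and only supporting material around a short conclusion gets collapsed.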

@max-sixty merged commit 3b0ea57 into main on Mar 13, 2026
23 checks passed
@max-sixty deleted the hourly/review-22969807526 branch on March 13, 2026 at 13:27

Labels

claude-behavior Issues with Claude CI bot behavior


2 participants