fix(skill): strengthen details-tag and local-test-verification guidance by worktrunk-bot · Pull Request #1447 · max-sixty/worktrunk

worktrunk-bot · 2026-03-11T19:22:13Z

Summary

Two recurring behavioral issues found in hourly review of 14 Claude CI runs (12 successful, 2 cancelled) from the past hour:

1. Long comments posted without `<details>` tags (3 sessions)

The running-in-ci skill says to use collapsible sections for detailed analysis, but the guidance was too vague — it only mentioned "file-by-file breakdowns, code snippets." The bot consistently posts multi-section technical analyses (headers, tables, surveys) without collapsing them.

Evidence:

- Run 22967850131 on PR "merge: add --no-ff flag for merge commit (semi-linear history)" #1438: 2,274-char comment with 3 headers, a table, and multi-paragraph analysis, all inline
- Run 22967464601 on issue "Allow worktree-path in project config + support config inheritance from parent directories" #1413: 75-line survey of 25+ tools organized into 5 pattern categories with tables, all inline

Fix: Added a concrete threshold (~15 lines, multiple headers/tables) and made clear that technical analyses always need collapsible sections.
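
As a rough illustration of the threshold described above, the check could be sketched as follows. This is not the skill's actual wording or implementation: the function name `needs_details`, the exact cutoff of 15 non-blank lines, and the way headers and tables are detected are all assumptions made for this example.

```python
import re

# "~15 lines" comes from the summary above; treating it as a hard cutoff is an assumption.
LINE_THRESHOLD = 15


def needs_details(comment: str) -> bool:
    """Heuristic: should the bulk of this comment go inside a <details> block?"""
    lines = [ln for ln in comment.splitlines() if ln.strip()]
    # Markdown ATX headers: one to six '#' characters followed by whitespace.
    headers = sum(1 for ln in lines if re.match(r"#{1,6}\s", ln))
    # Treat any line starting with '|' as a table row (a simplification).
    table_rows = sum(1 for ln in lines if ln.lstrip().startswith("|"))
    return len(lines) > LINE_THRESHOLD or headers >= 2 or table_rows > 0
```

A short reply such as "LGTM, merging." stays inline under this heuristic, while a comment with two headers or a table is flagged.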

2. Local test failures characterized as "pre-existing" without verification (3rd recurrence)

This is a recurrence of #1335 and #1384. The grounded-analysis guidance exists at lines 316-320 of the skill, but the bot doesn't apply it when running local tests before pushing — only when monitoring CI after pushing.

Evidence:

- Run 22967660638, creating PR "feat(switch): make --no-cd print-only in picker mode" #1445: the bot ran `cargo test` locally, saw 61 failures in the `ci_status` and `switch_pr` test categories, and said "These look like CI-related test failures which are likely pre-existing or environment-dependent" without running `gh api` to check CI on the main branch. It then pushed and created the PR anyway.

Fix: Added an explicit "Verifying local test failures before pushing" subsection in the CI Monitoring section, placed right at the decision point where this pattern occurs.
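
The verification step could look something like the sketch below. The PR text only says the bot should run `gh api` to check main branch CI; the specific endpoint used here (the GitHub REST check-runs endpoint for a ref) and the helper names `fetch_check_runs`, `failing_checks`, and `preexisting_claim_supported` are assumptions for illustration, not the command the skill actually prescribes.

```python
import json
import subprocess


def fetch_check_runs(branch: str = "main") -> list:
    """Fetch check runs for the tip of `branch` via `gh api` (requires gh auth).

    `{owner}` and `{repo}` are placeholders that gh fills in from the current repo.
    """
    endpoint = f"repos/{{owner}}/{{repo}}/commits/{branch}/check-runs"
    result = subprocess.run(
        ["gh", "api", endpoint],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["check_runs"]


def failing_checks(check_runs: list) -> list:
    """Names of completed check runs that did not pass."""
    bad = {"failure", "timed_out"}
    return sorted(
        run["name"]
        for run in check_runs
        if run.get("status") == "completed" and run.get("conclusion") in bad
    )


def preexisting_claim_supported(check_runs: list) -> bool:
    """Only call local failures "pre-existing" if main itself has failing checks."""
    return bool(failing_checks(check_runs))
```

If `preexisting_claim_supported(fetch_check_runs())` is false, main is green, so the local failures need investigation before pushing rather than a "pre-existing" label.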

Other sessions reviewed (no issues found)

| Run | Type | PR/Issue | Behavior |
| --- | --- | --- | --- |
| 22968844823 | review | #1004 | Correctly approved after CI fix commit |
| 22968816869 | review | #1446 | Clean approval of small change |
| 22968147534 | review | #1445 | Correctly stayed silent on self-authored PR |
| 22967879945 | review | #1442 | Correctly stayed silent after tests addressed |
| 22967793926 | review | #1443 | Correctly stayed silent on self-authored PR |
| 22967450550 | review | #1004 | Correctly dismissed approval after CI failure |
| 22968542106 | mention | #1438 | Clean wrap-up comment |
| 22967695706 | mention | #1443 | Clean inline reply + code fix |
| 22967564028 | mention | #1442 | Excellent diagnosis of codecov blind spot |

Positive patterns: Shell quoting was correct across all 12 sessions (temp files with single-quoted heredocs). Self-authored PR handling was correct (silent, no self-approval). Review thread replies used the correct API endpoint.

Test plan

- Verify the skill file is well-formed and the new sections read clearly
- Monitor next hourly review for improvement in `<details>` tag usage

🤖 Generated with Claude Code

…ance

Two recurring issues found in hourly review of runs from the past hour:

1. Long comments posted without <details> tags (3 sessions). The existing guidance was too vague: "file-by-file breakdowns, code snippets" didn't cover technical analyses with headers, tables, and surveys. Added a concrete threshold (~15 lines, multiple headers/tables) and explicit examples.

2. Local test failures characterized as "pre-existing" without verification (run 22967660638). This is the 3rd recurrence of the pattern from #1335 and #1384. The grounded-analysis guidance exists but isn't encountered at the decision point (local tests before pushing). Added an inline reminder in the CI Monitoring section with the verification command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

max-sixty · 2026-03-11T22:34:44Z

1. I think both of those comments are fine: someone asked for a survey, and it replied with a full response. We should be putting supporting details into <details> blocks, but that doesn't mean no response can be long.
2. Yes, good change.

…ance

Soften the <details> tag rule: add a "when NOT to use" section clarifying that long replies are fine when the content is the primary answer (e.g., a requested survey), not just supporting detail around a short conclusion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

worktrunk-bot · 2026-03-11T22:36:29Z

Good call on #1 — those were direct answers to questions, not boilerplate around a short conclusion. I've added a "when NOT to use <details> tags" section clarifying that long replies are fine when the content is the answer (e.g., a requested survey). Kept the existing threshold guidance for cases where length is supporting detail.

Pushed in b7adfa8.

max-sixty · 2026-03-12T10:01:48Z

just make one cohesive paragraph of when to use details tags

worktrunk-bot · 2026-03-12T10:03:46Z

Consolidated into a single paragraph — use <details> for supporting detail around a short conclusion, but keep the full content inline when the length is the answer. Pushed in 7d6a8bd.
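
A minimal sketch of the consolidated rule, assuming the caller already knows whether the long content is itself the answer. The function name `format_comment`, its parameters, and the default summary label are hypothetical and used only for illustration; the `<details>`/`<summary>` markup is the standard collapsible block that GitHub renders in comments.

```python
def format_comment(conclusion: str, detail: str,
                   detail_is_answer: bool = False,
                   summary: str = "Details") -> str:
    """Build a comment body following the consolidated guidance.

    - If the long content is the answer (e.g. a requested survey), keep it inline.
    - Otherwise, collapse it as supporting detail under a short conclusion.
    """
    if detail_is_answer:
        return f"{conclusion}\n\n{detail}"
    # Blank lines around the body let GitHub render markdown inside <details>.
    return (
        f"{conclusion}\n\n"
        f"<details>\n<summary>{summary}</summary>\n\n"
        f"{detail}\n\n"
        "</details>"
    )
```

For example, `format_comment("CI failure is unrelated to this PR.", log_excerpt)` collapses the excerpt, while passing `detail_is_answer=True` for a requested survey leaves the full text visible.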

worktrunk-bot added the claude-behavior label (Issues with Claude CI bot behavior) Mar 11, 2026

fix(skill): consolidate details-tag guidance into one paragraph

7d6a8bd

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

max-sixty merged commit 3b0ea57 into main Mar 13, 2026
23 checks passed

max-sixty deleted the hourly/review-22969807526 branch March 13, 2026 13:27

max-sixty pushed a commit that referenced this pull request Mar 13, 2026

fix(skill): strengthen details-tag and local-test-verification guidan…

f564b04

…ce (#1447)

BrewTestBot mentioned this pull request Mar 13, 2026

worktrunk 0.29.2 Homebrew/homebrew-core#272173

Merged

max-sixty merged 3 commits into main from hourly/review-22969807526
