Problem
In our SWE-bench-verified evaluation, 15 tasks consumed all 30 iterations on exploration (reading files, calling symbol_context, grepping) without ever making a single edit. The agent gets stuck in analysis paralysis.
Data
- 15 tasks with 30 iterations and 0 file edits
- These tasks have tool call patterns like:
symbol_context → Read → Grep → Read → symbol_context → Read → ... (repeating for 30 turns)
- The agent keeps gathering more context but never transitions to the "fix" phase
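The counts above could be reproduced from run logs with a filter along these lines. This is only a sketch: the `TaskRun` record, the assumption of one tool call per iteration, and the edit tool names `Edit` and `Write` are hypothetical and would need to match the actual harness.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed names of tools that modify files; adjust to the real tool set.
EDIT_TOOLS = {"Edit", "Write"}


@dataclass
class TaskRun:
    """Hypothetical log record: one tool name per iteration, in order."""
    task_id: str
    tool_calls: list[str]


def is_stuck_exploring(run: TaskRun, max_iterations: int = 30) -> bool:
    """True if the run used its whole budget without a single edit call."""
    used_full_budget = len(run.tool_calls) >= max_iterations
    made_edit = any(name in EDIT_TOOLS for name in run.tool_calls)
    return used_full_budget and not made_edit


def find_stuck_runs(runs: list[TaskRun], max_iterations: int = 30) -> list[str]:
    """Return the task ids of runs that exhausted the budget on exploration."""
    return [r.task_id for r in runs if is_stuck_exploring(r, max_iterations)]
```

Running `find_stuck_runs` over all evaluation runs should return the 15 task ids described here, provided the log format matches these assumptions.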
Root Cause
Two contributing factors:
- No iteration budget awareness: The agent doesn't know it has a 30-iteration limit and doesn't pace itself
- Rich exploration tools encourage over-exploration: When symbol_context returns detailed context with callers, callees, and related symbols, the agent follows every lead instead of focusing
Impact
These 15 tasks are guaranteed losses, since a run that never edits a file cannot resolve its task. If even half of them transitioned to editing, we estimate that would yield 3-4 additional resolves.
Recommended Fixes
- Add budget awareness to instructions: "You have a limited number of turns. Spend no more than 10 turns exploring before making your first edit."
- Exploration cap hint: After the agent has made 8-10 symbol_context/Read calls without editing, include a hint in the next response: "Consider making your edit now. You've gathered significant context."
- Progressive brevity: Make tool responses progressively shorter as iteration count increases (if iteration count is available to the server)
- Structured workflow in instructions: "Phase 1 (turns 1-5): Understand the problem. Phase 2 (turns 6-20): Implement the fix. Phase 3 (turns 21-30): Test and refine."
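The exploration cap hint and progressive brevity could be prototyped on the server side roughly as follows. This is a minimal sketch under stated assumptions, not the actual server code: `ExplorationTracker`, its method names, the edit tool names `Edit` and `Write`, and the character budgets are all hypothetical, and the tracker counts tool calls itself as a stand-in for the iteration count in case the real one is not available to the server.

```python
# Tool names from this issue; EDIT_TOOLS is an assumption about the harness.
EXPLORATION_TOOLS = {"symbol_context", "Read", "Grep"}
EDIT_TOOLS = {"Edit", "Write"}

HINT = "Consider making your edit now. You've gathered significant context."


class ExplorationTracker:
    """Tracks consecutive exploration calls and a shrinking response budget."""

    def __init__(self, hint_after: int = 8, max_iterations: int = 30):
        self.hint_after = hint_after
        self.max_iterations = max_iterations
        self.iteration = 0        # tool calls seen so far (proxy for turns)
        self.explore_streak = 0   # exploration calls since the last edit

    def record(self, tool_name: str) -> None:
        """Register one tool call; an edit resets the exploration streak."""
        self.iteration += 1
        if tool_name in EDIT_TOOLS:
            self.explore_streak = 0
        elif tool_name in EXPLORATION_TOOLS:
            self.explore_streak += 1

    def decorate(self, response: str) -> str:
        """Append the hint once the streak reaches the threshold."""
        if self.explore_streak >= self.hint_after:
            return f"{response}\n\n{HINT}"
        return response

    def char_budget(self, full: int = 8000, floor: int = 1000) -> int:
        """Shrink the response budget linearly from `full` to `floor`."""
        used = min(self.iteration, self.max_iterations)
        return full - (full - floor) * used // self.max_iterations
```

A tool handler would call `record()` for each incoming call, truncate its output to `char_budget()` characters, and pass the result through `decorate()` before returning it. Resetting the streak on any edit keeps the hint from firing during normal edit-then-verify loops.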
Labels
performance, swe-bench