feat(agent-core): detect stalled turns and force text-only recovery #1312
Open: flame4 wants to merge 2 commits into MoonshotAI:main from flame4:feat/progress-detector
@@ -0,0 +1,5 @@

```md
---
"@moonshot-ai/kimi-code": minor
---

Detect stalled turns and force text-only recovery. When the agent emits consecutive tool calls that produce no external progress, the harness clears the available tool list and asks the model to respond in text instead of continuing the loop.
```
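
To make the recovery step in the changeset concrete, here is a minimal sketch of a host-side policy, under stated assumptions. `StepPlan`, `planNextStep`, the hint wording, and the threshold of 8 are hypothetical names and values chosen for illustration; the host-side wiring is not part of the diff shown here.

```typescript
// Hypothetical host-side policy. Nothing here is taken from the PR itself.
interface StepPlan {
  /** Tool names offered to the model on the next step. */
  readonly tools: readonly string[];
  /** Extra instruction sent only when the turn is considered stalled. */
  readonly hint?: string;
}

// Assumed threshold; the real value lives in host code not shown in this diff.
const ASSUMED_STALL_THRESHOLD = 8;

function planNextStep(
  stepsSinceLastProgress: number,
  availableTools: readonly string[],
  stallThreshold: number = ASSUMED_STALL_THRESHOLD,
): StepPlan {
  if (stepsSinceLastProgress < stallThreshold) {
    // Still advancing (or not yet stalled long enough): keep every tool.
    return { tools: availableTools };
  }
  // Stalled: offer no tools so the model can only answer in text.
  return {
    tools: [],
    hint: 'Recent tool calls made no observable progress. Reply in text and explain what is blocking you.',
  };
}
```

Keeping the decision in a pure function like this would let the threshold be tested without running a real turn.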
158 changes: 158 additions & 0 deletions
packages/agent-core/src/agent/turn/progress-detector.ts
@@ -0,0 +1,158 @@

```ts
/**
 * Detects when a turn is spinning without making real progress.
 *
 * Progress is measured by looking at external, observable state rather than
 * interpreting model outputs:
 *
 * - Information gain: successful tool outputs that are non-trivial and have
 *   not been seen before in this turn.
 * - External state change: git working tree, background task lifecycle, or
 *   other host-provided snapshots.
 *
 * When a configurable number of consecutive steps pass without progress, the
 * detector reports that the turn has stalled. The host can then force the model
 * into text-only mode instead of letting it continue emitting placeholder tool
 * calls.
 */

import { createHash } from 'node:crypto';

import type { LoopRecordedEvent, LoopToolCallEvent, LoopToolResultEvent } from '../../loop/events';

const PROGRESS_TOOLS = new Set(['Edit', 'Write']);

export interface ProgressSnapshot {
  /**
   * `git status --porcelain` output. Empty when git is unavailable or the tree
   * is clean. Changes when the working tree actually changes.
   */
  readonly gitStatus: string;
  /**
   * Snapshot of active/terminal background tasks. Changes when tasks are
   * created, complete, fail, or are stopped.
   */
  readonly backgroundTasks: string;
}

export type TakeProgressSnapshot = () => Promise<ProgressSnapshot> | ProgressSnapshot;

export interface ProgressDetectorOptions {
  /** Called once per step to capture external world state. */
  readonly takeSnapshot: TakeProgressSnapshot;
  /**
   * Minimum successful output length to count as information gain.
   * Outputs shorter than this are treated as trivial/no-op responses.
   */
  readonly minInfoGainLength?: number | undefined;
}

const DEFAULT_MIN_INFO_GAIN_LENGTH = 60;

/**
 * Tracks whether a turn is still advancing.
 *
 * The detector is intentionally stateful per-turn: it accumulates seen output
 * hashes and the last external snapshot, and reports how many consecutive steps
 * have passed without any progress signal.
 */
export class ProgressDetector {
  private readonly takeSnapshot: TakeProgressSnapshot;
  private readonly minInfoGainLength: number;
  private readonly seenOutputHashes = new Set<string>();
  private previousSnapshot?: ProgressSnapshot;
  private currentStepEvents: LoopRecordedEvent[] = [];
  private readonly toolCallNames = new Map<string, string>();
  private lastProgressStep = 0;

  constructor(options: ProgressDetectorOptions) {
    this.takeSnapshot = options.takeSnapshot;
    this.minInfoGainLength = options.minInfoGainLength ?? DEFAULT_MIN_INFO_GAIN_LENGTH;
  }

  /** Called for every recorded loop event so the detector can observe results. */
  onLoopEvent(event: LoopRecordedEvent): void {
    this.currentStepEvents.push(event);
    if (event.type === 'tool.call') {
      const call = event as LoopToolCallEvent;
      this.toolCallNames.set(call.toolCallId, call.name);
    }
  }

  /**
   * Evaluates the events collected since the last call and reports whether this
   * step made progress. Resets the per-step event buffer.
   */
  async recordStep(stepNumber: number): Promise<boolean> {
    const snapshot = await this.takeSnapshot();
    const stateChanged = this.hasExternalStateChanged(snapshot);
    this.previousSnapshot = snapshot;

    const infoGained = this.hasInformationGain();
    this.currentStepEvents = [];

    const progress = stateChanged || infoGained;
    if (progress) {
      this.lastProgressStep = stepNumber;
    }
    return progress;
  }

  /** Number of consecutive steps since the last progress signal. */
  stepsSinceLastProgress(currentStep: number): number {
    return currentStep - this.lastProgressStep;
  }

  private hasExternalStateChanged(current: ProgressSnapshot): boolean {
    if (this.previousSnapshot === undefined) {
      return false; // First step has no previous snapshot to compare against.
    }
    return (
      this.previousSnapshot.gitStatus !== current.gitStatus ||
      this.previousSnapshot.backgroundTasks !== current.backgroundTasks
    );
  }

  private hasInformationGain(): boolean {
    for (const event of this.currentStepEvents) {
      if (event.type !== 'tool.result') {
        continue;
      }
      const resultEvent = event as LoopToolResultEvent;
      const result = resultEvent.result;
      if (result.isError === true) {
        continue;
      }
      // Successful writes/edits are real progress even when their output is
      // short, because they change file contents. git status --porcelain does
      // not capture repeated edits to an already-dirty file.
      const toolName = this.toolCallNames.get(resultEvent.toolCallId);
      if (toolName !== undefined && PROGRESS_TOOLS.has(toolName)) {
        return true;
      }
      const text = extractOutputText(result.output);
      if (text.length < this.minInfoGainLength) {
        continue;
      }
      const hash = hashString(text);
      if (!this.seenOutputHashes.has(hash)) {
        this.seenOutputHashes.add(hash);
        return true;
      }
    }
    return false;
  }
}

function extractOutputText(output: string | readonly { readonly type: string; readonly text?: string }[]): string {
  if (typeof output === 'string') {
    return output;
  }
  return output
    .filter((part): part is { readonly type: string; readonly text: string } => part.type === 'text' && typeof part.text === 'string')
    .map((part) => part.text)
    .join('');
}

function hashString(value: string): string {
  return createHash('sha256').update(value, 'utf8').digest('hex');
}
```
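
The information-gain rule in `hasInformationGain` can be tried in isolation. The sketch below restates only the length threshold and the seen-hash check as a standalone class; `InfoGainTracker` and `observe` are hypothetical names for illustration and are not part of the PR. It leaves out the `PROGRESS_TOOLS` shortcut, the error filter, and the snapshot comparison.

```typescript
import { createHash } from 'node:crypto';

// Standalone restatement of the length + dedup rule, for illustration only.
class InfoGainTracker {
  private readonly seen = new Set<string>();
  private readonly minLength: number;

  constructor(minLength: number = 60) {
    this.minLength = minLength;
  }

  /** Returns true when `output` is long enough and has not been seen before. */
  observe(output: string): boolean {
    if (output.length < this.minLength) {
      return false; // Trivial output: treated as no information.
    }
    const hash = createHash('sha256').update(output, 'utf8').digest('hex');
    if (this.seen.has(hash)) {
      return false; // Same output as an earlier call: no new information.
    }
    this.seen.add(hash);
    return true;
  }
}

const tracker = new InfoGainTracker();
const listing = 'src/a.ts\n'.repeat(10); // 90 characters, above the default of 60
const results = [
  tracker.observe(listing), // first sighting of a long output
  tracker.observe(listing), // identical repeat
  tracker.observe('ok'), // below the length threshold
];
console.log(results); // [ true, false, false ]
```

Hashing the output rather than storing it keeps the per-turn memory bounded by the number of distinct outputs, not their size.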
When a turn keeps editing a file that is already modified or untracked, `git status --porcelain` stays identical (for example, `M src/foo.ts`) even though the file contents changed; Edit/Write successes also often return short outputs below the 60-character information-gain threshold. In that common single-file refactor case, eight real edits can be classified as stalled and the next step is forced into text-only mode, preventing the agent from making further needed changes. Please include a content-sensitive signal (e.g. diff/hash/mtime for dirty paths) or otherwise count successful write/edit tool results as progress.
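
One possible shape for the content-sensitive signal suggested above is a fingerprint over the contents of dirty paths. This is a sketch under stated assumptions, not the PR's implementation: `dirtyContentFingerprint` and the injected `readFile` callback are hypothetical, and the parser assumes untrimmed porcelain v1 lines of the form `XY path`. It does not handle renames (`old -> new`) or quoted paths.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: changes whenever the status lines OR the contents of
// the listed paths change. File access is injected so the function stays pure.
function dirtyContentFingerprint(
  gitStatus: string,
  readFile: (path: string) => string | undefined,
): string {
  const hash = createHash('sha256');
  for (const line of gitStatus.split('\n')) {
    if (line.length < 4) {
      continue; // "XY p" is the shortest valid porcelain v1 entry.
    }
    const path = line.slice(3); // Skip the two status columns and the space.
    hash.update(line, 'utf8');
    hash.update('\0');
    hash.update(readFile(path) ?? '<missing>', 'utf8');
    hash.update('\0');
  }
  return hash.digest('hex');
}

// Usage: the same status line yields a different fingerprint after an edit.
const files = new Map<string, string>([['src/foo.ts', 'export const a = 1;']]);
const read = (path: string): string | undefined => files.get(path);
const before = dirtyContentFingerprint(' M src/foo.ts', read);
files.set('src/foo.ts', 'export const a = 2;');
const after = dirtyContentFingerprint(' M src/foo.ts', read);
console.log(before !== after); // true
```

A host could expose such a value as an extra snapshot field compared alongside `gitStatus`; reading every dirty file on each step has a cost, so an mtime-based variant may be preferable in large working trees.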