fix: stop model removal from invalidating prompt caches (#304857) #304984

Open
Wolfe1 wants to merge 1 commit into microsoft:main from Wolfe1:fix/model-removal-no-invalidate
Conversation

@Wolfe1 Wolfe1 commented Mar 26, 2026

Summary

This PR addresses the high CPU usage reported in #304857 when multiple extensions register chatAgent / chatSkill prompt files.

Root Cause Analysis

Through investigation, we believe the high CPU stems from a feedback loop in the prompt caching system, specifically involving ModelChangeTracker in PromptsService:

  1. $acceptCustomAgents pushes agents to the extension host, firing onDidChangeCustomAgents on the API
  2. Extensions (e.g., Copilot Chat's ChatCustomAgentsService) respond by calling vscode.workspace.openTextDocument() for each agent URI to parse them
  3. BoundModelReferenceCollection in mainThreadDocuments.ts has a 50-document cap (_maxSize = 50). When 60+ agent/skill documents accumulate, the oldest references are evicted
  4. Model eviction fires IModelService.onModelRemoved, which ModelChangeTracker was treating as a content change signal — firing onDidPromptModelChange
  5. CachedPromise invalidates the agents/skills cache on this signal
  6. Cache invalidation fires onDidChangeCustomAgents again → sends $acceptCustomAgents to the extension host → GOTO 1
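
The loop can be reproduced in miniature. The sketch below uses hypothetical names (not the actual VS Code classes) and only models the eviction mechanics: a bounded reference collection evicts its oldest entry once the cap is exceeded, and treating each eviction as a content change is what restarts the cycle.

```typescript
type Listener = () => void;

// Stand-in for BoundModelReferenceCollection: keeps at most `maxSize`
// references and evicts the oldest when the cap is exceeded.
class BoundedRefs {
  private refs: string[] = [];
  constructor(private maxSize: number, private onEvict: Listener) {}
  add(uri: string): void {
    this.refs.push(uri);
    while (this.refs.length > this.maxSize) {
      this.refs.shift(); // evict oldest reference
      this.onEvict();    // fires the "model removed" signal
    }
  }
}

let invalidations = 0;
// 60 agent/skill documents, mirroring the 60+ described above.
const URIS = Array.from({ length: 60 }, (_, i) => `agent-${i}.md`);

// If eviction is wired to cache invalidation, a single pass over the
// URIs already invalidates once per document past the cap.
const refs = new BoundedRefs(50, () => { invalidations++; });
for (const uri of URIS) { refs.add(uri); }
console.log(invalidations); // 10 — and each invalidation re-triggers the pass
```

In the real system each of those invalidations would make extensions re-open every document, producing fresh evictions and closing the loop.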

The key insight is that model removal (step 4) is a memory management concern, not a content change. When a model is evicted, the underlying file hasn't changed — parseNew() already falls back to fileService.readFile() when no in-memory model exists, and actual file-system changes are separately covered by the file locator events via getFileLocatorEvent().
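
That fallback behavior is the reason removal is safe to ignore. A minimal sketch, with hypothetical interfaces standing in for the model and file services:

```typescript
// Hypothetical shapes; the real services are IModelService/IFileService.
interface ModelStore { get(uri: string): string | undefined; }
interface FileStore { readFile(uri: string): string; }

// Sketch of the fallback parseNew() relies on: prefer the in-memory
// model, but when it has been evicted, read the (unchanged) file.
function parsePrompt(uri: string, models: ModelStore, files: FileStore): string {
  const inMemory = models.get(uri);
  return inMemory ?? files.readFile(uri);
}

const models: ModelStore = { get: () => undefined }; // model was evicted
const files: FileStore = { readFile: () => '# agent definition' };
console.log(parsePrompt('agent-1.md', models, files)); // "# agent definition"
```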

Fix

ModelChangeTracker's onModelRemoved handler no longer fires onDidPromptModelChange. It only cleans up the per-model content-change listener and removes the entry from its tracking map. This breaks the feedback loop at step 4→5.
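
In sketch form (hypothetical class shape, not the actual `ModelChangeTracker` source), the handler after the fix does cleanup only:

```typescript
type Disposable = { dispose(): void };

class PromptModelTracker {
  private listeners = new Map<string, Disposable>();
  public changeEvents = 0; // stands in for onDidPromptModelChange firings

  onModelAdded(uri: string, contentListener: Disposable): void {
    this.listeners.set(uri, contentListener);
    this.changeEvents++; // addition still signals a change
  }

  onModelContentChanged(): void {
    this.changeEvents++; // edits still signal a change
  }

  onModelRemoved(uri: string): void {
    // Memory-management only: dispose the per-model listener and drop
    // the map entry. Crucially, do NOT fire a change event here.
    this.listeners.get(uri)?.dispose();
    this.listeners.delete(uri);
  }
}

const tracker = new PromptModelTracker();
tracker.onModelAdded('agent-1.md', { dispose() {} });
tracker.onModelContentChanged();
tracker.onModelRemoved('agent-1.md');
console.log(tracker.changeEvents); // 2 — removal contributed nothing
```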

What still works:

  • Model content changes (edits while a document is open) still trigger cache invalidation ✅
  • Model addition (opening a new prompt file) still triggers cache invalidation ✅
  • Model language changes still trigger cache invalidation ✅
  • File-system changes (create/delete/rename of prompt files) still trigger cache invalidation via getFileLocatorEvent() ✅

Testing

  • Unit test added: "model removal does not trigger onDidChangeCustomAgents" — creates a real model via ModelService, verifies content edits DO trigger onDidChangeCustomAgents, then verifies destroyModel does NOT
  • Full test suite: All 69 PromptsService tests pass, zero regressions
  • Manual testing: Verified with extensions registering 49+ agents and 17+ skills (the same configuration described in the issue) — CPU usage remains stable, no feedback loop observed

Caveat

While this fix resolves the issue in our testing and we believe the reasoning is sound, we acknowledge this may not be the direction the team would prefer. There may be edge cases we haven't considered where model removal should signal a cache refresh, or the team may prefer to address this at a different layer (e.g., deduplication in _pushCustomAgents, increasing the BoundModelReferenceCollection cap, or debouncing the cache invalidation). We're happy to adjust the approach based on feedback.
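
Of the alternatives mentioned, debouncing is easy to illustrate. A minimal sketch (the `debounce` helper is hypothetical, not an existing VS Code utility): a burst of change signals collapses into a single invalidation, which would dampen the loop rather than break it.

```typescript
// Collapse rapid repeated calls into one trailing invocation.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) { clearTimeout(timer); }
    timer = setTimeout(fn, delayMs);
  };
}

let invalidated = 0;
const invalidateCache = debounce(() => { invalidated++; }, 50);

// Three back-to-back change signals…
invalidateCache();
invalidateCache();
invalidateCache();

// …yield a single invalidation once the burst settles.
setTimeout(() => console.log(invalidated), 100); // 1
```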

Fixes #304857

When BoundModelReferenceCollection evicts models (50-reference cap),
ModelChangeTracker was treating this as a content change, firing
onDidPromptModelChange. This caused a feedback loop:

1. Extensions parse agents via openTextDocument (creating model refs)
2. 50+ refs trigger eviction of oldest models
3. Model removal fired onDidPromptModelChange
4. Cache invalidation triggered onDidChangeCustomAgents
5. Extensions re-parse all agents → GOTO 1

Fix: ModelChangeTracker.onModelRemoved now only cleans up the per-model
content listener without firing a change signal. Model removal is a
memory management concern, not a content change. parseNew() already
falls back to fileService.readFile() when the model is gone, and
file-system changes are covered by getFileLocatorEvent().

Fixes microsoft#304857
@vs-code-engineering vs-code-engineering bot added this to the 1.114.0 milestone Mar 26, 2026
Contributor

aeschli commented Mar 26, 2026

@Wolfe1 Thanks, I think it's easier if I make the fix (together with the LLM), I understand the problem now.

Development

Successfully merging this pull request may close these issues.

High CPU usage when multiple extensions register many chatAgent/chatSkill prompt files simultaneously