tags:: [[MCP]], [[AI/Coding]], [[Claude Code]], [[OpenAI]]

- # MCP Tool Search
  - Both [[Anthropic]] ([[Claude Code]]) and [[OpenAI]] have independently shipped a mechanism for loading [[MCP]] tool definitions **on-demand** rather than injecting all of them into the context window upfront, directly addressing the context bloat problem described in [[MCP/Concept/On-Demand Tool Loading]].
  - ## The Problem
    - Geoffrey Huntley's [too many model context protocol servers and LLM allocations on the dance floor](https://simonwillison.net/2025/Aug/22/too-many-mcps/) (via [[Person/Simon Willison]]) crystallised the issue:
      - Claude's 200k-token window shrinks to ~176k after system prompts in tools like Amp or Cursor
      - The GitHub MCP server alone consumes ~55,000 of those tokens
      - Multiple MCP servers stack this penalty; LLMs also perform worse when irrelevant information fills the prompt
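    - Putting those figures together makes the tax concrete. A back-of-the-envelope calculation, using only the numbers quoted above:
      ```python
      # Context budget after system prompts and a single MCP server,
      # using only the figures quoted above.
      window = 200_000        # Claude's nominal context window
      after_system = 176_000  # what remains after Amp/Cursor system prompts
      github_mcp = 55_000     # approximate cost of the GitHub MCP server's tool schemas

      remaining = after_system - github_mcp
      print(f"{remaining:,} tokens ({remaining / window:.0%} of the window) "
            "left before the conversation even starts")
      # -> roughly 121,000 tokens, about 60% of the nominal window
      ```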
  - ## Anthropic's Solution: MCP Tool Search (Claude Code)
    - [Scale with MCP Tool Search - Claude Docs](https://code.claude.com/docs/en/mcp#scale-with-mcp-tool-search)
    - Enabled by default in [[Claude Code]] when MCP tool schemas would exceed **10% of the context window**
    - When triggered:
      - MCP tools are **deferred**, not loaded into context upfront
      - Claude uses a built-in `MCPSearch` tool to discover relevant tools when it needs them
      - Only the tools actually needed are loaded into context for that turn (a toy sketch of this search-then-load pattern follows below)
    - Controlled via the `ENABLE_TOOL_SEARCH` env var; requires Sonnet 4+ or Opus 4+
    - See also: [[Claude Code/Q/In what sense are MCP tools loaded on-demand]]
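    - The `MCPSearch` internals aren't documented here, but the search-then-load pattern itself is simple. A toy sketch, with all names hypothetical and naive keyword matching standing in for whatever retrieval Claude Code actually uses:
      ```python
      # Toy sketch of deferred tool loading: schemas live in an index and only
      # search hits are paid for in context. Not Claude Code's actual API.
      from dataclasses import dataclass

      @dataclass
      class ToolDef:
          name: str
          description: str
          schema_tokens: int  # approximate context cost of the full JSON schema

      class ToolIndex:
          """Stand-in for MCPSearch: keyword search over deferred tool definitions."""

          def __init__(self, tools: list[ToolDef]):
              self.tools = tools

          def search(self, query: str, k: int = 2) -> list[ToolDef]:
              terms = query.lower().split()
              scored = [
                  (sum(t in f"{tool.name} {tool.description}".lower() for t in terms), tool)
                  for tool in self.tools
              ]
              scored.sort(key=lambda pair: pair[0], reverse=True)
              return [tool for score, tool in scored[:k] if score > 0]

      index = ToolIndex([
          ToolDef("github_create_issue", "Create an issue in a GitHub repository", 900),
          ToolDef("github_list_prs", "List open pull requests for a repository", 700),
          ToolDef("slack_post_message", "Post a message to a Slack channel", 400),
      ])

      # Only the matched schemas enter the context for this turn.
      hits = index.search("create a github issue")
      print([t.name for t in hits], "->", sum(t.schema_tokens for t in hits), "tokens loaded")
      ```
      Instead of paying ~55k tokens of schemas upfront, the model pays only for the few definitions that match the current task.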
  - ## OpenAI's Solution: Tool Search (GPT-5.4+)
    - [Tool Search - OpenAI API Docs](https://developers.openai.com/api/docs/guides/tools-tool-search)
    - Introduced with `gpt-5.4`; tools marked `"defer_loading": true` are not injected upfront
    - Two modes (a sketch follows this list):
      - **Hosted**: OpenAI's servers handle discovery automatically; the model receives a `tool_search_call` followed by a `tool_search_output`
      - **Client-executed**: the application controls the discovery logic and returns matching tools via `tool_search_output`
    - Loaded tools are injected at the **end** of the context window, preserving the prompt cache across requests
    - Works with functions, namespaces, or MCP servers
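    - A hedged sketch of hosted-mode usage with the Responses API; the `defer_loading` flag, the model name, and the `tool_search_*` item types come from the docs linked above, while the example tool, the flag's exact placement, and the prompt are illustrative assumptions:
      ```python
      # Sketch: mark a function tool as deferred so its schema is indexed
      # for tool search rather than injected into the prompt upfront.
      from openai import OpenAI

      client = OpenAI()

      tools = [
          {
              "type": "function",
              "name": "list_pull_requests",  # hypothetical example tool
              "description": "List open pull requests for a repository",
              "parameters": {
                  "type": "object",
                  "properties": {"repo": {"type": "string"}},
                  "required": ["repo"],
              },
              "defer_loading": True,  # indexed for search, no upfront context cost
          },
          # ...hundreds more deferred tools would add no upfront tokens
      ]

      response = client.responses.create(
          model="gpt-5.4",
          tools=tools,
          input="What PRs are open on acme/widgets?",
      )

      # In hosted mode the discovery round-trip appears in the output items:
      # a tool_search_call, its tool_search_output, then a normal function
      # call for whichever deferred tool was loaded.
      for item in response.output:
          print(item.type)
      ```
      In client-executed mode the application would instead receive the `tool_search_call`, run its own retrieval, and return the matching tool definitions via `tool_search_output`, which suits inventories that already live behind an application-side index.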
  - ## Significance
    - Both implementations converge on the same architecture: tools as a **searchable index** rather than a flat pre-loaded list
    - This makes large MCP inventories (dozens of servers, hundreds of tools) practical without the context tax
    - Likely to become a baseline expectation for AI coding tools and agent frameworks going forward