feat(client): SEP-2549 — honor cacheHints (ttlMs/scope) on the response-cache substrate #2340
base: v2-2026-07-28
**New changeset:** `'@modelcontextprotocol/client': minor`

`Client` now **honours** the server-stamped SEP-2549 `ttlMs`/`cacheScope` cache hints on the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, `readResource()`): a still-fresh held entry is served without a round trip. The new `CacheableRequestOptions.cacheMode` gives per-call control: `'use'` (the default), `'refresh'` (always fetch and re-store), or `'bypass'` (fetch without consulting or writing the cache). The behaviour is opt-in by hint: a server that sends `ttlMs: 0` (the conservative default this SDK's server stamps) sees byte-identical behaviour to before, because every call fetches.

Entries are automatically scoped by connected-server identity (derived from `serverInfo` after connect and encoded collision-free via `JSON.stringify`). `ClientOptions.cachePartition` is the opaque per-principal slot for `'private'`-scoped entries; set it to your principal identifier (e.g. the auth subject) when one `responseCacheStore` backs several principals. With the default `''`, every entry lives in the connected server's shared partition (the safe single-tenant posture).

`ClientOptions.defaultCacheTtlMs` (default `0`) supplies the TTL when a result lacks one (e.g. a legacy-era response); a server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`). The list verbs always store the aggregate, so `callTool`'s mirroring/output-validation index keeps working at any TTL; `readResource` stores only when the resolved TTL is positive. `notifications/resources/updated` evicts the cached `resources/read` body for that URI.

`ResponseCacheStore` gained `delete(key)`; `InMemoryResponseCacheStore` is now bounded (`{ maxEntries }`, default 512, oldest-first eviction). New exports: `CacheMode`, `CacheableRequestOptions`, `InMemoryResponseCacheStoreOptions`, `MAX_CACHE_TTL_MS`.
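A minimal sketch of the TTL and storage rules above. `resolveTtlMs` and `shouldStore` are hypothetical helpers, not SDK exports, and `MAX_CACHE_TTL_MS` is redefined locally with the 24 h value the changeset names. The changeset does not say whether the client default is also clamped or how negative values are treated; this sketch assumes the default is not clamped and floors negatives at 0.

```typescript
// Hypothetical sketch of the TTL rules described above; not the SDK's implementation.
const MAX_CACHE_TTL_MS = 24 * 60 * 60 * 1000; // local copy of the 24 h cap

type CacheableMethod =
  | 'tools/list'
  | 'prompts/list'
  | 'resources/list'
  | 'resources/templates/list'
  | 'resources/read';

// A server-supplied ttlMs wins and is clamped to [0, 24 h]; otherwise fall back to the client default.
function resolveTtlMs(serverTtlMs: number | undefined, defaultCacheTtlMs = 0): number {
  if (serverTtlMs === undefined) return Math.max(0, defaultCacheTtlMs);
  return Math.min(Math.max(0, serverTtlMs), MAX_CACHE_TTL_MS);
}

// List aggregates are always stored (callTool's index reads them); resources/read only when the TTL is positive.
function shouldStore(method: CacheableMethod, ttlMs: number): boolean {
  return method !== 'resources/read' || ttlMs > 0;
}
```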
**New changeset:** `'@modelcontextprotocol/core': patch`

`Protocol.request()` now rejects with `SdkError(RequestTimeout, reason)` when called with an already-aborted signal, matching the error produced when an in-flight request is aborted. Previously the raw `signal.reason` was thrown.
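To illustrate the contract (not the SDK's actual `Protocol` code), here is a sketch of a request wrapper whose pre-aborted and in-flight abort paths reject with the same typed error. `SdkError` below is a local stand-in with assumed `code`/`reason` fields, and `request` is a hypothetical helper.

```typescript
// Hypothetical stand-in for the SDK's SdkError; only the shape matters for this sketch.
class SdkError extends Error {
  code: 'RequestTimeout';
  reason: unknown;
  constructor(code: 'RequestTimeout', reason: unknown) {
    super(`request aborted: ${String(reason)}`);
    this.code = code;
    this.reason = reason;
  }
}

// Both abort paths reject with the same typed error.
function request<T>(send: () => Promise<T>, signal?: AbortSignal): Promise<T> {
  if (signal?.aborted) {
    // Already aborted before the call: wrap the reason instead of surfacing it raw.
    return Promise.reject(new SdkError('RequestTimeout', signal.reason));
  }
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(new SdkError('RequestTimeout', signal?.reason));
    signal?.addEventListener('abort', onAbort, { once: true });
    send()
      .then(resolve, reject)
      .finally(() => signal?.removeEventListener('abort', onAbort));
  });
}
```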
**`docs/migration.md`** (`@@ -575,6 +575,25 @@`)

The auto-aggregate walk is capped at `ClientOptions.listMaxPages` pages (default 64; `0` disables the cap) and throws an `SdkError` with `SdkErrorCode.ListPaginationExceeded` if the server's pagination does not converge, so a partial aggregate is never returned. The cap applies only to the no-`cursor` aggregate path; explicit per-page calls are never capped. The aggregated result is also written to the client's response cache (the source for `callTool`'s output-schema validation and SEP-2243 header mirroring).
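The capped walk can be sketched as a loop that refuses to return a partial result. `aggregateAll`, `Page`, and `PaginationExceededError` are hypothetical names standing in for the SDK's internals and `SdkErrorCode.ListPaginationExceeded`; the sketch assumes the cap counts fetched pages and that `0` means unlimited.

```typescript
// Hypothetical stand-in for an SdkError carrying SdkErrorCode.ListPaginationExceeded.
class PaginationExceededError extends Error {}

interface Page<T> {
  items: T[];
  nextCursor?: string;
}

// Walk pages from the start; throw rather than return a partial aggregate if the cap is hit.
async function aggregateAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 64, // 0 disables the cap
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let fetched = 0; ; fetched++) {
    if (maxPages !== 0 && fetched >= maxPages) {
      throw new PaginationExceededError(`pagination did not converge within ${maxPages} pages`);
    }
    const page = await fetchPage(cursor);
    all.push(...page.items);
    if (page.nextCursor === undefined) return all;
    cursor = page.nextCursor;
  }
}
```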
### Client honours server cache hints (SEP-2549)

On a 2026-07-28 connection, the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, and `readResource()`) now serve a still-fresh held entry without a round trip until the server-stamped `ttlMs` elapses. The behaviour is opt-in **by server hint**: a server that sends `ttlMs: 0` (the conservative default the SDK's `McpServer` stamps unless configured otherwise) sees byte-identical behaviour to before, because every call fetches. A `list_changed` notification still evicts immediately regardless of TTL.
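A minimal sketch of the freshness test, assuming an entry records an absolute `expiresAt` (the field name the custom-store interface uses) computed as store time plus the resolved TTL. `storeWithTtl` and `freshValue` are illustrative helpers. A `ttlMs` of `0` yields an entry that is never served, which is why such servers see every call fetch.

```typescript
// Hypothetical held-entry shape; not the SDK's CacheEntry.
interface HeldEntry<T> {
  value: T;
  expiresAt?: number; // epoch ms; absent means never fresh in this sketch
}

function storeWithTtl<T>(value: T, ttlMs: number, now: number = Date.now()): HeldEntry<T> {
  return { value, expiresAt: now + ttlMs };
}

// Return the held value only while now < expiresAt; undefined tells the caller to fetch.
function freshValue<T>(entry: HeldEntry<T> | undefined, now: number = Date.now()): T | undefined {
  const expiresAt = entry?.expiresAt;
  if (entry === undefined || expiresAt === undefined) return undefined;
  return now < expiresAt ? entry.value : undefined;
}
```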
Per-call control via the new `CacheableRequestOptions.cacheMode` (`'use'` is the default):

```typescript
await client.listTools(); // serve from cache if fresh
await client.listTools(undefined, { cacheMode: 'refresh' }); // always fetch, then re-store
await client.listTools(undefined, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
```
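The three modes differ only in whether the cache is read and whether the fetched value is written back. The sketch assumes synchronous `readFresh`/`store` callbacks for brevity (the real `ResponseCacheStore` is async and keyed); `withCacheMode` is a hypothetical helper, not part of the client API.

```typescript
type CacheMode = 'use' | 'refresh' | 'bypass';

// Hypothetical dispatch over the three modes.
async function withCacheMode<T>(
  mode: CacheMode,
  readFresh: () => T | undefined, // held entry still within its TTL, if any
  store: (value: T) => void,
  fetchRemote: () => Promise<T>,
): Promise<T> {
  if (mode === 'use') {
    const hit = readFresh();
    if (hit !== undefined) return hit; // served without a round trip
  }
  const value = await fetchRemote(); // 'refresh' and 'bypass' always reach the server
  if (mode !== 'bypass') store(value); // 'bypass' neither reads nor writes the cache
  return value;
}
```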
New `ClientOptions`:

- `cachePartition?: string`: the opaque per-principal identifier for `'private'`-scoped entries (the spec's "MUST NOT share across authorization contexts"). Entries are automatically scoped by connected-server identity (derived from `serverInfo`), so one `responseCacheStore` may back several clients without consumer-side encoding; set `cachePartition` to your principal identifier (e.g. the auth subject) when sharing a store across principals. With the default `''`, every entry, public or private, lives in the connected server's shared partition (the safe single-tenant posture). Note that `serverInfo` is self-reported, so a server that deliberately impersonates another server's `name`/`version` shares its `'public'` slot; per-principal isolation holds regardless.
- `defaultCacheTtlMs?: number`: applied when a cacheable result lacks `ttlMs` (e.g. a legacy-era response). The default `0` means never serve from cache; the list aggregate is still **stored** so `callTool`'s mirroring/output-validation index keeps working regardless. A server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`).
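One way to picture the partitioning rules, assuming a string partition built from the server's self-reported `name`/`version` plus the principal for `'private'` entries. `partitionFor` and `ServerIdentity` are hypothetical; the SDK's real `CacheKey` layout is not documented here. The sketch shows why `JSON.stringify` of an array avoids collisions that naive separator joins allow, and why the default `''` puts public and private entries in the same slot.

```typescript
// Hypothetical partition layout; not the SDK's CacheKey.
interface ServerIdentity {
  name: string;
  version: string;
}
type CacheScope = 'public' | 'private';

function partitionFor(server: ServerIdentity, scope: CacheScope, cachePartition = ''): string {
  // 'public' entries share the server's namespace; 'private' entries add the principal.
  const principal = scope === 'private' ? cachePartition : '';
  // JSON.stringify of an array is unambiguous, unlike joining fields with a separator.
  return JSON.stringify([server.name, server.version, principal]);
}
```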
The `ResponseCacheStore` interface gained `delete(key)` (the per-URI invalidation that `notifications/resources/updated` drives), so custom stores written against the alpha substrate need to add it. The default `InMemoryResponseCacheStore` is now bounded (default 512 entries, oldest-first eviction; configurable via `{ maxEntries }`).
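A bounded, oldest-first store can lean on `Map`'s insertion order. This is a hypothetical sketch, not the source of `InMemoryResponseCacheStore`: it uses string keys, and it assumes that rewriting an existing key moves it to the newest position, which the migration note does not specify.

```typescript
// Hypothetical bounded store; Map iterates keys in insertion order, oldest first.
class BoundedCacheStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(options: { maxEntries?: number } = {}) {
    this.maxEntries = options.maxEntries ?? 512;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // assumption: a rewrite counts as the newest insertion
    this.entries.set(key, value);
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      this.entries.delete(oldest.value); // evict oldest-first
    }
  }

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  delete(key: string): void {
    this.entries.delete(key); // no-op if absent
  }

  get size(): number {
    return this.entries.size;
  }
}
```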
**Review comment** (check warning on line 595 in `docs/migration.md`; comment on lines +578 to +595):

🟡 The new SEP-2549 cache-honouring behaviour (cache-served `listTools()`/`readResource()`, the per-call `cacheMode` option, and the new `ClientOptions` `cachePartition`/`defaultCacheTtlMs`) is documented here in `migration.md` and in `examples/caching/README.md`, but the canonical client feature guide `docs/client.md` was not updated: its Tools and Resources sections still describe these verbs as always reaching the server, with no mention of cache-serving or how to force a fetch. Consider adding a short 'Response caching (SEP-2549)' subsection to `docs/client.md` (or a sentence in its Tools/Resources sections) covering the cache-serving behaviour, `cacheMode`, and `cachePartition`/`defaultCacheTtlMs`.

Extended reasoning:

- **The gap.** This PR introduces user-visible client behaviour: …
- **What docs/client.md says today.** A grep of …
- **Why this matters.** The migration guide targets upgraders and the example README targets the example; a user consulting the feature reference for …
- **Why this is an inconsistency, not a different convention.** The repo's own precedent is that comparable client-side behaviour changes in this stack got prose in …
- **Concrete walkthrough of the reader-facing gap.** (1) A developer's host calls …
- **Suggested fix.** Add a short 'Response caching (SEP-2549)' subsection to …
**Output-schema validator lifecycle (every era):** validator compilation is now lazy and non-throwing. Validators are compiled on the first `callTool()` against the cached `tools/list` entry, not eagerly inside `listTools()`; an uncompilable `outputSchema` is logged with `console.warn` and validation is skipped for that tool only. In v1, `listTools()` threw on an uncompilable `outputSchema`; now it succeeds, and a pluggable `jsonSchemaValidator` provider observes compilation at `callTool()` time rather than at `listTools()` time. The legacy-era `listTools()` path is unchanged at the wire level but observably different at the validator-lifecycle level.
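The lazy, non-throwing lifecycle can be sketched as a per-tool memo that compiles on first use and remembers failures. `LazyValidators`, `Validator`, and `Compile` are hypothetical names; a real `jsonSchemaValidator` provider has its own interface.

```typescript
type Validator = (value: unknown) => boolean;
type Compile = (schema: unknown) => Validator; // may throw on an uncompilable schema

// Hypothetical memo: compile on the first callTool for a tool, never during listTools.
class LazyValidators {
  private readonly compiled = new Map<string, Validator | null>(); // null = uncompilable, skip validation
  private readonly compile: Compile;

  constructor(compile: Compile) {
    this.compile = compile;
  }

  validatorFor(toolName: string, outputSchema: unknown): Validator | null {
    if (!this.compiled.has(toolName)) {
      let validator: Validator | null;
      try {
        validator = this.compile(outputSchema);
      } catch (err) {
        console.warn(`outputSchema for ${toolName} did not compile; skipping validation`, err);
        validator = null;
      }
      this.compiled.set(toolName, validator); // failures are remembered too
    }
    return this.compiled.get(toolName) ?? null;
  }
}
```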
**`examples/caching/README.md`** (`@@ -1,10 +1,52 @@`)

# caching
`CacheableResult` freshness hints (protocol revision 2026-07-28). The server declares hints at two layers, a per-registration `cacheHint` on the resource and server-level `ServerOptions.cacheHints`, and the SDK resolves them most-specific-author-first (handler-return fields would take precedence over both) and stamps `ttlMs`/`cacheScope` on the wire toward modern clients only. The client honours the stamped values: a still-fresh held entry is served without a round trip.

This replaces the previous closing sentence, "The client reads the stamped values back.", and removes the note "Full client-side cache **honouring** (re-using a still-fresh result instead of re-requesting) is a follow-up; this example reads what the server emits today."
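A sketch of most-specific-author-first resolution, assuming each layer is merged per field and that `ServerOptions.cacheHints` can be treated as a single hint (it may in practice be keyed by method). `resolveHint` and `CacheHint` are illustrative names. Using `??` rather than `||` keeps an explicit `ttlMs: 0` from falling through to a less specific layer.

```ts
type CacheScope = 'public' | 'private';

interface CacheHint {
  ttlMs?: number;
  cacheScope?: CacheScope;
}

// Hypothetical per-field resolution: handler return > per-registration cacheHint > ServerOptions.cacheHints.
function resolveHint(handler?: CacheHint, registration?: CacheHint, server?: CacheHint): CacheHint {
  return {
    ttlMs: handler?.ttlMs ?? registration?.ttlMs ?? server?.ttlMs,
    cacheScope: handler?.cacheScope ?? registration?.cacheScope ?? server?.cacheScope,
  };
}
```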
```bash
pnpm tsx examples/caching/client.ts
```
The client calls `listTools()` and `readResource()` twice each; the second call of each pair is served from the response cache. The server exposes a `request-count` tool (how many `tools/list` requests reached it) and a `read-count` tool (how many times the resource handler ran), so the example asserts that each counter is unchanged after the cache-served call and increments after a `cacheMode: 'refresh'` call.
## `cacheMode`

Per-call control on the cacheable verbs (`listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` / `readResource()`):

```ts
await client.readResource({ uri: 'config://app' }); // 'use' (default): serve from cache if fresh
await client.readResource({ uri: 'config://app' }, { cacheMode: 'refresh' }); // always fetch, then re-store
await client.readResource({ uri: 'config://app' }, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
```
A `list_changed` notification still evicts immediately regardless of TTL.
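Invalidation can be sketched as a notification dispatcher: each `list_changed` evicts the matching list method, and `notifications/resources/updated` deletes the single `resources/read` entry for its URI. The notification method names are MCP's; `KeyedCache`, its string keys, and `onNotification` are hypothetical. Whether `resources/list_changed` also evicts `resources/templates/list` is not stated here, so the sketch leaves that out.

```ts
// Hypothetical cache keyed by "method param" strings; the SDK's CacheKey differs.
class KeyedCache {
  readonly entries = new Map<string, unknown>();
  key(method: string, param = ''): string {
    return `${method} ${param}`;
  }
  set(method: string, param: string, value: unknown): void {
    this.entries.set(this.key(method, param), value);
  }
  delete(method: string, param: string): void {
    this.entries.delete(this.key(method, param));
  }
  // Drop every entry for one method, whatever its param.
  evict(method: string): void {
    for (const k of [...this.entries.keys()]) {
      if (k.startsWith(`${method} `)) this.entries.delete(k);
    }
  }
}

const LIST_CHANGED_TO_METHOD: Record<string, string> = {
  'notifications/tools/list_changed': 'tools/list',
  'notifications/prompts/list_changed': 'prompts/list',
  'notifications/resources/list_changed': 'resources/list',
};

function onNotification(cache: KeyedCache, method: string, params?: { uri?: string }): void {
  const listMethod = LIST_CHANGED_TO_METHOD[method];
  if (listMethod !== undefined) {
    cache.evict(listMethod); // list_changed: evict immediately, regardless of TTL
  } else if (method === 'notifications/resources/updated' && params?.uri) {
    cache.delete('resources/read', params.uri); // per-URI invalidation
  }
}
```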
## Custom store

The default per-client `InMemoryResponseCacheStore` (bounded at 512 entries by default) is enough for most hosts. To back the cache with something persistent (Redis, KV, IndexedDB), implement the five-method `ResponseCacheStore` interface. The store is a dumb keyed-value carrier; freshness and partitioning are the client's job:
```ts
import { Client } from '@modelcontextprotocol/client';
import type { CacheEntry, CacheKey, CacheScope, ResponseCacheStore } from '@modelcontextprotocol/client';

class MyStore implements ResponseCacheStore {
  async get(key: CacheKey): Promise<CacheEntry | undefined> {
    // Read {value, stamp, expiresAt, scope} for `key` from your backend.
    throw new Error('not implemented');
  }
  async set(key: CacheKey, entry: { value: unknown; expiresAt?: number; scope?: CacheScope }): Promise<number> {
    // Write `entry` under `key`; return a monotonically-increasing stamp.
    throw new Error('not implemented');
  }
  async delete(key: CacheKey): Promise<void> {
    // Drop the single entry under `key` (no-op if absent).
    throw new Error('not implemented');
  }
  async evict(method: string): Promise<void> {
    // Drop every entry whose key.method === method (across every partition).
    throw new Error('not implemented');
  }
  async clear(): Promise<void> {
    // Drop everything.
    throw new Error('not implemented');
  }
}

// Placeholder: your principal identifier, e.g. the auth subject of the current session.
const principalId = 'user-123';

const client = new Client({ name: 'host', version: '1.0.0' }, { responseCacheStore: new MyStore(), cachePartition: principalId });
```
The SDK scopes every entry by the connected server's identity automatically, so you do not encode server identity into `cachePartition` or the store key yourself. When one store backs several principals against the same server, set `ClientOptions.cachePartition` to a stable identity of the authorization context (e.g. the auth subject) so `'private'`-scoped entries are isolated per principal; `'public'`-scoped entries are shared within the connected server's namespace automatically. Note that `serverInfo` is self-reported, so a server that deliberately impersonates another server's `name`/`version` shares its `'public'` slot; per-principal isolation holds regardless.