feat(client): SEP-2549 — honor cacheHints (ttlMs/scope) on the response-cache substrate#2340
felixweinberger wants to merge 2 commits into
🦋 Changeset detected. Latest commit: bf3bc85. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
…st*/readResource

The four list verbs and readResource now serve a still-fresh ResponseCacheStore entry without a round trip when the server-stamped ttlMs has not elapsed.

Additive on the substrate (#2336): _listAllPages now stamps {expiresAt, scope} on the aggregate write; a _serveFromCache front gates each verb on freshness; readResource is newly cached (URI-keyed; only stored when ttl > 0, since the URI keyspace is unbounded and there is no derived index).

Per-call CacheableRequestOptions.cacheMode ('use' | 'refresh' | 'bypass') maps to mcp.d's CacheMode.

ClientOptions.cachePartition is the per-principal slot for 'private'-scoped entries (the spec's MUST-NOT-share-across-authz-contexts); 'public' entries always live at partition '' so a shared store serves them to every co-tenant. ClientResponseCache reads probe own-partition then '' (mcp.d's two-probe order: own-first because scope is only known after a fetch); the toolDefinition/outputValidator derived indices use the same probe so SEP-2243 mirroring works under partitioning. readResource applies the same partition derivation as the list verbs and treats absent cacheScope as 'private', so a shared store cannot serve one principal's resource body to another.

ClientOptions.defaultCacheTtlMs (default 0) supplies the TTL when the result lacks one (e.g. a legacy-era response); an explicit server-sent ttlMs:0 is honoured as immediately stale. List aggregates are always stored regardless of TTL (mcp.d's retainForSchema posture) so callTool's mirroring/output-validation index keeps working at any TTL while the freshness gate never serves a stale entry. A list_changed eviction beats TTL (the existing partition-agnostic evict). Clock seam (now) injectable on ClientResponseCache for tests.

New exports: CacheMode, CacheableRequestOptions.
…dds cacheMode + custom-store sections

The client now calls listTools() and readResource() twice each and asserts the second of each pair is cache-served; the server's resource handler counts how many times it ran and exposes that via a read-count tool, so the example verifies (server-side) that the cache hit never reached the wire. Demonstrates cacheMode:'refresh' and the post-refresh return to cache-serving.

README drops the follow-up note (honouring is shipped), adds a §cacheMode section, and adds a §Custom store section showing the four-method ResponseCacheStore interface shape with the cachePartition guidance for shared stores.
### Client honours server cache hints (SEP-2549)

On a 2026-07-28 connection the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, and `readResource()`) now serve a still-fresh held entry without a round trip when the server-stamped `ttlMs` has not elapsed. The behaviour is opt-in **by server hint**: a server that sends `ttlMs: 0` (the conservative default the SDK's `McpServer` stamps unless configured otherwise) sees byte-identical behaviour: every call fetches. A `list_changed` notification still evicts immediately regardless of TTL.

Per-call control via the new `CacheableRequestOptions.cacheMode` (`'use'` is the default):

```typescript
await client.listTools(); // serve from cache if fresh
await client.listTools(undefined, { cacheMode: 'refresh' }); // always fetch, then re-store
await client.listTools(undefined, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
```

New `ClientOptions`:

- `cachePartition?: string`: the opaque per-principal identifier for `'private'`-scoped entries (the spec's "MUST NOT share across authorization contexts"). Entries are automatically scoped by connected-server identity (derived from `serverInfo`), so one `responseCacheStore` may back several clients without consumer-side encoding; set `cachePartition` to your principal identifier (e.g. the auth subject) when sharing a store across principals. With the default `''` every entry (public or private) lives at the connected server's shared partition (the safe single-tenant posture). Note `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
- `defaultCacheTtlMs?: number`: applied when a cacheable result lacks `ttlMs` (e.g. a legacy-era response). Default `0`: never serve from cache; the list aggregate is still **stored** so `callTool`'s mirroring/output-validation index keeps working regardless. The server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`).

The `ResponseCacheStore` interface gained `delete(key)` (the per-URI invalidation `notifications/resources/updated` drives); custom stores written against the alpha substrate need to add it. The default `InMemoryResponseCacheStore` is now bounded (default 512 entries, oldest-first eviction; configurable via `{ maxEntries }`).
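
The TTL and `cacheMode` rules described above can be sketched in isolation. This is a minimal illustration, not the SDK's implementation: `resolveTtlMs`, `canServe`, and the `Entry` shape are hypothetical names invented here, and only the rules themselves (server hint wins even when `0`, `defaultCacheTtlMs` as fallback, 24 h clamp, `'refresh'`/`'bypass'` always fetch) come from the text.

```typescript
// Hypothetical helper names; only the rules come from the migration note.
type CacheMode = 'use' | 'refresh' | 'bypass';

const MAX_CACHE_TTL_MS = 24 * 60 * 60 * 1000; // the 24 h clamp

interface Entry<T> {
    value: T;
    expiresAt: number; // absolute timestamp in ms
}

/** Server hint wins (an explicit 0 included); otherwise the client default applies. */
function resolveTtlMs(serverTtlMs: number | undefined, defaultCacheTtlMs = 0): number {
    const ttl = serverTtlMs ?? defaultCacheTtlMs;
    return Math.min(Math.max(ttl, 0), MAX_CACHE_TTL_MS);
}

/** True when a held entry may be returned without a round trip. */
function canServe<T>(entry: Entry<T> | undefined, mode: CacheMode, now: number): boolean {
    if (mode !== 'use') return false; // 'refresh' and 'bypass' always fetch
    return entry !== undefined && now < entry.expiresAt;
}
```

Passing `now` as a parameter mirrors the injectable clock idea: a test can advance time without sleeping. An entry written with a resolved TTL of `0` has `expiresAt === now` and is therefore never served, which matches the "stored but already stale" posture.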
🟡 The new SEP-2549 cache-honouring behaviour (cache-served listTools()/readResource(), the per-call cacheMode option, and the new ClientOptions cachePartition/defaultCacheTtlMs) is documented here in migration.md and in examples/caching/README.md, but the canonical client feature guide docs/client.md was not updated — its Tools and Resources sections still describe these verbs as always reaching the server, with no mention of cache-serving or how to force a fetch. Consider adding a short 'Response caching (SEP-2549)' subsection (or a sentence in the Tools/Resources sections) of docs/client.md covering the cache-serving behaviour, cacheMode, and cachePartition/defaultCacheTtlMs.
Extended reasoning...
The gap. This PR introduces user-visible client behaviour: listTools(), listPrompts(), listResources(), listResourceTemplates(), and readResource() may now be served from the response cache without a round trip when the server-stamped ttlMs has not elapsed, plus the new per-call cacheMode option ('use' | 'refresh' | 'bypass'), the new ClientOptions.cachePartition / defaultCacheTtlMs semantics, and new public exports (CacheMode, CacheableRequestOptions, InMemoryResponseCacheStoreOptions, MAX_CACHE_TTL_MS). Prose was added to docs/migration.md (this hunk), docs/migration-SKILL.md, and examples/caching/README.md — but docs/client.md, the canonical client feature reference, is not touched by the PR.
What docs/client.md says today. A grep of docs/client.md for cacheMode / cacheHint / ttlMs / cachePartition / defaultCacheTtlMs / responseCacheStore returns nothing related to this feature; the only cache-related prose is the SEP-2243 'internal tools/list cache' paragraph and the listChanged local-cache option. Its Tools section (~line 255) describes listTools() as 'walks every page on your behalf' and the Resources section (~lines 318–332) describes listResources()/readResource() purely as discovering and reading server-provided data — the readResource example there even uses the same config://app URI the caching example now cache-serves. Nothing tells a reader of the feature guide that, after this PR, a second call within the server's ttlMs may never reach the server, or that cacheMode: 'refresh' / 'bypass' exists to force a fetch.
Why this matters. The migration guide targets upgraders and the example README targets the example; a user consulting the feature reference for listTools()/readResource() (e.g. while debugging why a request never hit their server) will not learn that calls can be cache-served, nor how to opt out per call. The repo's review checklist asks for prose documentation of new features and for updating docs that describe the pre-change behaviour.
Why this is an inconsistency, not a different convention. The repo's own precedent is that comparable client-side behaviour changes in this stack got prose in docs/client.md: the predecessor PR's auto-aggregation behaviour is the source of the 'walks every page on your behalf' wording there, and the SEP-2243 mirroring feature has its own subsection. SEP-2549 cache honouring is the same kind of user-visible Client behaviour change and is the only one of the set absent from client.md.
Concrete walkthrough of the reader-facing gap. (1) A developer's host calls client.readResource({ uri: 'config://app' }) against a server stamping ttlMs: 60_000; they then change the resource server-side and call readResource again within 60 s. (2) The second call returns the old body with no wire request — by design. (3) They open docs/client.md → Resources to understand why; the section describes readResource() as reading server data with no mention of the response cache, ttlMs, or cacheMode, so the behaviour looks like a bug rather than a documented feature with a documented escape hatch ({ cacheMode: 'refresh' }).
Suggested fix. Add a short 'Response caching (SEP-2549)' subsection to docs/client.md (or a sentence each in the Tools and Resources sections) stating that the cacheable verbs serve a still-fresh entry without a round trip when the server stamps a positive ttlMs, that cacheMode: 'refresh' | 'bypass' forces a fetch, and pointing at ClientOptions.cachePartition / defaultCacheTtlMs / responseCacheStore for shared-store setups — largely a condensed copy of the migration.md section added in this PR. Filed as a nit: the feature is documented (migration guide, example README, JSDoc), just not in the feature reference users of these verbs actually consult.
// The aggregate is ALWAYS written: even when the resolved TTL is ≤0
// the entry is stored already-stale (mcp.d's `retainForSchema`
// posture) so the `tools/list`-derived index keeps working regardless,
// while the freshness gate in `_serveFromCache` never serves it.
// Page-1 carries the result-level `ttlMs`/`cacheScope` (`acc` IS the
// mutated page-1 object).
await this._cache.write(method, acc, generation, this._freshness(acc));
return acc;
🟡 When _listAllPages aggregates a multi-page list, the terminal cache write computes freshness from this._freshness(acc) where acc is the page-1 result object, so ttlMs/cacheScope hints carried by pages 2..N are silently discarded. A later page's stricter hint is therefore ignored: a page-2 ttlMs: 0 ("do not cache") aggregate is served from cache for page-1's full TTL, and a page-2 cacheScope: 'private' is downgraded to page-1's 'public', storing the private-scoped page contents at the shared [serverIdentity, ''] partition where another principal's shared-partition probe will serve them on a shared store. Resolving most-restrictively while walking (min ttlMs across pages, 'private' if any page is private) is a small change in the loop and matches the conservative posture this PR takes everywhere else.
Extended reasoning...
The mechanism. _listAllPages (packages/client/src/client/client.ts:1568-1575) aggregates every page into acc, which IS the page-1 result object: append(acc, page) only pushes the later pages' items, and the per-page page objects are then discarded. The terminal cache write is await this._cache.write(method, acc, generation, this._freshness(acc)), and _freshness (lines 1592-1600) reads acc.ttlMs / acc.cacheScope, i.e. page 1's hints only. The ttlMs/cacheScope fields carried by pages 2..N are never consulted anywhere. The inline comment "Page-1 carries the result-level ttlMs/cacheScope" asserts hint uniformity rather than choosing a resolution for the heterogeneous case.

Why heterogeneous per-page hints are spec-legal and expressible. SEP-2549's CacheableResult fields are per-result, and each page of a paginated walk is an independent result that the 2026-07-28 codec requires to carry them. This SDK's own server resolves hints most-specific-author-first (attachCacheHintFallback only fills fields the handler did not set), so a low-level paginated list handler can legitimately return different ttlMs/cacheScope per page (e.g. a volatile or per-principal tail page), and third-party servers can too. Every multi-page test in responseCache.test.ts / mcpParamMirroring.test.ts stamps identical hints on all pages, which is why nothing pins this.

Consequence 1: TTL over-caching. Page 1 stamps ttlMs: 60_000; page 2 stamps ttlMs: 0 (the spec's "immediately stale" / do-not-cache). The aggregate, including the page-2 items the server asked not to cache, is stored with expiresAt = now + 60s and served from cache for the full minute. The server's only remaining lever is a list_changed notification, which it has no reason to send (the list did not change; it just declared part of it uncacheable).

Consequence 2: scope downgrade onto the shared/public partition. Page 1 stamps cacheScope: 'public'; a later page stamps 'private' (per-principal items mixed into the tail). _freshness(acc) resolves scope: 'public', so write() stores the whole aggregate, including the private-scoped page's contents, at the shared partition [serverIdentity, ''] with scope: 'public'. On a responseCacheStore shared across principals (the arrangement this PR's docs/README explicitly endorse, with cachePartition set per principal), another principal's client then gets a shared-partition hit (_probe's fallback is gated on the stored scope === 'public', which this entry now claims) and is served the private-scoped page contents without a round trip. That is exactly the cross-authorization-context sharing the spec's private scope forbids and that the rest of this PR's partition design (the two-probe scope gate, the misconfigured-co-tenant guard test, the JSON-encoded partition) is built to prevent.

Step-by-step proof. (1) Configure a scripted modern server whose tools/list page 1 returns { ttlMs: 60_000, cacheScope: 'public', tools: [...], nextCursor: '1' } and page 2 returns { ttlMs: 0, cacheScope: 'private', tools: [privateTool] }. (2) Client (cachePartition: 'alice') calls listTools() → _listAllPages walks both pages, acc is the page-1 object with page-2's tools appended. (3) _freshness(acc) reads acc.ttlMs = 60_000, acc.cacheScope = 'public' → write() stores the aggregate at [serverIdentity, ''] with scope: 'public', expiresAt = now + 60s. (4) A second client (cachePartition: 'bob') on the same store calls listTools() → _probe's own-partition miss falls through to the shared partition, finds the entry with scope === 'public', and serves it, including privateTool from the page the server marked private and ttlMs: 0, with no wire request, for up to 60 s.

Why nothing else prevents it. The freshness seam is the single _freshness(acc) call; no other code reads later pages' hint fields. The list_changed / HEADER_MISMATCH evictions are orthogonal. The single-page case and the SDK server with uniform per-method ServerOptions.cacheHints are unaffected, which is why no test catches it.

Fix. Resolve the aggregate's freshness most-restrictively while walking: track ttlMs = min(...) across pages and scope = 'private' if any page is private (one or two extra lines in the page loop), and pass that to _freshness/write instead of reading acc (page 1) alone. Alternatively, document that only page-1 hints are honoured, but the most-restrictive resolution is cheap and matches the conservative posture the PR takes everywhere else (24h clamp, private-by-default, scope-gated shared probe).

Severity. Filed as a non-blocking nit: the trigger requires a server emitting heterogeneous per-page hints (unusual but spec-legal), and consequence 2 additionally requires a store shared across principals, but that sharing is an explicitly documented configuration of this same PR, so the scope-downgrade half is a genuine gap in the isolation story the PR documents.
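
The most-restrictive resolution suggested in the fix can be sketched as a standalone function. This is an illustration under assumptions, not a patch against `_listAllPages`: `PageHints` and `resolveAggregateHints` are hypothetical names, a page without `ttlMs` is assumed to fall back to a caller-supplied default, and a page without `cacheScope` is assumed to count as `'private'`.

```typescript
// Hypothetical names; a sketch of "min ttlMs across pages, private if any page is private".
type CacheScope = 'public' | 'private';

interface PageHints {
    ttlMs?: number;
    cacheScope?: CacheScope;
}

interface ResolvedHints {
    ttlMs: number;
    cacheScope: CacheScope;
}

function resolveAggregateHints(pages: readonly PageHints[], defaultTtlMs = 0): ResolvedHints {
    // With no pages there is nothing to trust, so take the most conservative answer.
    if (pages.length === 0) return { ttlMs: 0, cacheScope: 'private' };

    let ttlMs = Number.POSITIVE_INFINITY;
    let cacheScope: CacheScope = 'public';
    for (const page of pages) {
        // The shortest TTL on any page bounds the whole aggregate.
        ttlMs = Math.min(ttlMs, page.ttlMs ?? defaultTtlMs);
        // The aggregate stays 'public' only if every page explicitly says so.
        if (page.cacheScope !== 'public') cacheScope = 'private';
    }
    return { ttlMs, cacheScope };
}
```

Applied to the step-by-step proof, the two pages `{ ttlMs: 60_000, cacheScope: 'public' }` and `{ ttlMs: 0, cacheScope: 'private' }` resolve to `{ ttlMs: 0, cacheScope: 'private' }`, so the aggregate would be stored already stale and in the caller's own partition.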
Client-side honoring of the SEP-2549 `CacheableResult` freshness hints (`ttlMs`, `cacheScope`) on the response-cache substrate from #2336.

## Motivation and Context

The 2026-07-28 spec requires `tools/list`, `prompts/list`, `resources/list`, `resources/templates/list`, `resources/read`, and `server/discover` results to carry `ttlMs` and `cacheScope`. The server SDK already stamps these (the `examples/caching/` story); this PR makes the client honor them: a still-fresh entry is served from the response cache without a round-trip; `list_changed` and `resources/updated` evict regardless of TTL.

## How Has This Been Tested?

Client suite (581, +34 over base), full e2e (2594p/157xf), `run:examples` 63/63 (the caching story now asserts the second `listTools()` is cache-served via a server-side request counter). Partition isolation is covered by an adversarial-server-name test (a server crafting `serverInfo.name` to collide with another server's principal partition does not succeed).

## Breaking Changes

None: all options are additive. With no options set, the only observable change is that a second no-arg `listTools()` within the server's `ttlMs` is served from cache (no round-trip).

## Types of changes

## Checklist

## Additional context

New options: `ClientOptions.cachePartition?: string` (the principal slice, e.g. the auth subject, for stores shared across principals), `ClientOptions.defaultCacheTtlMs?: number` (applied when the server omits the hint; default 0 = always fetch but still store for SEP-2243 mirroring), `RequestOptions.cacheMode?: 'use' | 'refresh' | 'bypass'`.

Partition model: every cache entry is automatically scoped by the connected server's identity (derived from `serverInfo.name@version`). The full partition is `JSON.stringify([serverIdentity, principal])`, collision-free by construction regardless of what a server puts in its name. `public` entries land at `[serverIdentity, '']` (shared across principals on this server); `private` at `[serverIdentity, cachePartition]`. The shared-partition fallback only serves entries with `scope === 'public'`. Note: `serverIdentity` is self-reported by the server, so two distinct origins claiming the same `Implementation` would share a public slice on a shared store; treat the store boundary accordingly.

`InMemoryResponseCacheStore` now has a `maxEntries` cap (default 512, oldest-out) so per-URI `resources/read` writes cannot grow unbounded. `notifications/resources/updated` evicts the matching `resources/read` entry. `ttlMs` is clamped to 24h. `keyOf` uses JSON encoding so NUL/quote in resource URIs cannot cause key collisions.
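
The partition model above can be illustrated with a small sketch. Everything here is hypothetical (`partitionKey`, `writeEntry`, `probe`, and the one-entry-per-partition `Map` are simplifications invented for illustration; a real store would also key by method or URI). It only shows the two ideas from the description: the JSON-encoded `[serverIdentity, principal]` partition and the own-partition-then-shared probe that serves a shared hit only when its scope is `'public'`.

```typescript
// Hypothetical, simplified model: one entry per partition, backed by a Map.
type CacheScope = 'public' | 'private';

interface StoredEntry {
    value: unknown;
    scope: CacheScope;
}

/** JSON encoding keeps the two components unambiguous whatever characters they contain. */
function partitionKey(serverIdentity: string, principal: string): string {
    return JSON.stringify([serverIdentity, principal]);
}

/** Public entries go to the shared '' slot; private entries go to the principal's slot. */
function writeEntry(
    store: Map<string, StoredEntry>,
    serverIdentity: string,
    cachePartition: string,
    entry: StoredEntry
): void {
    const principal = entry.scope === 'public' ? '' : cachePartition;
    store.set(partitionKey(serverIdentity, principal), entry);
}

/** Probe the caller's own partition first, then the shared one, gated on scope. */
function probe(
    store: Map<string, StoredEntry>,
    serverIdentity: string,
    cachePartition: string
): StoredEntry | undefined {
    const own = store.get(partitionKey(serverIdentity, cachePartition));
    if (own !== undefined) return own;
    const shared = store.get(partitionKey(serverIdentity, ''));
    return shared?.scope === 'public' ? shared : undefined;
}
```

With this shape, an entry written as `'private'` by a client with `cachePartition: 'alice'` is invisible to a client probing with `'bob'`, while a `'public'` entry is served to both. A server name crafted to contain quotes or commas still yields a distinct key, because the name is escaped inside the JSON array rather than concatenated.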