feat(client): SEP-2549 — honor cacheHints (ttlMs/scope) on the response-cache substrate by felixweinberger · Pull Request #2340 · modelcontextprotocol/typescript-sdk

felixweinberger · 2026-06-22T21:44:39Z

Client-side honoring of the SEP-2549 CacheableResult freshness hints (ttlMs, cacheScope) on the response-cache substrate from #2336.

Motivation and Context

The 2026-07-28 spec requires tools/list, prompts/list, resources/list, resources/templates/list, resources/read, and server/discover results to carry ttlMs and cacheScope. The server SDK already stamps these (the examples/caching/ story); this PR makes the client honor them: a still-fresh entry is served from the response cache without a round-trip; list_changed and resources/updated evict regardless of TTL.

How Has This Been Tested?

Client suite (581, +34 over base), full e2e (2594p/157xf), run:examples 63/63 (the caching story now asserts the second listTools() is cache-served via a server-side request counter). Partition isolation is covered by an adversarial-server-name test (a server crafting serverInfo.name to collide with another server's principal partition does not succeed).

Breaking Changes

None — all options additive. With no options set, the only observable change is that a second no-arg listTools() within the server's ttlMs is served from cache (no round-trip).

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

New options: ClientOptions.cachePartition?: string (the principal slice — e.g., the auth subject — for stores shared across principals), ClientOptions.defaultCacheTtlMs?: number (applied when the server omits the hint; default 0 = always fetch but still store for SEP-2243 mirroring), RequestOptions.cacheMode?: 'use' | 'refresh' | 'bypass'.

Partition model: every cache entry is automatically scoped by the connected server's identity (derived from serverInfo.name@version). The full partition is JSON.stringify([serverIdentity, principal]) — collision-free by construction regardless of what a server puts in its name. public entries land at [serverIdentity, ''] (shared across principals on this server); private at [serverIdentity, cachePartition]. The shared-partition fallback only serves entries with scope === 'public'. Note: serverIdentity is self-reported by the server — two distinct origins claiming the same Implementation would share a public slice on a shared store; treat the store boundary accordingly.
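To illustrate why the JSON-encoded pair is collision-free where a delimiter-joined string is not, here is a minimal sketch. The helper names (`partitionKey`, `partitionFor`, `naiveKey`) are hypothetical and not the SDK's actual functions; only the `JSON.stringify([serverIdentity, principal])` shape and the public/private placement are taken from the description above.

```typescript
// Hypothetical helpers for illustration; not the SDK's real API.
type CacheScope = 'public' | 'private';

// The shape described above: a JSON-encoded [serverIdentity, principal] pair.
function partitionKey(serverIdentity: string, principal: string): string {
  return JSON.stringify([serverIdentity, principal]);
}

// 'public' entries land at the shared '' principal; 'private' at cachePartition.
function partitionFor(serverIdentity: string, cachePartition: string, scope: CacheScope): string {
  return partitionKey(serverIdentity, scope === 'public' ? '' : cachePartition);
}

// A naive delimiter join, shown only to demonstrate the collision it allows.
function naiveKey(serverIdentity: string, principal: string): string {
  return `${serverIdentity}:${principal}`;
}

// A server that names itself 'files@1.0:alice' collides with principal
// 'alice:bob' on the honest 'files@1.0' server under the naive join.
const naiveCollides = naiveKey('files@1.0:alice', 'bob') === naiveKey('files@1.0', 'alice:bob'); // true

// The JSON pair keeps the two components distinct.
const jsonCollides = partitionKey('files@1.0:alice', 'bob') === partitionKey('files@1.0', 'alice:bob'); // false
```

Because JSON string escaping also covers quotes and control characters, a crafted name cannot forge the array boundary either.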

InMemoryResponseCacheStore now has a maxEntries cap (default 512, oldest-out) so per-URI resources/read writes cannot grow unbounded. notifications/resources/updated evicts the matching resources/read entry. ttlMs is clamped to 24h. keyOf uses JSON encoding so NUL/quote in resource URIs cannot cause key collisions.
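A rough sketch of the bounding and clamping described above, not the actual `InMemoryResponseCacheStore` source. `BoundedStore` and `clampTtl` are hypothetical names, and whether a rewrite of an existing key refreshes its age is an assumption made here for simplicity; `MAX_CACHE_TTL_MS` is the exported constant name, with the 24h value taken from this description.

```typescript
// 24h ceiling, per the description; the constant name matches the PR's export.
const MAX_CACHE_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: cap a server-supplied ttlMs at the ceiling.
function clampTtl(ttlMs: number): number {
  return Math.min(ttlMs, MAX_CACHE_TTL_MS);
}

// Hypothetical bounded store: a Map iterates in insertion order, so the
// first key is always the oldest entry.
class BoundedStore<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 512) {
    this.maxEntries = maxEntries;
  }

  set(key: string, value: V): void {
    // Assumption: rewriting a key moves it to the newest position.
    this.entries.delete(key);
    this.entries.set(key, value);
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  delete(key: string): boolean {
    return this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap of 2, writing keys `a`, `b`, `c` in that order leaves only `b` and `c` in the store.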

changeset-bot · 2026-06-22T21:44:47Z

🦋 Changeset detected

Latest commit: bf3bc85

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

Name	Type
@modelcontextprotocol/client	Major
@modelcontextprotocol/core	Patch


pkg-pr-new · 2026-06-22T21:46:49Z

@modelcontextprotocol/client

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/client@2340

@modelcontextprotocol/codemod

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/codemod@2340

@modelcontextprotocol/server

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server@2340

@modelcontextprotocol/server-legacy

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server-legacy@2340

@modelcontextprotocol/express

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/express@2340

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/fastify@2340

@modelcontextprotocol/hono

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/hono@2340

@modelcontextprotocol/node

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/node@2340

commit: bf3bc85

…st*/readResource

The four list verbs and readResource now serve a still-fresh ResponseCacheStore entry without a round trip when the server-stamped ttlMs has not elapsed.

Additive on the substrate (#2336): _listAllPages now stamps {expiresAt, scope} on the aggregate write; a _serveFromCache front gates each verb on freshness; readResource is newly cached (URI-keyed; only stored when ttl > 0, since the URI keyspace is unbounded and there is no derived index).

Per-call CacheableRequestOptions.cacheMode ('use' | 'refresh' | 'bypass') maps to mcp.d's CacheMode. ClientOptions.cachePartition is the per-principal slot for 'private'-scoped entries (the spec's MUST-NOT-share-across-authz-contexts); 'public' entries always live at partition '' so a shared store serves them to every co-tenant. ClientResponseCache reads probe own-partition then '' (mcp.d's two-probe order: own-first because scope is only known after a fetch); the toolDefinition/outputValidator derived indices use the same probe so SEP-2243 mirroring works under partitioning. readResource applies the same partition derivation as the list verbs and treats absent cacheScope as 'private', so a shared store cannot serve one principal's resource body to another.

ClientOptions.defaultCacheTtlMs (default 0) supplies the TTL when the result lacks one (e.g. a legacy-era response); an explicit server-sent ttlMs:0 is honoured as immediately stale. List aggregates are always stored regardless of TTL (mcp.d's retainForSchema posture) so callTool's mirroring/output-validation index keeps working at any TTL while the freshness gate never serves a stale entry. A list_changed eviction beats TTL (the existing partition-agnostic evict).

Clock seam (now) injectable on ClientResponseCache for tests. New exports: CacheMode, CacheableRequestOptions.
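The two-probe read and the freshness gate in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the `ClientResponseCache` implementation: `SketchCache`, `slot`, and the `Entry` shape are hypothetical, the server-identity component of the partition is omitted for brevity, and the handling of a stale own-partition entry (falling through to the shared probe) is a guess.

```typescript
type CacheScope = 'public' | 'private';
type CacheMode = 'use' | 'refresh' | 'bypass';

// Hypothetical stored shape: the value plus the {expiresAt, scope} stamp.
interface Entry<T> {
  value: T;
  expiresAt: number;
  scope: CacheScope;
}

// Hypothetical cache front over a store that several principals may share.
class SketchCache<T> {
  private readonly store: Map<string, Entry<T>>;
  private readonly principal: string;
  private readonly now: () => number;

  constructor(store: Map<string, Entry<T>>, principal: string, now: () => number = () => Date.now()) {
    this.store = store;
    this.principal = principal;
    this.now = now; // injectable clock seam for tests
  }

  private slot(principal: string, key: string): string {
    return JSON.stringify([principal, key]);
  }

  // 'public' entries go to the shared '' slot; 'private' to this principal's.
  write(key: string, value: T, ttlMs: number, scope: CacheScope): void {
    const principal = scope === 'public' ? '' : this.principal;
    this.store.set(this.slot(principal, key), { value, expiresAt: this.now() + ttlMs, scope });
  }

  // Returns a fresh entry, or undefined when the caller must fetch.
  read(key: string, mode: CacheMode = 'use'): T | undefined {
    if (mode !== 'use') return undefined; // 'refresh' and 'bypass' never read
    const own = this.store.get(this.slot(this.principal, key));
    if (own !== undefined && own.expiresAt > this.now()) return own.value;
    // Shared-partition fallback is gated on the stored scope being 'public'.
    const shared = this.store.get(this.slot('', key));
    if (shared !== undefined && shared.scope === 'public' && shared.expiresAt > this.now()) {
      return shared.value;
    }
    return undefined;
  }
}
```

In this sketch a second principal on the same store is served the first principal's `'public'` entries but never its `'private'` ones, and an entry stops being served once the injected clock reaches `expiresAt`.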

…dds cacheMode + custom-store sections

The client now calls listTools() and readResource() twice each and asserts the second of each pair is cache-served: the server's resource handler counts how many times it ran and exposes that via a read-count tool, so the example verifies (server-side) that the cache hit never reached the wire. Demonstrates cacheMode:'refresh' and the post-refresh return to cache-serving.

README drops the follow-up note (honouring is shipped), adds a §cacheMode section, and adds a §Custom store section showing the four-method ResponseCacheStore interface shape with the cachePartition guidance for shared stores.

claude · 2026-06-23T01:40:55Z

+### Client honours server cache hints (SEP-2549)
+
+On a 2026-07-28 connection the cacheable verbs — `listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, and `readResource()` — now serve a still-fresh held entry without a round trip when the server-stamped `ttlMs` has not elapsed. The behaviour is opt-in **by server hint**: a server that sends `ttlMs: 0` (the conservative default the SDK's `McpServer` stamps unless configured otherwise) sees byte-identical behaviour — every call fetches. A `list_changed` notification still evicts immediately regardless of TTL.
+
+Per-call control via the new `CacheableRequestOptions.cacheMode` (`'use'` is the default):
+
+```typescript
+await client.listTools(); // serve from cache if fresh
+await client.listTools(undefined, { cacheMode: 'refresh' }); // always fetch, then re-store
+await client.listTools(undefined, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
+```
+
+New `ClientOptions`:
+
+- `cachePartition?: string` — the opaque per-principal identifier for `'private'`-scoped entries (the spec's "MUST NOT share across authorization contexts"). Entries are automatically scoped by connected-server identity (derived from `serverInfo`), so one `responseCacheStore` may back several clients without consumer-side encoding; set `cachePartition` to your principal identifier (e.g. the auth subject) when sharing a store across principals. With the default `''` every entry — public or private — lives at the connected server's shared partition (the safe single-tenant posture). Note `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
+- `defaultCacheTtlMs?: number` — applied when a cacheable result lacks `ttlMs` (e.g. a legacy-era response). Default `0` — never serve from cache; the list aggregate is still **stored** so `callTool`'s mirroring/output-validation index keeps working regardless. The server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`).
+
+The `ResponseCacheStore` interface gained `delete(key)` (the per-URI invalidation `notifications/resources/updated` drives) — custom stores written against the alpha substrate need to add it. The default `InMemoryResponseCacheStore` is now bounded (default 512 entries, oldest-first eviction; configurable via `{ maxEntries }`).


🟡 The new SEP-2549 cache-honouring behaviour (cache-served listTools()/readResource(), the per-call cacheMode option, and the new ClientOptions cachePartition/defaultCacheTtlMs) is documented here in migration.md and in examples/caching/README.md, but the canonical client feature guide docs/client.md was not updated — its Tools and Resources sections still describe these verbs as always reaching the server, with no mention of cache-serving or how to force a fetch. Consider adding a short 'Response caching (SEP-2549)' subsection (or a sentence in the Tools/Resources sections) of docs/client.md covering the cache-serving behaviour, cacheMode, and cachePartition/defaultCacheTtlMs.

Extended reasoning...

The gap. This PR introduces user-visible client behaviour: listTools(), listPrompts(), listResources(), listResourceTemplates(), and readResource() may now be served from the response cache without a round trip when the server-stamped ttlMs has not elapsed, plus the new per-call cacheMode option ('use' | 'refresh' | 'bypass'), the new ClientOptions.cachePartition / defaultCacheTtlMs semantics, and new public exports (CacheMode, CacheableRequestOptions, InMemoryResponseCacheStoreOptions, MAX_CACHE_TTL_MS). Prose was added to docs/migration.md (this hunk), docs/migration-SKILL.md, and examples/caching/README.md — but docs/client.md, the canonical client feature reference, is not touched by the PR.

What docs/client.md says today. A grep of docs/client.md for cacheMode / cacheHint / ttlMs / cachePartition / defaultCacheTtlMs / responseCacheStore returns nothing related to this feature; the only cache-related prose is the SEP-2243 'internal tools/list cache' paragraph and the listChanged local-cache option. Its Tools section (~line 255) describes listTools() as 'walks every page on your behalf' and the Resources section (~lines 318–332) describes listResources()/readResource() purely as discovering and reading server-provided data — the readResource example there even uses the same config://app URI the caching example now cache-serves. Nothing tells a reader of the feature guide that, after this PR, a second call within the server's ttlMs may never reach the server, or that cacheMode: 'refresh' / 'bypass' exists to force a fetch.

Why this matters. The migration guide targets upgraders and the example README targets the example; a user consulting the feature reference for listTools()/readResource() (e.g. while debugging why a request never hit their server) will not learn that calls can be cache-served, nor how to opt out per call. The repo's review checklist asks for prose documentation of new features and for updating docs that describe the pre-change behaviour.

Why this is an inconsistency, not a different convention. The repo's own precedent is that comparable client-side behaviour changes in this stack got prose in docs/client.md: the predecessor PR's auto-aggregation behaviour is the source of the 'walks every page on your behalf' wording there, and the SEP-2243 mirroring feature has its own subsection. SEP-2549 cache honouring is the same kind of user-visible Client behaviour change and is the only one of the set absent from client.md.

Concrete walkthrough of the reader-facing gap. (1) A developer's host calls client.readResource({ uri: 'config://app' }) against a server stamping ttlMs: 60_000; they then change the resource server-side and call readResource again within 60 s. (2) The second call returns the old body with no wire request — by design. (3) They open docs/client.md → Resources to understand why; the section describes readResource() as reading server data with no mention of the response cache, ttlMs, or cacheMode, so the behaviour looks like a bug rather than a documented feature with a documented escape hatch ({ cacheMode: 'refresh' }).

Suggested fix. Add a short 'Response caching (SEP-2549)' subsection to docs/client.md (or a sentence each in the Tools and Resources sections) stating that the cacheable verbs serve a still-fresh entry without a round trip when the server stamps a positive ttlMs, that cacheMode: 'refresh' | 'bypass' forces a fetch, and pointing at ClientOptions.cachePartition / defaultCacheTtlMs / responseCacheStore for shared-store setups — largely a condensed copy of the migration.md section added in this PR. Filed as a nit: the feature is documented (migration guide, example README, JSDoc), just not in the feature reference users of these verbs actually consult.

claude · 2026-06-23T01:40:55Z

+        // The aggregate is ALWAYS written: even when the resolved TTL is ≤0
+        // the entry is stored already-stale (mcp.d's `retainForSchema`
+        // posture) so the `tools/list`-derived index keeps working regardless,
+        // while the freshness gate in `_serveFromCache` never serves it.
+        // Page-1 carries the result-level `ttlMs`/`cacheScope` (`acc` IS the
+        // mutated page-1 object).
+        await this._cache.write(method, acc, generation, this._freshness(acc));
        return acc;


🟡 When _listAllPages aggregates a multi-page list, the terminal cache write computes freshness from this._freshness(acc) where acc is the page-1 result object, so ttlMs/cacheScope hints carried by pages 2..N are silently discarded. A later page's stricter hint is therefore ignored: a page-2 ttlMs: 0 ("do not cache") aggregate is served from cache for page-1's full TTL, and a page-2 cacheScope: 'private' is downgraded to page-1's 'public', storing the private-scoped page contents at the shared [serverIdentity, ''] partition where another principal's shared-partition probe will serve them on a shared store. Resolving most-restrictively while walking (min ttlMs across pages, 'private' if any page is private) is a small change in the loop and matches the conservative posture this PR takes everywhere else.

Extended reasoning...

The mechanism. _listAllPages (packages/client/src/client/client.ts:1568-1575) aggregates every page into acc, which IS the page-1 result object: append(acc, page) only pushes the later pages' items, and the per-page page objects are then discarded. The terminal cache write is await this._cache.write(method, acc, generation, this._freshness(acc)), and _freshness (lines 1592-1600) reads acc.ttlMs / acc.cacheScope, i.e. page 1's hints only. The ttlMs/cacheScope fields carried by pages 2..N are never consulted anywhere. The inline comment "Page-1 carries the result-level ttlMs/cacheScope" asserts hint uniformity rather than choosing a resolution for the heterogeneous case.

Why heterogeneous per-page hints are spec-legal and expressible. SEP-2549's CacheableResult fields are per-result, and each page of a paginated walk is an independent result that the 2026-07-28 codec requires to carry them. This SDK's own server resolves hints most-specific-author-first (attachCacheHintFallback only fills fields the handler did not set), so a low-level paginated list handler can legitimately return different ttlMs/cacheScope per page (e.g. a volatile or per-principal tail page), and third-party servers can too. Every multi-page test in responseCache.test.ts / mcpParamMirroring.test.ts stamps identical hints on all pages, which is why nothing pins this.

Consequence 1: TTL over-caching. Page 1 stamps ttlMs: 60_000; page 2 stamps ttlMs: 0 (the spec's "immediately stale" / do-not-cache). The aggregate, including the page-2 items the server asked not to cache, is stored with expiresAt = now + 60s and served from cache for the full minute. The server's only remaining lever is a list_changed notification, which it has no reason to send (the list did not change; it just declared part of it uncacheable).

Consequence 2: scope downgrade onto the shared/public partition. Page 1 stamps cacheScope: 'public'; a later page stamps 'private' (per-principal items mixed into the tail). _freshness(acc) resolves scope: 'public', so write() stores the whole aggregate, including the private-scoped page's contents, at the shared partition [serverIdentity, ''] with scope: 'public'. On a responseCacheStore shared across principals (the arrangement this PR's docs/README explicitly endorse, with cachePartition set per principal), another principal's client then gets a shared-partition hit (_probe's fallback is gated on the stored scope === 'public', which this entry now claims) and is served the private-scoped page contents without a round trip. That is exactly the cross-authorization-context sharing the spec's private scope forbids and that the rest of this PR's partition design (the two-probe scope gate, the misconfigured-co-tenant guard test, the JSON-encoded partition) is built to prevent.

Step-by-step proof. (1) Configure a scripted modern server whose tools/list page 1 returns { ttlMs: 60_000, cacheScope: 'public', tools: [...], nextCursor: '1' } and page 2 returns { ttlMs: 0, cacheScope: 'private', tools: [privateTool] }. (2) Client (cachePartition: 'alice') calls listTools() → _listAllPages walks both pages, acc is the page-1 object with page-2's tools appended. (3) _freshness(acc) reads acc.ttlMs = 60_000, acc.cacheScope = 'public' → write() stores the aggregate at [serverIdentity, ''] with scope: 'public', expiresAt = now + 60s. (4) A second client (cachePartition: 'bob') on the same store calls listTools() → _probe's own-partition miss falls through to the shared partition, finds the entry with scope === 'public', and serves it (including privateTool from the page the server marked private and ttlMs: 0) with no wire request, for up to 60 s.

Why nothing else prevents it. The freshness seam is the single _freshness(acc) call; no other code reads later pages' hint fields. The list_changed / HEADER_MISMATCH evictions are orthogonal. The single-page case and the SDK server with uniform per-method ServerOptions.cacheHints are unaffected, which is why no test catches it.

Fix. Resolve the aggregate's freshness most-restrictively while walking: track ttlMs = min(...) across pages and scope = 'private' if any page is private (one or two extra lines in the page loop), and pass that to _freshness/write instead of reading acc (page 1) alone. Alternatively, document that only page-1 hints are honoured, but the most-restrictive resolution is cheap and matches the conservative posture the PR takes everywhere else (24h clamp, private-by-default, scope-gated shared probe).

Severity. Filed as a non-blocking nit: the trigger requires a server emitting heterogeneous per-page hints (unusual but spec-legal), and consequence 2 additionally requires a store shared across principals; but that sharing is an explicitly documented configuration of this same PR, so the scope-downgrade half is a genuine gap in the isolation story the PR documents.
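A minimal sketch of the most-restrictive resolution suggested in the fix, written as a standalone function rather than a patch to `_listAllPages`. `resolveMostRestrictive`, `PageHints`, and `Freshness` are hypothetical names; treating an absent `cacheScope` as `'private'` and an absent `ttlMs` as the supplied default follows the conventions described elsewhere in this PR but is an assumption here.

```typescript
type CacheScope = 'public' | 'private';

// Hypothetical view of the hint fields each page result carries.
interface PageHints {
  ttlMs?: number;
  cacheScope?: CacheScope;
}

interface Freshness {
  ttlMs: number;
  scope: CacheScope;
}

// Resolve the aggregate's hints from every page: the smallest ttlMs wins,
// and a single 'private' (or unlabelled) page makes the aggregate private.
function resolveMostRestrictive(pages: readonly PageHints[], defaultTtlMs: number = 0): Freshness {
  if (pages.length === 0) {
    return { ttlMs: defaultTtlMs, scope: 'private' };
  }
  let ttlMs = Number.POSITIVE_INFINITY;
  let scope: CacheScope = 'public';
  for (const page of pages) {
    ttlMs = Math.min(ttlMs, page.ttlMs ?? defaultTtlMs);
    if ((page.cacheScope ?? 'private') === 'private') {
      scope = 'private';
    }
  }
  return { ttlMs, scope };
}

// The heterogeneous case from the step-by-step proof above.
const resolved = resolveMostRestrictive([
  { ttlMs: 60_000, cacheScope: 'public' },
  { ttlMs: 0, cacheScope: 'private' },
]);
// resolved is { ttlMs: 0, scope: 'private' }
```

Applied to the proof's scenario, the aggregate would be stored already-stale at the caller's private partition, so neither consequence would occur.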

felixweinberger requested a review from a team as a code owner June 22, 2026 21:44

felixweinberger force-pushed the fweinberger/cachehints-honoring branch from 80597d6 to 10f7d17 Compare June 22, 2026 21:53