modelcontextprotocol · felixweinberger · Jun 22, 2026 · Jun 22, 2026 · claude · Jun 23, 2026
@@ -0,0 +1,7 @@
+---
+'@modelcontextprotocol/client': minor
+---
+
+`Client` now **honours** the server-stamped SEP-2549 `ttlMs`/`cacheScope` cache hints on the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, `readResource()`): a still-fresh held entry is served without a round trip. New `CacheableRequestOptions.cacheMode` (`'use'` — the default; `'refresh'` — always fetch and re-store; `'bypass'` — fetch without consulting or writing the cache) gives per-call control. The behaviour is opt-in by hint: a server that sends `ttlMs: 0` (the conservative default this SDK's server stamps) sees byte-identical behaviour — every call fetches.
+
+Entries are automatically scoped by connected-server identity (derived from `serverInfo` after connect, encoded collision-free via `JSON.stringify`); `ClientOptions.cachePartition` is the opaque per-principal slot for `'private'`-scoped entries — set it to your principal identifier (e.g. the auth subject) when one `responseCacheStore` backs several principals. With the default `''` every entry lives at the connected server's shared partition (the safe single-tenant posture). `ClientOptions.defaultCacheTtlMs` (default `0`) supplies the TTL when a result lacks one (e.g. a legacy-era response); the server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`). The list verbs always store the aggregate (so `callTool`'s mirroring/output-validation index keeps working at any TTL); `readResource` stores only when the resolved TTL is positive. `notifications/resources/updated` evicts the cached `resources/read` body for that URI. `ResponseCacheStore` gained `delete(key)`; `InMemoryResponseCacheStore` is now bounded (`{ maxEntries }`, default 512, oldest-first eviction). New exports: `CacheMode`, `CacheableRequestOptions`, `InMemoryResponseCacheStoreOptions`, `MAX_CACHE_TTL_MS`.
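The collision-free encoding mentioned in this changeset can be illustrated with a minimal sketch. It assumes the server identity is the `name`/`version` pair from `serverInfo`; the helper names `naiveKey` and `partitionKey` are hypothetical and the SDK's actual key derivation may differ.

```typescript
// Hypothetical helpers for illustration only; not the SDK's internal code.
interface ServerIdentity {
    name: string;
    version: string;
}

// Naive concatenation: distinct inputs can produce the same key.
function naiveKey(server: ServerIdentity, partition: string): string {
    return `${server.name}:${server.version}:${partition}`;
}

// JSON.stringify over an array quotes and escapes each component, so the
// component boundaries stay unambiguous and distinct tuples never collide.
function partitionKey(server: ServerIdentity, partition: string): string {
    return JSON.stringify([server.name, server.version, partition]);
}

const a: ServerIdentity = { name: 'files:1', version: '0' };
const b: ServerIdentity = { name: 'files', version: '1:0' };

console.log(naiveKey(a, 'alice') === naiveKey(b, 'alice')); // true: the two servers collide
console.log(partitionKey(a, 'alice') === partitionKey(b, 'alice')); // false
```

The same encoding keeps the default `''` partition distinct from any named principal.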
@@ -2,6 +2,6 @@
 '@modelcontextprotocol/client': major
 ---
 
-`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` now **auto-aggregate every page** when called without a `cursor` and return the complete result with `nextCursor: undefined` (matching the C#, Java, and mcp.d SDKs). Pass an explicit `{ cursor }` string to fetch a single page; the per-page path is unchanged. Existing manual pagination loops keep working — the first iteration returns everything and the loop exits — but can be deleted. The aggregated result is written to the new pluggable `ResponseCacheStore` (default: a fresh per-instance `InMemoryResponseCacheStore`); a `ClientResponseCache` collaborator owns the eviction-generation guard and the derived `tools/list` index that `callTool`'s output validation and SEP-2243 `Mcp-Param-*` mirroring read. New exports: `ResponseCacheStore`, `CacheKey`, `CacheEntry`, `CacheScope`, `MaybePromise`, `InMemoryResponseCacheStore`; new `ClientOptions.responseCacheStore` / `ClientOptions.listMaxPages` (caps the auto-aggregate walk at 64 pages by default; throws `SdkError` with `SdkErrorCode.ListPaginationExceeded` on overrun so a partial aggregate is never cached). The store interface is async-ready (`MaybePromise<…>`); the in-memory default stays synchronous. **A store instance must not be shared across `Client` instances at all in v2.0.x** — entries are keyed by method only (server-identity confusion + `clear()`/`evict()` cross-talk); per-principal partitioning that enables safe sharing arrives with the full caching engine.
+`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` now **auto-aggregate every page** when called without a `cursor` and return the complete result with `nextCursor: undefined` (matching the C#, Java, and mcp.d SDKs). Pass an explicit `{ cursor }` string to fetch a single page; the per-page path is unchanged. Existing manual pagination loops keep working — the first iteration returns everything and the loop exits — but can be deleted. The aggregated result is written to the new pluggable `ResponseCacheStore` (default: a fresh per-instance `InMemoryResponseCacheStore`); a `ClientResponseCache` collaborator owns the eviction-generation guard and the derived `tools/list` index that `callTool`'s output validation and SEP-2243 `Mcp-Param-*` mirroring read. New exports: `ResponseCacheStore`, `CacheKey`, `CacheEntry`, `CacheScope`, `MaybePromise`, `InMemoryResponseCacheStore`; new `ClientOptions.responseCacheStore` / `ClientOptions.listMaxPages` (caps the auto-aggregate walk at 64 pages by default; throws `SdkError` with `SdkErrorCode.ListPaginationExceeded` on overrun so a partial aggregate is never cached). The store interface is async-ready (`MaybePromise<…>`); the in-memory default stays synchronous. Entries are automatically scoped by the connected server's identity and (when set) the consumer-supplied `cachePartition`, so a shared store does not collide across servers or principals; evictions are likewise scoped to the connected server's partitions.
 
 **Behavior change (every era):** output-schema validator compilation is now lazy — validators are compiled on the first `callTool()` against the cached `tools/list` entry, not eagerly inside `listTools()` — and non-throwing: an uncompilable `outputSchema` is `console.warn`-ed and validation is skipped for that tool only (previously `listTools()` threw). A pluggable `jsonSchemaValidator` provider therefore observes compilation at `callTool` time, not `listTools` time. The legacy-era `listTools()` path is unchanged at the wire level but is observably different at the validator-lifecycle level.
@@ -0,0 +1,5 @@
+---
+'@modelcontextprotocol/core': patch
+---
+
+`Protocol.request()` now rejects with `SdkError(RequestTimeout, reason)` when called with an already-aborted signal, matching in-flight aborts. Previously the raw `signal.reason` was thrown.
@@ -572,6 +572,8 @@ side: auto-fulfilment is on by default (`ClientOptions.inputRequired`, `maxRound
 
 `Client.listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()` called without a `cursor` now auto-aggregate every page and return the complete result (`nextCursor: undefined`); an explicit `{ cursor }` string still returns one page. Manual `do { … } while (cursor !== undefined)` loops keep working (the first call returns everything and the loop exits after one iteration) — replace them with the bare no-arg call. New `ClientOptions.listMaxPages` (default 64) caps the aggregate walk only; overrun throws `SdkError` (`SdkErrorCode.ListPaginationExceeded`).
 
+`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` / `readResource()` now honour the server-stamped SEP-2549 `ttlMs`/`cacheScope`: a still-fresh cached entry is returned without a round trip. This is opt-in by server hint: a server that sends `ttlMs: 0` (the SDK's default stamp) sees no behaviour change. Per-call override: pass `{ cacheMode: 'refresh' }` (always fetch and re-store) or `{ cacheMode: 'bypass' }` (fetch without touching the cache). Server `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`). Entries are automatically scoped by connected-server identity. Two new options are added: `ClientOptions.cachePartition` (per-principal slot for `'private'`-scoped entries on a shared `responseCacheStore`; default `''`) and `ClientOptions.defaultCacheTtlMs` (TTL applied when the result lacks one, e.g. legacy-era responses; default `0`). `ResponseCacheStore` gained `delete(key)` (driven by `notifications/resources/updated`); `InMemoryResponseCacheStore` is now bounded (`{ maxEntries }`, default 512).
+
 Output-schema validator compilation is now lazy (first `callTool()` against the cached `tools/list` entry) and non-throwing (an uncompilable `outputSchema` is `console.warn`-ed and validation is skipped for that tool only); `listTools()` no longer throws on an uncompilable `outputSchema`. Applies on every era — the legacy-era `listTools()` path is unchanged at the wire level only.
 
 No code changes required; wire-behavior note: on a 2026-07-28 Streamable HTTP connection, aborting an in-flight client request (caller `signal` / timeout) closes that request's SSE response stream as the spec cancellation signal — `notifications/cancelled` is no longer POSTed
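The auto-aggregate walk and its `listMaxPages` cap, described earlier in this hunk, can be sketched roughly as follows. This is an illustration under assumptions, not the SDK's implementation: `ListPage`, `aggregateAll`, and `ListPaginationExceededError` are hypothetical names, and the error class stands in for `SdkError` with `SdkErrorCode.ListPaginationExceeded`.

```typescript
// Hypothetical page shape; only the cap semantics come from the notes above.
interface ListPage<T> {
    items: T[];
    nextCursor?: string;
}

// Stand-in for SdkError with SdkErrorCode.ListPaginationExceeded.
class ListPaginationExceededError extends Error {}

async function aggregateAll<T>(
    fetchPage: (cursor?: string) => Promise<ListPage<T>>,
    listMaxPages = 64
): Promise<T[]> {
    const all: T[] = [];
    let cursor: string | undefined;
    let pages = 0;
    do {
        // A cap of 0 disables the limit; otherwise refuse to fetch page N+1.
        if (listMaxPages > 0 && pages >= listMaxPages) {
            throw new ListPaginationExceededError(`pagination did not converge within ${listMaxPages} pages`);
        }
        const page = await fetchPage(cursor);
        all.push(...page.items);
        cursor = page.nextCursor;
        pages++;
    } while (cursor !== undefined);
    // Only a complete walk reaches this point, so a partial aggregate is never returned.
    return all;
}
```

Throwing before anything is returned is what lets a caller cache the aggregate unconditionally: a result either covers every page or does not exist.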

@@ -575,6 +575,25 @@
 
 The auto-aggregate walk is capped at `ClientOptions.listMaxPages` pages (default 64; `0` disables) and throws an `SdkError` with `SdkErrorCode.ListPaginationExceeded` if the server's pagination does not converge, so a partial aggregate is never returned. The cap applies only to the no-`cursor` aggregate path; explicit per-page calls are never capped. The aggregated result is also written to the client's response cache (the source for `callTool`'s output-schema validation and SEP-2243 header mirroring).
 
+### Client honours server cache hints (SEP-2549)
+
+On a 2026-07-28 connection the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, and `readResource()`) now serve a held entry without a round trip for as long as the server-stamped `ttlMs` has not elapsed. The behaviour is opt-in **by server hint**: a server that sends `ttlMs: 0` (the conservative default the SDK's `McpServer` stamps unless configured otherwise) sees byte-identical behaviour, because every call still fetches. A `list_changed` notification still evicts immediately regardless of TTL.
+
+Per-call control via the new `CacheableRequestOptions.cacheMode` (`'use'` is the default):
+
+```typescript
+await client.listTools(); // serve from cache if fresh
+await client.listTools(undefined, { cacheMode: 'refresh' }); // always fetch, then re-store
+await client.listTools(undefined, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
+```
+
+New `ClientOptions`:
+
+- `cachePartition?: string` — the opaque per-principal identifier for `'private'`-scoped entries (the spec's "MUST NOT share across authorization contexts"). Entries are automatically scoped by connected-server identity (derived from `serverInfo`), so one `responseCacheStore` may back several clients without consumer-side encoding; set `cachePartition` to your principal identifier (e.g. the auth subject) when sharing a store across principals. With the default `''` every entry — public or private — lives at the connected server's shared partition (the safe single-tenant posture). Note `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
+- `defaultCacheTtlMs?: number` — applied when a cacheable result lacks `ttlMs` (e.g. a legacy-era response). Default `0` — never serve from cache; the list aggregate is still **stored** so `callTool`'s mirroring/output-validation index keeps working regardless. The server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`).
+
+The `ResponseCacheStore` interface gained `delete(key)` (the per-URI invalidation `notifications/resources/updated` drives) — custom stores written against the alpha substrate need to add it. The default `InMemoryResponseCacheStore` is now bounded (default 512 entries, oldest-first eviction; configurable via `{ maxEntries }`).
+
 **Output-schema validator lifecycle (every era):** validator compilation is now lazy — validators are compiled on the first `callTool()` against the cached `tools/list` entry, not eagerly inside `listTools()` — and non-throwing: an uncompilable `outputSchema` is `console.warn`-ed
 and validation is skipped for that tool only. In v1, `listTools()` threw on an uncompilable `outputSchema`; now it succeeds, and a pluggable `jsonSchemaValidator` provider observes compilation at `callTool` time, not `listTools` time. The legacy-era `listTools()` path is
 unchanged at the wire level but is observably different at the validator-lifecycle level.
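The freshness rules in the SEP-2549 section of this hunk (server `ttlMs` clamped at 24 h, `defaultCacheTtlMs` as the fallback, and the three `cacheMode` values) can be sketched as below. This is a minimal illustration under assumptions: `resolveTtlMs`, `canServeFromCache`, and `HeldEntry` are hypothetical names, and the local `MAX_CACHE_TTL_MS` and `CacheMode` only mirror the exports named above.

```typescript
// Local stand-ins mirroring the exported names; the SDK's internals may differ.
const MAX_CACHE_TTL_MS = 24 * 60 * 60 * 1000; // the 24 h clamp
type CacheMode = 'use' | 'refresh' | 'bypass';

interface HeldEntry {
    expiresAt: number; // epoch milliseconds
}

// Effective TTL: the server hint (clamped) if present, else the client default.
function resolveTtlMs(serverTtlMs: number | undefined, defaultCacheTtlMs = 0): number {
    if (serverTtlMs === undefined) return Math.max(defaultCacheTtlMs, 0);
    return Math.min(Math.max(serverTtlMs, 0), MAX_CACHE_TTL_MS);
}

// Only 'use' may answer from the cache; 'refresh' and 'bypass' always fetch.
function canServeFromCache(mode: CacheMode, entry: HeldEntry | undefined, now: number): boolean {
    return mode === 'use' && entry !== undefined && now < entry.expiresAt;
}

const now = 1_000_000;
const entry: HeldEntry = { expiresAt: now + resolveTtlMs(30_000) };

console.log(canServeFromCache('use', entry, now + 10_000)); // true: still fresh
console.log(canServeFromCache('use', entry, now + 30_000)); // false: TTL elapsed
console.log(canServeFromCache('refresh', entry, now + 10_000)); // false: always fetches
```

With `ttlMs: 0` the entry expires at the instant it is stored, so `'use'` never serves it, which matches the "every call fetches" behaviour described for the default stamp.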

@@ -1,10 +1,52 @@
 # caching
 
 `CacheableResult` freshness hints (protocol revision 2026-07-28). The server declares hints at two layers — a per-registration `cacheHint` on the resource and server-level `ServerOptions.cacheHints` — and the SDK resolves most-specific-author-first (handler-return fields would
-take precedence over both) and stamps `ttlMs`/`cacheScope` on the wire toward modern clients only. The client reads the stamped values back.
-
-> Full client-side cache **honouring** (re-using a still-fresh result instead of re-requesting) is a follow-up; this example reads what the server emits today.
+take precedence over both) and stamps `ttlMs`/`cacheScope` on the wire toward modern clients only. The client honours the stamped values: a still-fresh held entry is served without a round trip.
 
 ```bash
 pnpm tsx examples/caching/client.ts
 ```
+
+The client calls `listTools()` and `readResource()` twice each; the second call of each pair is served from the response cache. The server exposes a `request-count` tool (how many `tools/list` requests reached it) and a `read-count` tool (how many times the resource handler ran), so the example can assert that each counter is unchanged after a cache-served call and has incremented after a `cacheMode: 'refresh'` call.
+
+## `cacheMode`
+
+Per-call control on the cacheable verbs (`listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` / `readResource()`):
+
+```ts
+await client.readResource({ uri: 'config://app' }); // 'use' (default): serve from cache if fresh
+await client.readResource({ uri: 'config://app' }, { cacheMode: 'refresh' }); // always fetch, then re-store
+await client.readResource({ uri: 'config://app' }, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
+```
+
+A `list_changed` notification still evicts immediately regardless of TTL.
+
+## Custom store
+
+The default per-client `InMemoryResponseCacheStore` (bounded at 512 entries by default) is enough for most hosts. To back the cache with something persistent (Redis, KV, IndexedDB), implement the five-method `ResponseCacheStore` interface. The store is a plain key-value carrier; freshness and partitioning are the client's job:
+
+```ts
+import { Client, type CacheEntry, type CacheKey, type CacheScope, type ResponseCacheStore } from '@modelcontextprotocol/client';
+
+class MyStore implements ResponseCacheStore {
+    async get(key: CacheKey): Promise<CacheEntry | undefined> {
+        /* read {value, stamp, expiresAt, scope} from your backend */
+    }
+    async set(key: CacheKey, entry: { value: unknown; expiresAt?: number; scope?: CacheScope }): Promise<number> {
+        /* write entry under key; return a monotonically-increasing stamp */
+    }
+    async delete(key: CacheKey): Promise<void> {
+        /* drop the single entry under key (no-op if absent) */
+    }
+    async evict(method: string): Promise<void> {
+        /* drop every entry whose key.method === method (across every partition) */
+    }
+    async clear(): Promise<void> {
+        /* drop everything */
+    }
+}
+
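+// principalId: a stable identifier for the authorization context (e.g. the auth subject), supplied by your host.
+const client = new Client({ name: 'host', version: '1.0.0' }, { responseCacheStore: new MyStore(), cachePartition: principalId });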
+```
+
+The SDK scopes every entry by the connected server's identity automatically, so you do not encode server identity into `cachePartition` or the store key yourself. When one store backs several principals against the same server, set `ClientOptions.cachePartition` to a stable identity of the authorization context (e.g. the auth subject) so that `'private'`-scoped entries are isolated per principal; `'public'`-scoped entries are shared within the connected server's namespace automatically. Note that `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
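The bounded, oldest-first behaviour described for `InMemoryResponseCacheStore` can be approximated with a `Map`, which iterates in insertion order. The sketch below is an assumption-laden illustration, not the SDK's class: `BoundedMapStore`, `SketchKey`, and `SketchEntry` are hypothetical, and the real `CacheKey`/`CacheEntry` types may carry different fields.

```typescript
// Hypothetical local shapes; the SDK's exported CacheKey/CacheEntry may differ.
interface SketchKey {
    method: string;
    partition: string;
    params?: string;
}
interface SketchEntry {
    value: unknown;
    stamp: number;
    expiresAt?: number;
}

class BoundedMapStore {
    private readonly entries = new Map<string, { key: SketchKey; entry: SketchEntry }>();
    private readonly maxEntries: number;
    private nextStamp = 1;

    constructor(maxEntries = 512) {
        this.maxEntries = maxEntries;
    }

    private id(key: SketchKey): string {
        return JSON.stringify([key.method, key.partition, key.params ?? null]);
    }

    get(key: SketchKey): SketchEntry | undefined {
        return this.entries.get(this.id(key))?.entry;
    }

    set(key: SketchKey, entry: { value: unknown; expiresAt?: number }): number {
        const id = this.id(key);
        const stamp = this.nextStamp++;
        // Delete first so a rewritten key moves to the end of the insertion order.
        this.entries.delete(id);
        this.entries.set(id, { key, entry: { ...entry, stamp } });
        // The first key in a Map is the oldest insertion: evict until within bounds.
        while (this.entries.size > this.maxEntries) {
            const oldest = this.entries.keys().next().value;
            if (oldest === undefined) break;
            this.entries.delete(oldest);
        }
        return stamp;
    }

    delete(key: SketchKey): void {
        this.entries.delete(this.id(key));
    }

    evict(method: string): void {
        for (const [id, held] of this.entries) {
            if (held.key.method === method) this.entries.delete(id);
        }
    }

    clear(): void {
        this.entries.clear();
    }
}
```

A persistent backend would replace the `Map` with its own storage calls and return promises instead; the eviction policy shown here is only one way to keep such a store bounded.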
@@ -1,8 +1,7 @@
 /**
  * Reads the cache hints emitted on cacheable results (2026-07-28 connections
- * only) and asserts the configured values reached the wire. Full client-side
- * cache *honouring* (re-using a fresh result instead of re-requesting) is a
- * follow-up — see the SDK's tracking issue for client cache support.
+ * only) and asserts the client honours them: a still-fresh cached entry is
+ * served without a round trip.
  */
 import { check, connectFromArgs, runClient } from '../harness.js';
 
@@ -11,22 +10,60 @@ interface Cacheable {
     cacheScope?: 'public' | 'private';
 }
 
+async function callCount(client: Awaited<ReturnType<typeof connectFromArgs>>, name: 'read-count' | 'request-count'): Promise<number> {
+    const r = await client.callTool({ name });
+    return Number((r.content[0] as { text: string }).text);
+}
+
 runClient('caching', async () => {
     // connectFromArgs picks transport (default: spawn ./server.ts over stdio; --http <url>) and era (--legacy) from argv. Your code would construct a Client and connect over your chosen transport directly.
     const client = await connectFromArgs(import.meta.dirname);
     check.equal(client.getNegotiatedProtocolVersion(), '2026-07-28');
 
+    // The server stamps `tools/list` with `ttlMs: 30_000, cacheScope: 'public'`.
     const tools = (await client.listTools()) as Cacheable & Awaited<ReturnType<typeof client.listTools>>;
     check.equal(tools.ttlMs, 30_000);
     check.equal(tools.cacheScope, 'public');
+    // `request-count` proves the wire was reached exactly once.
+    check.equal(await callCount(client, 'request-count'), 1);
+
+    // The second call is served from the response cache: the server-side
+    // `tools/list` counter is unchanged. The result is a fresh copy of the
+    // held entry (so mutating it cannot reach the cache), hence the comparison
+    // by tool name rather than by reference.
+    const toolsAgain = await client.listTools();
+    check.deepEqual(
+        toolsAgain.tools.map(t => t.name),
+        tools.tools.map(t => t.name)
+    );
+    check.equal(await callCount(client, 'request-count'), 1);
+
+    // `cacheMode: 'refresh'` always fetches and re-stores: the counter moves.
+    await client.listTools(undefined, { cacheMode: 'refresh' });
+    check.equal(await callCount(client, 'request-count'), 2);
 
     const resources = (await client.listResources()) as Cacheable & Awaited<ReturnType<typeof client.listResources>>;
     check.equal(resources.ttlMs, 5000);
     check.equal(resources.cacheScope, 'public');
 
+    // `readResource`: the resource handler counts how many times it ran, and
+    // the `read-count` tool exposes that counter.
     const read = (await client.readResource({ uri: 'config://app' })) as Cacheable & Awaited<ReturnType<typeof client.readResource>>;
     check.equal(read.ttlMs, 60_000);
     check.equal(read.cacheScope, 'private');
+    check.equal(await callCount(client, 'read-count'), 1);
+
+    // Within TTL, default `cacheMode: 'use'` → served from cache; the server
+    // handler does not run.
+    await client.readResource({ uri: 'config://app' });
+    check.equal(await callCount(client, 'read-count'), 1);
+
+    // `cacheMode: 'refresh'` always fetches and re-stores.
+    await client.readResource({ uri: 'config://app' }, { cacheMode: 'refresh' });
+    check.equal(await callCount(client, 'read-count'), 2);
+
+    // After the refresh the entry is fresh again — back to cache-served.
+    await client.readResource({ uri: 'config://app' });
+    check.equal(await callCount(client, 'read-count'), 2);
 
     await client.close();
 });