7 changes: 7 additions & 0 deletions .changeset/client-honor-cache-hints.md
@@ -0,0 +1,7 @@
---
'@modelcontextprotocol/client': minor
---

`Client` now **honours** the server-stamped SEP-2549 `ttlMs`/`cacheScope` cache hints on the cacheable verbs (`listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, `readResource()`): a still-fresh held entry is served without a round trip. New `CacheableRequestOptions.cacheMode` (`'use'` — the default; `'refresh'` — always fetch and re-store; `'bypass'` — fetch without consulting or writing the cache) gives per-call control. The behaviour is opt-in by hint: a server that sends `ttlMs: 0` (the conservative default this SDK's server stamps) sees byte-identical behaviour — every call fetches.

Entries are automatically scoped by connected-server identity (derived from `serverInfo` after connect, encoded collision-free via `JSON.stringify`); `ClientOptions.cachePartition` is the opaque per-principal slot for `'private'`-scoped entries — set it to your principal identifier (e.g. the auth subject) when one `responseCacheStore` backs several principals. With the default `''` every entry lives at the connected server's shared partition (the safe single-tenant posture). `ClientOptions.defaultCacheTtlMs` (default `0`) supplies the TTL when a result lacks one (e.g. a legacy-era response); the server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`). The list verbs always store the aggregate (so `callTool`'s mirroring/output-validation index keeps working at any TTL); `readResource` stores only when the resolved TTL is positive. `notifications/resources/updated` evicts the cached `resources/read` body for that URI. `ResponseCacheStore` gained `delete(key)`; `InMemoryResponseCacheStore` is now bounded (`{ maxEntries }`, default 512, oldest-first eviction). New exports: `CacheMode`, `CacheableRequestOptions`, `InMemoryResponseCacheStoreOptions`, `MAX_CACHE_TTL_MS`.
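
The TTL rules in the paragraph above can be condensed into a few pure functions. This is a minimal sketch under stated assumptions: `resolveTtlMs`, `shouldStore` and `isFresh` are hypothetical names, not SDK exports, and treating a negative or `NaN` server value as `0` is an assumption the changeset does not spell out.

```typescript
// Hypothetical helpers mirroring the documented TTL rules; not SDK exports.
const MAX_TTL_MS = 24 * 60 * 60 * 1000; // 24 h, the documented MAX_CACHE_TTL_MS clamp

type CacheableMethod = 'tools/list' | 'prompts/list' | 'resources/list' | 'resources/templates/list' | 'resources/read';

/** Server-supplied ttlMs wins (clamped to 24 h); a missing value falls back to defaultCacheTtlMs. */
function resolveTtlMs(serverTtlMs: number | undefined, defaultCacheTtlMs = 0): number {
  if (serverTtlMs === undefined) return Math.max(0, defaultCacheTtlMs);
  // Assumption: negative or NaN server values mean "do not serve from cache".
  if (!(serverTtlMs > 0)) return 0;
  return Math.min(serverTtlMs, MAX_TTL_MS);
}

/** List aggregates are always stored (callTool's index reads them); resources/read only when TTL > 0. */
function shouldStore(method: CacheableMethod, ttlMs: number): boolean {
  return method === 'resources/read' ? ttlMs > 0 : true;
}

/** A held entry may be served without a round trip only while it is still fresh. */
function isFresh(storedAt: number, ttlMs: number, now: number): boolean {
  return ttlMs > 0 && now < storedAt + ttlMs;
}
```

With the default `defaultCacheTtlMs` of `0`, a legacy-era result resolves to `0`, so `isFresh` is always false and every call fetches.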
2 changes: 1 addition & 1 deletion .changeset/client-response-cache-substrate.md
@@ -2,6 +2,6 @@
'@modelcontextprotocol/client': major
---

-`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` now **auto-aggregate every page** when called without a `cursor` and return the complete result with `nextCursor: undefined` (matching the C#, Java, and mcp.d SDKs). Pass an explicit `{ cursor }` string to fetch a single page; the per-page path is unchanged. Existing manual pagination loops keep working — the first iteration returns everything and the loop exits — but can be deleted. The aggregated result is written to the new pluggable `ResponseCacheStore` (default: a fresh per-instance `InMemoryResponseCacheStore`); a `ClientResponseCache` collaborator owns the eviction-generation guard and the derived `tools/list` index that `callTool`'s output validation and SEP-2243 `Mcp-Param-*` mirroring read. New exports: `ResponseCacheStore`, `CacheKey`, `CacheEntry`, `CacheScope`, `MaybePromise`, `InMemoryResponseCacheStore`; new `ClientOptions.responseCacheStore` / `ClientOptions.listMaxPages` (caps the auto-aggregate walk at 64 pages by default; throws `SdkError` with `SdkErrorCode.ListPaginationExceeded` on overrun so a partial aggregate is never cached). The store interface is async-ready (`MaybePromise<…>`); the in-memory default stays synchronous. **A store instance must not be shared across `Client` instances at all in v2.0.x** — entries are keyed by method only (server-identity confusion + `clear()`/`evict()` cross-talk); per-principal partitioning that enables safe sharing arrives with the full caching engine.
+`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` now **auto-aggregate every page** when called without a `cursor` and return the complete result with `nextCursor: undefined` (matching the C#, Java, and mcp.d SDKs). Pass an explicit `{ cursor }` string to fetch a single page; the per-page path is unchanged. Existing manual pagination loops keep working — the first iteration returns everything and the loop exits — but can be deleted. The aggregated result is written to the new pluggable `ResponseCacheStore` (default: a fresh per-instance `InMemoryResponseCacheStore`); a `ClientResponseCache` collaborator owns the eviction-generation guard and the derived `tools/list` index that `callTool`'s output validation and SEP-2243 `Mcp-Param-*` mirroring read. New exports: `ResponseCacheStore`, `CacheKey`, `CacheEntry`, `CacheScope`, `MaybePromise`, `InMemoryResponseCacheStore`; new `ClientOptions.responseCacheStore` / `ClientOptions.listMaxPages` (caps the auto-aggregate walk at 64 pages by default; throws `SdkError` with `SdkErrorCode.ListPaginationExceeded` on overrun so a partial aggregate is never cached). The store interface is async-ready (`MaybePromise<…>`); the in-memory default stays synchronous. Entries are automatically scoped by the connected server's identity and (when set) the consumer-supplied `cachePartition`, so a shared store does not collide across servers or principals; evictions are likewise scoped to the connected server's partitions.
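
The auto-aggregate walk and its page cap can be sketched as follows, assuming a generic `fetchPage(cursor)` callback and a local `PaginationExceededError` standing in for `SdkError` with `SdkErrorCode.ListPaginationExceeded`. Throwing before any result is assembled is what keeps a partial aggregate out of the cache. Treating `maxPages = 0` as "no cap" follows the migration guide; the function itself is illustrative, not the SDK's code.

```typescript
// Hypothetical sketch of the cursorless auto-aggregate walk; not the SDK's implementation.
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

// Stand-in for SdkError with SdkErrorCode.ListPaginationExceeded.
class PaginationExceededError extends Error {}

async function aggregateAllPages<T>(
  fetchPage: (cursor: string | undefined) => Promise<Page<T>>,
  maxPages = 64 // 0 disables the cap, per the migration guide
): Promise<{ items: T[]; nextCursor: undefined }> {
  const items: T[] = [];
  let cursor: string | undefined;
  let pages = 0;
  do {
    if (maxPages > 0 && pages >= maxPages) {
      // Throwing here means no partial aggregate is ever returned (or cached).
      throw new PaginationExceededError(`pagination did not converge within ${maxPages} pages`);
    }
    const page = await fetchPage(cursor);
    items.push(...page.items);
    cursor = page.nextCursor;
    pages++;
  } while (cursor !== undefined);
  return { items, nextCursor: undefined };
}
```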

**Behavior change (every era):** output-schema validator compilation is now lazy — validators are compiled on the first `callTool()` against the cached `tools/list` entry, not eagerly inside `listTools()` — and non-throwing: an uncompilable `outputSchema` is `console.warn`-ed and validation is skipped for that tool only (previously `listTools()` threw). A pluggable `jsonSchemaValidator` provider therefore observes compilation at `callTool` time, not `listTools` time. The legacy-era `listTools()` path is unchanged at the wire level but is observably different at the validator-lifecycle level.
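
Lazy, non-throwing compilation can be sketched as a per-tool memo: compile on first lookup, remember the result, and downgrade a compile failure to a `console.warn` that disables validation for that tool only. `LazyOutputValidators`, `Compile` and `schemaFor` are hypothetical; the SDK wires the equivalent logic to its pluggable `jsonSchemaValidator` and the cached `tools/list` entry.

```typescript
// Hypothetical sketch of lazy, non-throwing output-schema validator compilation.
type Validator = (value: unknown) => boolean;
type Compile = (schema: unknown) => Validator; // stand-in for a pluggable jsonSchemaValidator

class LazyOutputValidators {
  // null records "compilation failed: skip validation for this tool".
  private readonly compiled = new Map<string, Validator | null>();
  private readonly compile: Compile;
  private readonly schemaFor: (toolName: string) => unknown;

  constructor(compile: Compile, schemaFor: (toolName: string) => unknown) {
    this.compile = compile;
    this.schemaFor = schemaFor;
  }

  /** Called at callTool time; returns undefined when there is nothing to validate against. */
  validatorFor(toolName: string): Validator | undefined {
    if (this.compiled.has(toolName)) return this.compiled.get(toolName) ?? undefined;
    const schema = this.schemaFor(toolName); // e.g. read from the cached tools/list entry
    if (schema === undefined) return undefined;
    let validator: Validator | null = null;
    try {
      validator = this.compile(schema);
    } catch (err) {
      console.warn(`outputSchema for tool "${toolName}" failed to compile; skipping validation`, err);
    }
    this.compiled.set(toolName, validator);
    return validator ?? undefined;
  }
}
```

Nothing is compiled when the list is fetched, which is why a custom validator provider first observes compilation at `callTool` time.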
5 changes: 5 additions & 0 deletions .changeset/protocol-pre-aborted-signal-wrap.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
'@modelcontextprotocol/core': patch
---

`Protocol.request()` now rejects with `SdkError(RequestTimeout, reason)` when called with an already-aborted signal, matching in-flight aborts. Previously the raw `signal.reason` was thrown.
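
The fix amounts to checking `signal.aborted` before sending and wrapping `signal.reason` exactly as an in-flight abort would. In this sketch `RequestAbortedError` is a hypothetical stand-in for `SdkError(RequestTimeout, reason)` and `sendRequest` is not `Protocol.request()`; `AbortController`, `AbortSignal.aborted` and `AbortSignal.reason` are the standard APIs.

```typescript
// Hypothetical sketch: wrap an already-aborted signal the same way as an in-flight abort.
class RequestAbortedError extends Error {
  readonly reason: unknown; // stand-in for SdkError(RequestTimeout, reason)
  constructor(reason: unknown) {
    super('Request aborted');
    this.name = 'RequestAbortedError';
    this.reason = reason;
  }
}

function sendRequest<T>(send: (signal: AbortSignal) => Promise<T>, signal: AbortSignal): Promise<T> {
  if (signal.aborted) {
    // Pre-aborted: reject with the wrapped error instead of throwing the raw signal.reason.
    return Promise.reject(new RequestAbortedError(signal.reason));
  }
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(new RequestAbortedError(signal.reason));
    signal.addEventListener('abort', onAbort, { once: true });
    send(signal)
      .then(resolve, reject)
      .finally(() => signal.removeEventListener('abort', onAbort));
  });
}
```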
2 changes: 2 additions & 0 deletions docs/migration-SKILL.md
@@ -572,6 +572,8 @@ side: auto-fulfilment is on by default (`ClientOptions.inputRequired`, `maxRound

`Client.listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()` called without a `cursor` now auto-aggregate every page and return the complete result (`nextCursor: undefined`); an explicit `{ cursor }` string still returns one page. Manual `do { … } while (cursor !== undefined)` loops keep working (the first call returns everything and the loop exits after one iteration) — replace them with the bare no-arg call. New `ClientOptions.listMaxPages` (default 64) caps the aggregate walk only; overrun throws `SdkError` (`SdkErrorCode.ListPaginationExceeded`).

`Client.listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` / `readResource()` now honour the server-stamped SEP-2549 `ttlMs`/`cacheScope`: a still-fresh cached entry is returned without a round trip. Opt-in by server hint — a server that sends `ttlMs: 0` (the SDK's default stamp) sees no behaviour change. Per-call override: pass `{ cacheMode: 'refresh' }` (always fetch and re-store) or `{ cacheMode: 'bypass' }` (fetch without touching the cache). Server `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`). Entries are automatically scoped by connected-server identity; new `ClientOptions.cachePartition` (per-principal slot for `'private'`-scoped entries on a shared `responseCacheStore`; default `''`) and `ClientOptions.defaultCacheTtlMs` (TTL when the result lacks one, e.g. legacy-era responses; default `0`). `ResponseCacheStore` gained `delete(key)` (driven by `notifications/resources/updated`); `InMemoryResponseCacheStore` is now bounded (`{ maxEntries }`, default 512).

Output-schema validator compilation is now lazy (first `callTool()` against the cached `tools/list` entry) and non-throwing (an uncompilable `outputSchema` is `console.warn`-ed and validation is skipped for that tool only); `listTools()` no longer throws on an uncompilable `outputSchema`. Applies on every era — the legacy-era `listTools()` path is unchanged at the wire level only.

No code changes required; wire-behavior note: on a 2026-07-28 Streamable HTTP connection, aborting an in-flight client request (caller `signal` / timeout) closes that request's SSE response stream as the spec cancellation signal — `notifications/cancelled` is no longer POSTed
19 changes: 19 additions & 0 deletions docs/migration.md
@@ -575,6 +575,25 @@

The auto-aggregate walk is capped at `ClientOptions.listMaxPages` pages (default 64; `0` disables) and throws an `SdkError` with `SdkErrorCode.ListPaginationExceeded` if the server's pagination does not converge, so a partial aggregate is never returned. The cap applies only to the no-`cursor` aggregate path; explicit per-page calls are never capped. The aggregated result is also written to the client's response cache (the source for `callTool`'s output-schema validation and SEP-2243 header mirroring).

### Client honours server cache hints (SEP-2549)

On a 2026-07-28 connection the cacheable verbs — `listTools()`, `listPrompts()`, `listResources()`, `listResourceTemplates()`, and `readResource()` — now serve a still-fresh held entry without a round trip when the server-stamped `ttlMs` has not elapsed. The behaviour is opt-in **by server hint**: a server that sends `ttlMs: 0` (the conservative default the SDK's `McpServer` stamps unless configured otherwise) sees byte-identical behaviour — every call fetches. A `list_changed` notification still evicts immediately regardless of TTL.

Per-call control via the new `CacheableRequestOptions.cacheMode` (`'use'` is the default):

```typescript
await client.listTools(); // serve from cache if fresh
await client.listTools(undefined, { cacheMode: 'refresh' }); // always fetch, then re-store
await client.listTools(undefined, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
```
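
The three modes differ only in whether the cache is read and whether the fetched result is written back. A minimal sketch, assuming a plain `Map` as the store, an `expiresAt` timestamp per entry and a hypothetical `cachedCall` helper (not an SDK export); it copies values with `structuredClone` so callers cannot mutate the held entry. For brevity it stores every non-`bypass` fetch, whereas the SDK stores a `readResource()` result only when its resolved TTL is positive.

```typescript
// Hypothetical sketch of per-call cacheMode semantics; `cachedCall` is not an SDK export.
type CacheMode = 'use' | 'refresh' | 'bypass';

interface Held<T> {
  value: T;
  expiresAt: number; // epoch ms; fresh while now < expiresAt
}

async function cachedCall<T>(
  store: Map<string, Held<T>>,
  key: string,
  fetchFresh: () => Promise<{ value: T; ttlMs: number }>,
  mode: CacheMode = 'use',
  now: () => number = Date.now
): Promise<T> {
  if (mode === 'use') {
    const held = store.get(key);
    // Fresh hit: no round trip; hand back a copy so callers cannot mutate the held entry.
    if (held && now() < held.expiresAt) return structuredClone(held.value);
  }
  const { value, ttlMs } = await fetchFresh();
  if (mode !== 'bypass') {
    // 'use' (after a miss or stale entry) and 'refresh' both write the fresh result back.
    store.set(key, { value: structuredClone(value), expiresAt: now() + ttlMs });
  }
  return value;
}
```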

New `ClientOptions`:

- `cachePartition?: string` — the opaque per-principal identifier for `'private'`-scoped entries (the spec's "MUST NOT share across authorization contexts"). Entries are automatically scoped by connected-server identity (derived from `serverInfo`), so one `responseCacheStore` may back several clients without consumer-side encoding; set `cachePartition` to your principal identifier (e.g. the auth subject) when sharing a store across principals. With the default `''` every entry — public or private — lives at the connected server's shared partition (the safe single-tenant posture). Note `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
- `defaultCacheTtlMs?: number` — applied when a cacheable result lacks `ttlMs` (e.g. a legacy-era response). Default `0` — never serve from cache; the list aggregate is still **stored** so `callTool`'s mirroring/output-validation index keeps working regardless. The server-supplied `ttlMs` is clamped at 24 h (`MAX_CACHE_TTL_MS`).
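
Why a `JSON.stringify` encoding keeps server partitions apart can be shown with a small sketch. It assumes the partition is built from `serverInfo.name`, `serverInfo.version` and the principal slot; the exact tuple the SDK encodes is not specified here, and `partitionFor` is hypothetical. Naive `name:version` joining maps `{ name: 'a:b', version: '1' }` and `{ name: 'a', version: 'b:1' }` to the same string, while a JSON-encoded array cannot collide.

```typescript
// Hypothetical sketch of server- and principal-scoped partition keys; not the SDK's encoding.
type CacheScope = 'public' | 'private';

interface ServerIdentity {
  name: string; // from serverInfo, which is self-reported
  version: string;
}

function partitionFor(server: ServerIdentity, scope: CacheScope, cachePartition = ''): string {
  // 'public' entries share the server's slot; 'private' entries get the principal's slot.
  // With the default '' both resolve to the same shared slot (the single-tenant posture).
  const principal = scope === 'private' ? cachePartition : '';
  // JSON.stringify escapes quotes and backslashes inside each string, so two distinct
  // tuples can never encode to the same text, unlike `${name}:${version}` joining.
  return JSON.stringify([server.name, server.version, principal]);
}
```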

The `ResponseCacheStore` interface gained `delete(key)` (the per-URI invalidation `notifications/resources/updated` drives) — custom stores written against the alpha substrate need to add it. The default `InMemoryResponseCacheStore` is now bounded (default 512 entries, oldest-first eviction; configurable via `{ maxEntries }`).
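
Oldest-first bounding falls out of JavaScript `Map` iteration order, which follows insertion order. The sketch below is a synchronous, illustrative store with the five operations a `ResponseCacheStore` exposes (`get`, `set`, `delete`, `evict`, `clear`); `BoundedStore`, its `Key` shape and the choice to treat an overwrite as a fresh insertion are assumptions, not the `InMemoryResponseCacheStore` implementation.

```typescript
// Hypothetical bounded store illustrating oldest-first eviction; not InMemoryResponseCacheStore.
interface Key {
  method: string; // e.g. 'tools/list' or 'resources/read'
  partition: string;
  id?: string; // e.g. the URI for a resources/read body
}

interface Entry {
  value: unknown;
  stamp: number;
  expiresAt?: number;
}

class BoundedStore {
  // Map iterates in insertion order, so the first key is always the oldest entry.
  private readonly entries = new Map<string, { key: Key; entry: Entry }>();
  private readonly maxEntries: number;
  private lastStamp = 0;

  constructor(options: { maxEntries?: number } = {}) {
    this.maxEntries = options.maxEntries ?? 512;
  }

  private encode(key: Key): string {
    return JSON.stringify([key.method, key.partition, key.id ?? null]);
  }

  get(key: Key): Entry | undefined {
    return this.entries.get(this.encode(key))?.entry;
  }

  set(key: Key, value: unknown, expiresAt?: number): number {
    const k = this.encode(key);
    const stamp = ++this.lastStamp; // monotonically increasing
    this.entries.delete(k); // assumption: an overwrite counts as a fresh insertion
    this.entries.set(k, { key, entry: { value, stamp, expiresAt } });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return stamp;
  }

  delete(key: Key): void {
    this.entries.delete(this.encode(key));
  }

  evict(method: string): void {
    // Deleting the current entry during Map iteration is safe in JavaScript.
    for (const [k, held] of this.entries) if (held.key.method === method) this.entries.delete(k);
  }

  clear(): void {
    this.entries.clear();
  }
}
```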

Check warning on line 595 in docs/migration.md

Claude / Claude Code Review

New cache-honouring feature not documented in docs/client.md (only migration.md and the example README)

Comment on lines +578 to +595

🟡 The new SEP-2549 cache-honouring behaviour (cache-served listTools()/readResource(), the per-call cacheMode option, and the new ClientOptions cachePartition/defaultCacheTtlMs) is documented here in migration.md and in examples/caching/README.md, but the canonical client feature guide docs/client.md was not updated — its Tools and Resources sections still describe these verbs as always reaching the server, with no mention of cache-serving or how to force a fetch. Consider adding a short 'Response caching (SEP-2549)' subsection (or a sentence in the Tools/Resources sections) of docs/client.md covering the cache-serving behaviour, cacheMode, and cachePartition/defaultCacheTtlMs.

Extended reasoning...

The gap. This PR introduces user-visible client behaviour: listTools(), listPrompts(), listResources(), listResourceTemplates(), and readResource() may now be served from the response cache without a round trip when the server-stamped ttlMs has not elapsed, plus the new per-call cacheMode option ('use' | 'refresh' | 'bypass'), the new ClientOptions.cachePartition / defaultCacheTtlMs semantics, and new public exports (CacheMode, CacheableRequestOptions, InMemoryResponseCacheStoreOptions, MAX_CACHE_TTL_MS). Prose was added to docs/migration.md (this hunk), docs/migration-SKILL.md, and examples/caching/README.md — but docs/client.md, the canonical client feature reference, is not touched by the PR.

What docs/client.md says today. A grep of docs/client.md for cacheMode / cacheHint / ttlMs / cachePartition / defaultCacheTtlMs / responseCacheStore returns nothing related to this feature; the only cache-related prose is the SEP-2243 'internal tools/list cache' paragraph and the listChanged local-cache option. Its Tools section (~line 255) describes listTools() as 'walks every page on your behalf' and the Resources section (~lines 318–332) describes listResources()/readResource() purely as discovering and reading server-provided data — the readResource example there even uses the same config://app URI the caching example now cache-serves. Nothing tells a reader of the feature guide that, after this PR, a second call within the server's ttlMs may never reach the server, or that cacheMode: 'refresh' / 'bypass' exists to force a fetch.

Why this matters. The migration guide targets upgraders and the example README targets the example; a user consulting the feature reference for listTools()/readResource() (e.g. while debugging why a request never hit their server) will not learn that calls can be cache-served, nor how to opt out per call. The repo's review checklist asks for prose documentation of new features and for updating docs that describe the pre-change behaviour.

Why this is an inconsistency, not a different convention. The repo's own precedent is that comparable client-side behaviour changes in this stack got prose in docs/client.md: the predecessor PR's auto-aggregation behaviour is the source of the 'walks every page on your behalf' wording there, and the SEP-2243 mirroring feature has its own subsection. SEP-2549 cache honouring is the same kind of user-visible Client behaviour change and is the only one of the set absent from client.md.

Concrete walkthrough of the reader-facing gap. (1) A developer's host calls client.readResource({ uri: 'config://app' }) against a server stamping ttlMs: 60_000; they then change the resource server-side and call readResource again within 60 s. (2) The second call returns the old body with no wire request — by design. (3) They open docs/client.md → Resources to understand why; the section describes readResource() as reading server data with no mention of the response cache, ttlMs, or cacheMode, so the behaviour looks like a bug rather than a documented feature with a documented escape hatch ({ cacheMode: 'refresh' }).

Suggested fix. Add a short 'Response caching (SEP-2549)' subsection to docs/client.md (or a sentence each in the Tools and Resources sections) stating that the cacheable verbs serve a still-fresh entry without a round trip when the server stamps a positive ttlMs, that cacheMode: 'refresh' | 'bypass' forces a fetch, and pointing at ClientOptions.cachePartition / defaultCacheTtlMs / responseCacheStore for shared-store setups — largely a condensed copy of the migration.md section added in this PR. Filed as a nit: the feature is documented (migration guide, example README, JSDoc), just not in the feature reference users of these verbs actually consult.


**Output-schema validator lifecycle (every era):** validator compilation is now lazy — validators are compiled on the first `callTool()` against the cached `tools/list` entry, not eagerly inside `listTools()` — and non-throwing: an uncompilable `outputSchema` is `console.warn`-ed
and validation is skipped for that tool only. In v1, `listTools()` threw on an uncompilable `outputSchema`; now it succeeds, and a pluggable `jsonSchemaValidator` provider observes compilation at `callTool` time, not `listTools` time. The legacy-era `listTools()` path is
unchanged at the wire level but is observably different at the validator-lifecycle level.
48 changes: 45 additions & 3 deletions examples/caching/README.md
@@ -1,10 +1,52 @@
# caching

`CacheableResult` freshness hints (protocol revision 2026-07-28). The server declares hints at two layers — a per-registration `cacheHint` on the resource and server-level `ServerOptions.cacheHints` — and the SDK resolves most-specific-author-first (handler-return fields would
-take precedence over both) and stamps `ttlMs`/`cacheScope` on the wire toward modern clients only. The client reads the stamped values back.
-
-> Full client-side cache **honouring** (re-using a still-fresh result instead of re-requesting) is a follow-up; this example reads what the server emits today.
+take precedence over both) and stamps `ttlMs`/`cacheScope` on the wire toward modern clients only. The client honours the stamped values: a still-fresh held entry is served without a round trip.

```bash
pnpm tsx examples/caching/client.ts
```

The client calls `listTools()` and `readResource()` several times each; the second call of each is served from the response cache. The server exposes a `request-count` tool (how many `tools/list` requests reached it) and a `read-count` tool (how many times the resource handler ran), so the example asserts that each counter is unchanged after a cache-served call and increments after `cacheMode: 'refresh'`.

## `cacheMode`

Per-call control on the cacheable verbs (`listTools()` / `listPrompts()` / `listResources()` / `listResourceTemplates()` / `readResource()`):

```ts
await client.readResource({ uri: 'config://app' }); // 'use' (default): serve from cache if fresh
await client.readResource({ uri: 'config://app' }, { cacheMode: 'refresh' }); // always fetch, then re-store
await client.readResource({ uri: 'config://app' }, { cacheMode: 'bypass' }); // fetch; do not read or write the cache
```

A `list_changed` notification still evicts immediately regardless of TTL.
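
TTL-independent invalidation is essentially a routing table from notification method to cache operation. The sketch assumes a minimal store with `evict(method)` and a per-URI delete; the notification and request method names are MCP's, but `onNotification`, `InvalidatingStore` and the choice to also drop `resources/templates/list` on `notifications/resources/list_changed` are assumptions.

```ts
// Hypothetical notification-to-invalidation routing; TTL plays no part here.
interface InvalidatingStore {
  evict(method: string): void; // drop every held entry for a request method
  deleteResource(uri: string): void; // drop one held resources/read body
}

const LIST_CHANGED = new Map<string, string[]>([
  ['notifications/tools/list_changed', ['tools/list']],
  ['notifications/prompts/list_changed', ['prompts/list']],
  // Assumption: templates are invalidated together with the resource list.
  ['notifications/resources/list_changed', ['resources/list', 'resources/templates/list']]
]);

function onNotification(store: InvalidatingStore, method: string, params?: { uri?: string }): void {
  const methods = LIST_CHANGED.get(method);
  if (methods) {
    for (const m of methods) store.evict(m);
  } else if (method === 'notifications/resources/updated' && params?.uri) {
    store.deleteResource(params.uri);
  }
}
```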

## Custom store

The default per-client `InMemoryResponseCacheStore` (bounded at 512 entries by default) is enough for most hosts. To back the cache with something persistent (Redis, KV, IndexedDB), implement the five-method `ResponseCacheStore` interface — the store is a dumb keyed-value carrier; freshness and partitioning are the client's job:

```ts
import { Client } from '@modelcontextprotocol/client';
import type { CacheEntry, CacheKey, CacheScope, ResponseCacheStore } from '@modelcontextprotocol/client';

class MyStore implements ResponseCacheStore {
  async get(key: CacheKey): Promise<CacheEntry | undefined> {
    /* read {value, stamp, expiresAt, scope} from your backend */
  }
  async set(key: CacheKey, entry: { value: unknown; expiresAt?: number; scope?: CacheScope }): Promise<number> {
    /* write entry under key; return a monotonically-increasing stamp */
  }
  async delete(key: CacheKey): Promise<void> {
    /* drop the single entry under key (no-op if absent) */
  }
  async evict(method: string): Promise<void> {
    /* drop every entry whose key.method === method (across every partition) */
  }
  async clear(): Promise<void> {
    /* drop everything */
  }
}

// Placeholder: a stable identifier for the authorization context (e.g. the auth subject).
const principalId = 'auth-subject';

const client = new Client({ name: 'host', version: '1.0.0' }, { responseCacheStore: new MyStore(), cachePartition: principalId });
```

The SDK scopes every entry by the connected server's identity automatically — you do not encode server identity into `cachePartition` or the store key yourself. When one store backs several principals against the same server, set `ClientOptions.cachePartition` to a stable identity of the authorization context (e.g. the auth subject) so `'private'`-scoped entries are isolated per principal; `'public'`-scoped entries are shared within the connected server's namespace automatically. Note `serverInfo` is self-reported, so a server that deliberately impersonates another's `name`/`version` shares its `'public'` slot; the per-principal isolation holds regardless.
43 changes: 40 additions & 3 deletions examples/caching/client.ts
@@ -1,8 +1,7 @@
/**
 * Reads the cache hints emitted on cacheable results (2026-07-28 connections
- * only) and asserts the configured values reached the wire. Full client-side
- * cache *honouring* (re-using a fresh result instead of re-requesting) is a
- * follow-up — see the SDK's tracking issue for client cache support.
+ * only) and asserts the client honours them: a still-fresh cached entry is
+ * served without a round trip.
 */
import { check, connectFromArgs, runClient } from '../harness.js';

@@ -11,22 +10,60 @@ interface Cacheable {
  cacheScope?: 'public' | 'private';
}

async function callCount(client: Awaited<ReturnType<typeof connectFromArgs>>, name: 'read-count' | 'request-count'): Promise<number> {
  const r = await client.callTool({ name });
  return Number((r.content[0] as { text: string }).text);
}

runClient('caching', async () => {
  // connectFromArgs picks transport (default: spawn ./server.ts over stdio; --http <url>) and era (--legacy) from argv. Your code would construct a Client and connect over your chosen transport directly.
  const client = await connectFromArgs(import.meta.dirname);
  check.equal(client.getNegotiatedProtocolVersion(), '2026-07-28');

  // The server stamps `tools/list` with `ttlMs: 30_000, cacheScope: 'public'`.
  const tools = (await client.listTools()) as Cacheable & Awaited<ReturnType<typeof client.listTools>>;
  check.equal(tools.ttlMs, 30_000);
  check.equal(tools.cacheScope, 'public');
  // `request-count` proves the wire was reached exactly once.
  check.equal(await callCount(client, 'request-count'), 1);

  // The second call is served from the response cache: the server-side
  // `tools/list` counter is unchanged, and the result is a fresh copy of the
  // held entry (so mutating it cannot reach the cache).
  const toolsAgain = await client.listTools();
  check.deepEqual(
    toolsAgain.tools.map(t => t.name),
    tools.tools.map(t => t.name)
  );
  check.equal(await callCount(client, 'request-count'), 1);

  // `cacheMode: 'refresh'` always fetches and re-stores: the counter moves.
  await client.listTools(undefined, { cacheMode: 'refresh' });
  check.equal(await callCount(client, 'request-count'), 2);

  const resources = (await client.listResources()) as Cacheable & Awaited<ReturnType<typeof client.listResources>>;
  check.equal(resources.ttlMs, 5000);
  check.equal(resources.cacheScope, 'public');

  // `readResource`: the resource handler counts how many times it ran, and
  // the `read-count` tool exposes that counter.
  const read = (await client.readResource({ uri: 'config://app' })) as Cacheable & Awaited<ReturnType<typeof client.readResource>>;
  check.equal(read.ttlMs, 60_000);
  check.equal(read.cacheScope, 'private');
  check.equal(await callCount(client, 'read-count'), 1);

  // Within TTL, default `cacheMode: 'use'` → served from cache; the server
  // handler does not run.
  await client.readResource({ uri: 'config://app' });
  check.equal(await callCount(client, 'read-count'), 1);

  // `cacheMode: 'refresh'` always fetches and re-stores.
  await client.readResource({ uri: 'config://app' }, { cacheMode: 'refresh' });
  check.equal(await callCount(client, 'read-count'), 2);

  // After the refresh the entry is fresh again — back to cache-served.
  await client.readResource({ uri: 'config://app' });
  check.equal(await callCount(client, 'read-count'), 2);

  await client.close();
});