
Commit 925db19

docs: explain client-side sampling callbacks
1 parent ac96f88 commit 925db19

2 files changed

Lines changed: 134 additions & 0 deletions


docs/sampling.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# Sampling

Sampling lets a server ask the client to generate a message with an LLM. The
SDK does not call a model provider by itself. Instead, the client opts in by
registering a `sampling_callback`, and that callback decides which model or
runtime to use.

## Request Flow

1. A server handler calls `ctx.session.create_message(...)`.
2. The SDK sends a `sampling/createMessage` request to the connected client.
3. The client's `sampling_callback` receives a `ClientRequestContext` and
   `CreateMessageRequestParams`.
4. The callback calls the model provider or local runtime that the client owns.
5. The callback returns a `CreateMessageResult`, which the SDK sends back to the server.

If the client does not register a sampling callback, sampling requests are
answered with the SDK's default "Sampling not supported" error.
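
On the wire, step 2 is an ordinary JSON-RPC request and step 5 its response. The sketch below builds both by hand with the standard library so the shapes are visible. Field names follow the MCP specification's camelCase JSON (`maxTokens`, `systemPrompt`, `stopReason`), while the SDK's Python models expose them as snake_case attributes such as `params.max_tokens`. The helper names, request id, and message text are illustrative assumptions, not SDK API.

```python
from __future__ import annotations

import json


def build_create_message_request(
    request_id: int, prompt: str, max_tokens: int, system_prompt: str | None = None
) -> dict:
    """Hand-build a sampling/createMessage JSON-RPC request (illustration only)."""
    params = {
        "messages": [{"role": "user", "content": {"type": "text", "text": prompt}}],
        "maxTokens": max_tokens,  # required by the specification
    }
    if system_prompt is not None:
        params["systemPrompt"] = system_prompt  # optional field
    return {"jsonrpc": "2.0", "id": request_id, "method": "sampling/createMessage", "params": params}


def build_create_message_response(request_id: int, text: str, model: str) -> dict:
    """Hand-build the matching JSON-RPC response carrying a CreateMessageResult."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # a response echoes the id of the request it answers
        "result": {
            "role": "assistant",
            "content": {"type": "text", "text": text},
            "model": model,
            "stopReason": "endTurn",
        },
    }


request = build_create_message_request(1, "Summarize this file.", max_tokens=200)
response = build_create_message_response(request["id"], "Here is a summary.", "example-model")
print(json.dumps(response, indent=2))
```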

## Register a Client Callback

```python
from mcp import ClientSession, types
from mcp.client.context import ClientRequestContext


async def handle_sampling_message(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    print(f"Sampling request {context.request_id}: {params.messages}")

    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text="Hello from the client model"),
        model="example-model",
        stop_reason="endTurn",
    )


async def run(read_stream, write_stream):
    async with ClientSession(
        read_stream,
        write_stream,
        sampling_callback=handle_sampling_message,
    ) as session:
        await session.initialize()
        # While this block is open, the session answers the server's
        # sampling/createMessage requests by calling handle_sampling_message.
```

The callback may return `types.ErrorData` instead of `CreateMessageResult` when
the user rejects a request or the client cannot fulfill it.
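
A common pattern is to ask the user before forwarding a request to any model. The sketch below shows only that control flow. To stay runnable without the SDK installed, it uses hypothetical stand-in dataclasses `ErrorData` and `Result` in place of `types.ErrorData` and `types.CreateMessageResult`, and takes a hypothetical `ask_user` coroutine for the approval prompt. The error code and message are illustrative choices, not values the SDK prescribes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class ErrorData:
    """Hypothetical stand-in for types.ErrorData."""
    code: int
    message: str


@dataclass
class Result:
    """Hypothetical stand-in for types.CreateMessageResult (text content only)."""
    role: str
    text: str
    model: str
    stop_reason: str


async def handle_sampling_message(prompt: str, ask_user) -> Result | ErrorData:
    # Show the request to the user and wait for a decision before calling any model.
    approved = await ask_user(prompt)
    if not approved:
        # Return an error value instead of raising; a real callback would return types.ErrorData.
        return ErrorData(code=-1, message="User rejected sampling request")
    # A real callback would call its model here; this sketch just echoes the prompt.
    return Result(role="assistant", text=f"Echo: {prompt}", model="example-model", stop_reason="endTurn")


async def always_yes(prompt: str) -> bool:
    return True


async def always_no(prompt: str) -> bool:
    return False


print(asyncio.run(handle_sampling_message("Hi", always_no)))
```

Returning a value rather than raising keeps the rejection path explicit in the callback's return type, which mirrors the `CreateMessageResult` or `ErrorData` choice described above.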

## Model Preferences

`params.model_preferences` is advisory. The server can provide model name hints
or priorities for cost, speed, and intelligence (each a number from 0 to 1), but
the client chooses how to interpret them.

```python
def pick_model(preferences: types.ModelPreferences | None) -> str:
    # Use the first hint that names a model this client offers.
    if preferences and preferences.hints:
        for hint in preferences.hints:
            if hint.name in {"fast-model", "smart-model"}:
                return hint.name

    # Otherwise prefer speed only when the server ranks it above intelligence.
    if preferences and (preferences.speed_priority or 0) > (preferences.intelligence_priority or 0):
        return "fast-model"

    # Default when no preference applies.
    return "smart-model"
```

Clients can ignore unsupported hints and should still apply their own policy,
such as user approval, model availability, cost limits, or tenant configuration.
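
`pick_model` above hard-codes two model names. A client with a larger catalog might instead drop the models its own policy rules out, then treat each hint as a substring of a model name (the MCP specification describes hints this way), and finally rank what remains by the weighted priorities. The catalog, its 0-to-1 ratings, and the `allowed` set below are hypothetical, and plain floats stand in for the `types.ModelPreferences` fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogModel:
    """A model this client can run, with hypothetical 0-1 ratings."""
    name: str
    cheapness: float     # 1.0 = cheapest
    speed: float         # 1.0 = fastest
    intelligence: float  # 1.0 = most capable


def choose_model(
    catalog: list[CatalogModel],
    allowed: set[str],
    hints: list[str],
    cost_priority: float = 0.0,
    speed_priority: float = 0.0,
    intelligence_priority: float = 0.0,
) -> str:
    # Client policy comes first: only consider models this user or tenant may use.
    candidates = [m for m in catalog if m.name in allowed]
    if not candidates:
        raise LookupError("no permitted model available")

    # Hints are tried in order; the first one that matches any candidate narrows the set.
    for hint in hints:
        matches = [m for m in candidates if hint in m.name]
        if matches:
            candidates = matches
            break

    # Rank the remaining candidates (all of them if no hint matched) by weighted priorities.
    def score(m: CatalogModel) -> float:
        return cost_priority * m.cheapness + speed_priority * m.speed + intelligence_priority * m.intelligence

    return max(candidates, key=score).name


catalog = [
    CatalogModel("small-fast-v2", cheapness=0.9, speed=0.9, intelligence=0.4),
    CatalogModel("large-smart-v1", cheapness=0.2, speed=0.3, intelligence=0.95),
    CatalogModel("large-smart-v2", cheapness=0.1, speed=0.4, intelligence=0.9),
]
allowed = {"small-fast-v2", "large-smart-v1"}  # e.g. large-smart-v2 is not enabled for this tenant
```

With this catalog, a hint of `"large-smart"` resolves to `large-smart-v1` because the policy filter has already removed `large-smart-v2`, and an unmatched hint with a high speed priority falls through to `small-fast-v2`.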

## Context Fields

`ClientRequestContext` carries request metadata for the callback. It provides:

- `context.session`: the client session handling the request.
- `context.request_id`: the request id, when one is available.
- `context.meta`: optional request metadata.

It does not carry prompt context: the SDK does not automatically add resources
or previous messages to the LLM request.

`params.include_context` is the server's optional request for additional
context: `"none"`, `"thisServer"`, or `"allServers"`. The SDK passes the value
to the callback, but it does not attach context automatically. The client
implementation decides what context it can safely include.
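
What "additional context" means is up to the client. As one possible policy, the hypothetical helper below keeps per-server notes that the user has marked shareable. It returns nothing for `"none"` (or a missing value), only the requesting server's shareable notes for `"thisServer"`, and every server's shareable notes for `"allServers"`. The note store, the helper name, and its `server_name` argument are assumptions for illustration; the SDK provides none of them.

```python
from __future__ import annotations

# Hypothetical store: server name -> list of (note text, user marked it shareable?).
NOTES: dict[str, list[tuple[str, bool]]] = {
    "files": [("Project root is /srv/app", True), ("Deploy token: secret", False)],
    "tickets": [("Open ticket: login bug", True)],
}


def select_shareable_context(include_context: str | None, server_name: str) -> str:
    """Return the context text this client is willing to add to a sampling request."""
    if include_context in (None, "none"):
        return ""
    if include_context == "thisServer":
        servers = [server_name]
    elif include_context == "allServers":
        servers = sorted(NOTES)
    else:
        raise ValueError(f"unknown include_context value: {include_context!r}")

    # Only notes the user marked as shareable ever leave the client.
    lines = [note for name in servers for note, shareable in NOTES.get(name, []) if shareable]
    return "\n".join(lines)


print(select_shareable_context("allServers", "files"))
```

Filtering by an explicit shareable flag means a server asking for `"allServers"` still never sees notes the user has not approved.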

When using `ClientSession` directly, a client that supports `include_context`
values other than `"none"` can declare that with `sampling_capabilities`:

```python
session = ClientSession(
    read_stream,
    write_stream,
    sampling_callback=handle_sampling_message,
    sampling_capabilities=types.SamplingCapability(context=types.SamplingContextCapability()),
)
```
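
This capability is advertised to the server during initialization. Assuming a specification revision in which the client's `sampling` capability may carry an optional `context` object, the relevant part of the `initialize` params looks roughly like the dictionaries below, and a server could check for it before sending an `includeContext` other than `"none"`. These are hand-built illustrations with the other `initialize` fields omitted, not output captured from the SDK, and the check function is hypothetical.

```python
def client_supports_sampling_context(initialize_params: dict) -> bool:
    """Hypothetical server-side check: did the client declare sampling.context?"""
    sampling = initialize_params.get("capabilities", {}).get("sampling")
    return isinstance(sampling, dict) and "context" in sampling


# A client declaring sampling with context support.
with_context = {"capabilities": {"sampling": {"context": {}}}}
# A client declaring sampling without context support.
plain_sampling = {"capabilities": {"sampling": {}}}
# A client that does not declare sampling at all.
no_sampling = {"capabilities": {}}

for name, params in [("with_context", with_context), ("plain_sampling", plain_sampling), ("no_sampling", no_sampling)]:
    print(name, client_supports_sampling_context(params))
```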

A callback that combines model selection, the system prompt, and optional
context might look like this:

```python
async def handle_sampling_message(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    model = pick_model(params.model_preferences)
    provider_messages = convert_sampling_messages(params.messages)

    # Build system messages in order: the server's system prompt, then any extra context.
    system_messages = []
    if params.system_prompt:
        system_messages.append({"role": "system", "content": params.system_prompt})

    if params.include_context in {"thisServer", "allServers"}:
        extra_context = await load_allowed_context(context, params.include_context)
        system_messages.append({"role": "system", "content": extra_context})

    text = await call_your_llm(
        model=model,
        messages=system_messages + provider_messages,
        max_tokens=params.max_tokens,
        temperature=params.temperature,
        stop_sequences=params.stop_sequences,
        metadata=params.metadata,
    )

    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text=text),
        model=model,
        stop_reason="endTurn",
    )
```

In this example, `convert_sampling_messages`, `load_allowed_context`, and
`call_your_llm` are application-specific helpers, not SDK functions. Keeping
provider-specific code in these helpers rather than in the callback makes the
example provider-neutral: the same callback shape works with a hosted model
API, a local model runtime, or a test double.
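
As an illustration, here is one way `convert_sampling_messages` and `call_your_llm` could look when the "provider" is a test double. The sketch assumes each sampling message has a `role` and a `content` that is either a single content block or a list of blocks, keeps only text blocks, and produces chat-style `{"role": ..., "content": ...}` dictionaries like the system messages in the callback above. `SimpleNamespace` objects stand in for the SDK's message models so the sketch runs without the SDK; both helper bodies are hypothetical.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace


def convert_sampling_messages(messages) -> list[dict[str, str]]:
    """Flatten sampling messages into chat-style role/content dicts (text only)."""
    converted = []
    for message in messages:
        # Content may be a single block or a list of blocks; keep only text blocks.
        blocks = message.content if isinstance(message.content, list) else [message.content]
        text = "\n".join(block.text for block in blocks if getattr(block, "type", None) == "text")
        if text:  # messages with no text (e.g. image-only) are dropped
            converted.append({"role": message.role, "content": text})
    return converted


async def call_your_llm(*, model: str, messages: list[dict[str, str]], **options) -> str:
    """Test double: echo the last user message instead of calling a real model."""
    last_user = next((m["content"] for m in reversed(messages) if m["role"] == "user"), "")
    return f"[{model}] {last_user}"


# Hypothetical stand-ins for SDK sampling messages.
msgs = [
    SimpleNamespace(role="user", content=SimpleNamespace(type="text", text="What is 2 + 2?")),
    SimpleNamespace(role="assistant", content=[SimpleNamespace(type="image", data="", mime_type="image/png")]),
]

provider_messages = convert_sampling_messages(msgs)
reply = asyncio.run(call_your_llm(model="test-model", messages=provider_messages, max_tokens=50))
print(reply)
```

Because the test double accepts and ignores extra keyword arguments, it can replace a real provider call in the callback without changing the call site.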

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ nav:
 - Migration Guide: migration.md
 - Documentation:
 - Concepts: concepts.md
+ - Sampling: sampling.md
 - Low-Level Server: low-level-server.md
 - Authorization: authorization.md
 - Testing: testing.md
