Commit 3e577e8

DRIVERS-3344 - Add support for server selection's deprioritized servers to all topologies (#1865)
1 parent a8d34be commit 3e577e8

54 files changed: +2194 -21 lines changed


source/retryable-reads/retryable-reads.md

Lines changed: 7 additions & 4 deletions
@@ -207,8 +207,8 @@ capture this original retryable error. Drivers should then proceed with selectin

 ###### 3a. Selecting the server for retry

-In a sharded cluster, the server on which the operation failed MUST be provided to the server selection mechanism as a
-deprioritized server.
+The server address on which the operation failed MUST be provided to the server selection mechanism as a member of the
+deprioritized server address list.

 If the driver cannot select a server for a retry attempt or the newly selected server does not support retryable reads,
 retrying is not possible and drivers MUST raise the previous retryable error. In both cases, the caller is able to infer
@@ -284,6 +284,7 @@ function executeRetryableRead(command, session) {
   Exception previousError = null;
   retrying = false;
   Server previousServer = null;
+  deprioritizedServers = [];
   while true {
     if (previousError != null) {
       retrying = true;
@@ -292,9 +293,9 @@ function executeRetryableRead(command, session) {
       if (previousServer == null) {
         server = selectServer();
       } else {
-        // If a previous attempt was made, deprioritize the previous server
+        // If a previous attempt was made, deprioritize the previous server address
         // where the command failed.
-        deprioritizedServers = [ previousServer ];
+        deprioritizedServers.push(previousServer.address);
         server = selectServer(deprioritizedServers);
       }
     } catch (ServerSelectionException exception) {
@@ -547,6 +548,8 @@ any customers experiencing degraded performance can simply disable `retryableRea

 ## Changelog

+- 2026-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
+
 - 2024-04-30: Migrated from reStructuredText to Markdown.

 - 2023-12-05: Add that any server information associated with retryable exceptions MUST reflect the originating server,
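
The practical effect of replacing `deprioritizedServers = [ previousServer ]` with `deprioritizedServers.push(previousServer.address)` is that the list now accumulates across attempts instead of being reset to the most recent failure. The following is a minimal Python sketch of that behaviour, not driver code: `Server`, `RetryableError`, `select_server`, `execute_with_retries`, and the `max_attempts` parameter are all hypothetical names introduced for illustration, and selection is reduced to "first non-deprioritized server, else first server".

```python
from dataclasses import dataclass


class RetryableError(Exception):
    """Hypothetical stand-in for a driver's retryable error type."""


@dataclass(frozen=True)
class Server:
    address: str


def select_server(servers, deprioritized):
    """Prefer servers whose address is not deprioritized; fall back to all."""
    preferred = [s for s in servers if s.address not in deprioritized]
    candidates = preferred or list(servers)
    return candidates[0] if candidates else None


def execute_with_retries(servers, run_command, max_attempts=3):
    """Retry a command, deprioritizing every address that has already failed."""
    deprioritized = []  # grows across attempts; never reset inside the loop
    previous_error = None
    for _ in range(max_attempts):
        server = select_server(servers, deprioritized)
        if server is None:
            # Selection failed: surface the previous retryable error if any.
            raise previous_error or RuntimeError("no server available")
        try:
            return run_command(server)
        except RetryableError as error:
            previous_error = error
            # Record the address, not the server object itself.
            deprioritized.append(server.address)
    raise previous_error
```

With three servers where the first two fail, this sketch tries each address once. Under the old single-element list, a third attempt could have gone back to the first failed server.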

source/retryable-writes/retryable-writes.md

Lines changed: 8 additions & 5 deletions
@@ -317,8 +317,8 @@ Drivers MUST then retry the operation as many times as necessary until any one o

 - CSOT is not enabled and one retry was attempted.

-For each retry attempt, drivers MUST select a writable server. In a sharded cluster, the server on which the operation
-failed MUST be provided to the server selection mechanism as a deprioritized server.
+For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be
+provided to the server selection mechanism as a member of the deprioritized server address list.

 If the driver cannot select a server for a retry attempt or the selected server does not support retryable writes,
 retrying is not possible and drivers MUST raise the retryable error from the previous attempt. In both cases, the caller
@@ -377,6 +377,7 @@ function executeRetryableWrite(command, session) {

   Exception previousError = null;
   retrying = false;
+  deprioritizedServers = [];
   while true {
     try {
       return executeCommand(server, retryableCommand);
@@ -418,13 +419,13 @@ function executeRetryableWrite(command, session) {
       }

       /*
-       * We try to select server that is not the one that failed by passing the
-       * failed server as a deprioritized server.
+       * We try to select a server that has not already failed by adding the
+       * failed server to the list of deprioritized servers passed to selectServer.
        * If we cannot select a writable server, do not proceed with retrying and
        * throw the previous error. The caller can then infer that an attempt was
        * made and failed. */
       try {
-        deprioritizedServers = [ server ];
+        deprioritizedServers.push(server.address);
         server = selectServer("writable", deprioritizedServers);
       } catch (Exception ignoredError) {
         throw previousError;
@@ -680,6 +681,8 @@ retryWrites is not true would be inconsistent with the server and potentially co

 ## Changelog

+- 2026-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
+
 - 2024-05-08: Add guidance for client-level `bulkWrite()` retryability.

 - 2024-05-02: Migrated from reStructuredText to Markdown.

source/server-selection/server-selection-tests.md

Lines changed: 8 additions & 4 deletions
@@ -40,15 +40,19 @@ The following test cases can be found in YAML form in the "tests" directory. Eac
 representing a set of servers, a ReadPreference document, and sets of servers returned at various stages of the server
 selection process. These sets are described below. Note that it is not required to test for correctness at every step.

-| Test Case           | Description                                                                                                                                        |
-| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `suitable_servers`  | the set of servers matching all server selection logic.                                                                                            |
-| `in_latency_window` | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
+| Test Case               | Description                                                                                                                                        |
+| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `suitable_servers`      | the set of servers matching all server selection logic.                                                                                            |
+| `in_latency_window`     | the subset of `suitable_servers` that falls within the allowable latency window (required). NOTE: tests use the default localThresholdMS of 15 ms. |
+| `deprioritized_servers` | the set of servers that are deprioritized and must only be selected if no other suitable server exists.                                            |

 Drivers implementing server selection MUST test that their implementations correctly return **one** of the servers in
 `in_latency_window`. Drivers SHOULD test against the full set of servers in `in_latency_window` and against
 `suitable_servers` if possible.

+For tests containing `deprioritized_servers`, drivers MUST pass the given list of deprioritized servers to each server
+selection call.
+
 ### Topology Type Single

 - The single server is always selected.

source/server-selection/server-selection.md

Lines changed: 25 additions & 8 deletions
@@ -708,9 +708,10 @@ For multi-threaded clients, the server selection algorithm is as follows:
     ["Server selection started" message](#server-selection-started-message).
 2. If the topology wire version is invalid, raise an error and log a
     ["Server selection failed" message](#server-selection-failed-message).
-3. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
-    topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
-    server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
+3. Find suitable servers as follows:
+    - Filter out any deprioritized server addresses.
+    - Find suitable servers from the filtered list by topology type and operation type.
+    - If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
 4. Filter the suitable servers by calling the optional, application-provided server selector.
 5. If there are any suitable servers, filter them according to
     [Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
@@ -756,9 +757,10 @@ Therefore, for single-threaded clients, the server selection algorithm is as fol
     longer stale)
 5. If the topology wire version is invalid, raise an error and log a
     ["Server selection failed" message](#server-selection-failed-message).
-6. Find suitable servers by topology type and operation type. If a list of deprioritized servers is provided, and the
-    topology is a sharded cluster, these servers should be selected only if there are no other suitable servers. The
-    server selection algorithm MUST ignore the deprioritized servers if the topology is not a sharded cluster.
+6. Find suitable servers as follows:
+    - Filter out any deprioritized server addresses.
+    - Find suitable servers from the filtered list by topology type and operation type.
+    - If there are no suitable servers, perform the previous step again without filtering out deprioritized servers.
 7. Filter the suitable servers by calling the optional, application-provided server selector.
 8. If there are any suitable servers, filter them according to
     [Filtering suitable servers based on the latency window](#filtering-suitable-servers-based-on-the-latency-window)
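
The new "find suitable servers" step in both algorithms is a filter followed by a fallback. A minimal Python sketch of that step is shown here, under stated assumptions: `ServerDescription` is reduced to two fields, and the `matches` callable is a hypothetical stand-in for the "by topology type and operation type" logic, which this sketch does not implement.

```python
from typing import Callable, Iterable, List, NamedTuple


class ServerDescription(NamedTuple):
    address: str
    type: str


Matcher = Callable[[List[ServerDescription]], List[ServerDescription]]


def find_suitable(
    servers: Iterable[ServerDescription],
    deprioritized: Iterable[str],
    matches: Matcher,
) -> List[ServerDescription]:
    """Filter out deprioritized addresses, match, then fall back if empty."""
    all_servers = list(servers)
    blocked = set(deprioritized)
    # Filter out any deprioritized server addresses.
    filtered = [s for s in all_servers if s.address not in blocked]
    # Find suitable servers from the filtered list.
    suitable = matches(filtered)
    # If nothing is suitable, repeat the match without the filter.
    if not suitable:
        suitable = matches(all_servers)
    return suitable


def secondaries(candidates: List[ServerDescription]) -> List[ServerDescription]:
    """Hypothetical matcher: keep only replica set secondaries."""
    return [s for s in candidates if s.type == "RSSecondary"]
```

Because the fallback reruns the same matcher on the unfiltered list, a deprioritized server is still returned when it is the only one that matches, which is the "only if there are no other suitable servers" behaviour the removed text described for sharded clusters.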
@@ -846,10 +848,12 @@ details on each step, and
 [why is maxStalenessSeconds applied before tag_sets?](#why-is-maxstalenessseconds-applied-before-tag_sets).)

 If `mode` is 'secondaryPreferred', attempt the selection algorithm with `mode` 'secondary' and the user's
-`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary.
+`maxStalenessSeconds` and `tag_sets`. If no server matches, select the primary. Note that if all secondaries are
+deprioritized, the primary MUST be selected if it is available.

 If `mode` is 'primaryPreferred', select the primary if it is known, otherwise attempt the selection algorithm with
-`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`.
+`mode` 'secondary' and the user's `maxStalenessSeconds` and `tag_sets`. Note that if the primary is deprioritized, a
+secondary MUST be selected if one is available.

 For all read preferences modes except 'primary', clients MUST set the `SecondaryOk` wire protocol flag (OP_QUERY) or
 `$readPreference` global command argument (OP_MSG) to ensure that any suitable server can handle the request. If the
@@ -1605,6 +1609,16 @@ filter it out because it is too stale, and be left with no eligible servers.
 The user's intent in specifying two tag sets was to fall back to the second set if needed, so we filter by
 maxStalenessSeconds first, then tag_sets, and select Node 2.

+### Why does server deprioritization use only server addresses and not ServerDescription objects?
+
+A server's address is the minimum identifying attribute that stays constant across topology changes. Drivers create
+new ServerDescription objects on each topology change, and since ServerDescription objects check multiple attributes to
+determine equality comparisons, a deprioritized server could become non-equal to itself after a change and therefore
+incorrectly be considered suitable for a retry operation.
+
+By using addresses, we ensure that once a server is marked as deprioritized by an operation, it cannot be used again for
+a retry on that operation unless there are no other suitable servers.
+
 ## References

 - [Server Discovery and Monitoring](../server-discovery-and-monitoring/server-discovery-and-monitoring.md) specification
@@ -1614,6 +1628,9 @@ maxStalenessSeconds first, then tag_sets, and select Node 2.

 ## Changelog

+- 2025-12-08: Require server deprioritization for all topology types and clarify the order of server candidate
+  filtering.
+
 - 2015-06-26: Updated single-threaded selection logic with "stale" and serverSelectionTryOnce.

 - 2015-08-10: Updated single-threaded selection logic to ensure a scan always happens at least once under
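
The rationale for tracking addresses rather than ServerDescription objects can be illustrated with a small Python sketch. The `ServerDescription` class below is a deliberately simplified, hypothetical model with three fields (the `topology_version` counter is a stand-in for any attribute that takes part in equality and changes between monitor updates); it is not the description type of any real driver.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServerDescription:
    """Simplified model: equality compares every field, as dataclasses do."""
    address: str
    type: str
    topology_version: int


# The description that was current when the operation failed.
before = ServerDescription("a:27017", "RSSecondary", topology_version=1)
# A fresh description for the same server after a topology change.
after = replace(before, topology_version=2)

deprioritized_descriptions = [before]
deprioritized_addresses = [before.address]

# Object comparison no longer recognises the server as deprioritized.
description_match = after in deprioritized_descriptions
# Address comparison still does.
address_match = after.address in deprioritized_addresses

print(description_match)  # False
print(address_match)      # True
```

In this model the same server slips out of an object-based deprioritized list as soon as its description is replaced, while the address-based list keeps matching it.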

source/server-selection/tests/server_selection/ReplicaSetNoPrimary/read/DeprioritizedNearest.json

Lines changed: 62 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+topology_description:
+  type: ReplicaSetNoPrimary
+  servers:
+  - &1
+    address: b:27017
+    avg_rtt_ms: 5
+    type: RSSecondary
+    tags:
+      data_center: nyc
+  - &2
+    address: c:27017
+    avg_rtt_ms: 100
+    type: RSSecondary
+    tags:
+      data_center: nyc
+operation: read
+read_preference:
+  mode: Nearest
+  tag_sets:
+  - data_center: nyc
+deprioritized_servers:
+- *1
+suitable_servers:
+- *2
+in_latency_window:
+- *2
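
To see why this case expects `c:27017` even though `b:27017` is much closer, it may help to walk the data through a simplified selection routine. The Python sketch below hard-codes the test's servers and read preference; `nearest` and `run_case` are hypothetical helpers that cover only tag-set matching and the 15 ms latency window, not maxStalenessSeconds, server types, or an empty `tag_sets` list.

```python
LOCAL_THRESHOLD_MS = 15  # default localThresholdMS used by the tests

SERVERS = [
    {"address": "b:27017", "avg_rtt_ms": 5, "tags": {"data_center": "nyc"}},
    {"address": "c:27017", "avg_rtt_ms": 100, "tags": {"data_center": "nyc"}},
]
TAG_SETS = [{"data_center": "nyc"}]


def nearest(candidates, tag_sets):
    """Return candidates matching the first tag set that matches anything."""
    for tag_set in tag_sets:
        matched = [
            s for s in candidates
            if all(s["tags"].get(k) == v for k, v in tag_set.items())
        ]
        if matched:
            return matched
    return []


def run_case(servers, tag_sets, deprioritized):
    """Return (suitable_servers, in_latency_window) for one selection."""
    filtered = [s for s in servers if s["address"] not in deprioritized]
    # Fall back to the unfiltered list only when nothing else is suitable.
    suitable = nearest(filtered, tag_sets) or nearest(servers, tag_sets)
    if not suitable:
        return [], []
    fastest = min(s["avg_rtt_ms"] for s in suitable)
    window = [
        s for s in suitable
        if s["avg_rtt_ms"] <= fastest + LOCAL_THRESHOLD_MS
    ]
    return suitable, window


suitable, window = run_case(SERVERS, TAG_SETS, {"b:27017"})
print([s["address"] for s in suitable])  # ['c:27017']
print([s["address"] for s in window])    # ['c:27017']
```

Running the same sketch with an empty deprioritized set puts both servers in `suitable` and only `b:27017` in the latency window, so the deprioritized list is what changes the outcome.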

source/server-selection/tests/server_selection/ReplicaSetNoPrimary/read/DeprioritizedPrimary.json

Lines changed: 39 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+topology_description:
+  type: ReplicaSetNoPrimary
+  servers:
+  - &1
+    address: b:27017
+    avg_rtt_ms: 5
+    type: RSSecondary
+    tags:
+      data_center: nyc
+  - &2
+    address: c:27017
+    avg_rtt_ms: 100
+    type: RSSecondary
+    tags:
+      data_center: nyc
+operation: read
+read_preference:
+  mode: Primary
+deprioritized_servers:
+- *1
+suitable_servers: []
+in_latency_window: []

source/server-selection/tests/server_selection/ReplicaSetNoPrimary/read/DeprioritizedPrimaryPreferred.json

Lines changed: 62 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
