Skip to content

Conversation

@connorsmacd
Copy link
Collaborator

@connorsmacd connorsmacd commented Dec 4, 2025

Reference

CDRIVER-5815
CDRIVER-6177

Summary

This PR makes it possible to disable socket timeouts via socketTimeoutMS.

According to the spec, socketTimeoutMS=0 shall be used to disable socket timeouts. However, the C Driver does not comply with this. Since URI option values of 0 are interpreted as unset (see CDRIVER-3730), the default timeout value is used instead. Normally, this would be okay since the spec dictates that the default timeout should be "no timeout", but the C Driver is yet again not spec-compliant as it uses a default of 5 minutes.

To avoid a breaking change, this PR introduces an alternate string option "inf" that can be used to disable the socket timeout. This is a stopgap measure that should be reevaluated for the next major release.

The C Driver also treated timeout values of 0 as "immediate timeouts" just like the POSIX system call poll. Since this is not aligned with the spec, the C Driver was changed to internally treat timeout values of 0 as infinite timeouts. However, there are some parts of the C Driver that were intentionally using immediate timeouts for non-blocking behavior, so a new sentinel MONGOC_SOCKET_TIMEOUT_IMMEDIATE is introduced, which currently is set to the value of INT32_MIN.

Caution

This makes specifying special timeout values very confusing since there are different interpretations depending on the context and representation:

URI Spec socketTimeoutMS C Driver URI socketTimeoutMS C Driver Internal INT32 Representation POSIX INT32 Representation (e.g., the `timeout` arg for `poll`)
Immediate timeout cannot specify cannot specify -2147483648 (INT32_MIN) 0
Infinite/disabled timeout 0 "inf" 0 normally, but technically [-2147483647, 0] due to CDRIVER-4781 <0

@connorsmacd connorsmacd changed the title CDRIVER-5815 treat socketTimeoutMS=0 as infinite timeout CDRIVER-5815 support disabling socket timeouts Dec 17, 2025
@connorsmacd connorsmacd marked this pull request as ready for review December 17, 2025 20:33
@connorsmacd connorsmacd requested a review from a team as a code owner December 17, 2025 20:33
@kevinAlbs kevinAlbs requested a review from eramongodb December 17, 2025 20:35
Copy link
Contributor

@eramongodb eramongodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am excited to see the further reduction of explicit timeout arithmetic throughout the codebase with the new _mongoc_stream_tls_timer_* helpers.

Just noting an observation:

int64_t timeout_ms = 0; // -> inf
timeout_ms = _mongoc_stream_tls_timer_to_timeout_msec(_mongoc_stream_tls_timer_from_timeout_msec(timeout_ms));
ASSERT_CMPINT32(timeout_ms, ==, MONGOC_SOCKET_TIMEOUT_INFINITE);

@connorsmacd
Copy link
Collaborator Author

connorsmacd commented Dec 18, 2025

I am excited to see the further reduction of explicit timeout arithmetic throughout the codebase with the new _mongoc_stream_tls_timer_* helpers.

Just noting an observation:

int64_t timeout_ms = 0; // -> inf
timeout_ms = _mongoc_stream_tls_timer_to_timeout_msec(_mongoc_stream_tls_timer_from_timeout_msec(timeout_ms));
ASSERT_CMPINT32(timeout_ms, ==, MONGOC_SOCKET_TIMEOUT_INFINITE);

I actually really dislike the fact that a timeout value of zero implies infinity, but I still thought it would be best to be consistent with the spec in this regard. As a result, there were sneaky bugs introduced by arithmetic such as this:

tls->timeout_msec = (expire - now) / 1000;

In the rare case where expire - now were sub-millisecond (i.e., <1000), this would truncate to zero, poisoning the TLS flow with infinite timeouts. Manual unit manipulation is already error prone as it is, but this quirk makes it even more crucial to avoid it.

Copy link
Contributor

@eramongodb eramongodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestions remaining; otherwise, LGTM.

Comment on lines +2615 to +2621
const char *const str_maybe = mongoc_uri_get_option_as_utf8(uri, MONGOC_URI_SOCKETTIMEOUTMS, NULL);

if (str_maybe && strcasecmp(str_maybe, "inf") == 0) {
// CDRIVER-6177: To avoid a breaking change, use `socketTimeoutMS=inf` to specify an infinite timeout instead of
// `socketTimeoutMS=0`.
return MONGOC_SOCKET_TIMEOUT_INFINITE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const char *const str_maybe = mongoc_uri_get_option_as_utf8(uri, MONGOC_URI_SOCKETTIMEOUTMS, NULL);
if (str_maybe && strcasecmp(str_maybe, "inf") == 0) {
// CDRIVER-6177: To avoid a breaking change, use `socketTimeoutMS=inf` to specify an infinite timeout instead of
// `socketTimeoutMS=0`.
return MONGOC_SOCKET_TIMEOUT_INFINITE;
}
{
const char *const str_maybe = mongoc_uri_get_option_as_utf8(uri, MONGOC_URI_SOCKETTIMEOUTMS, NULL);
if (str_maybe && strcasecmp(str_maybe, "inf") == 0) {
// CDRIVER-6177: To avoid a breaking change, use `socketTimeoutMS=inf` to specify an infinite timeout instead
// of `socketTimeoutMS=0`.
return MONGOC_SOCKET_TIMEOUT_INFINITE;
}
}

Suggest reducing scope of str_maybe.

Comment on lines +2623 to +2629
const int32_t fallback = MONGOC_DEFAULT_SOCKETTIMEOUTMS;

const bson_t *const options = mongoc_uri_get_options(uri);
bson_iter_t iter;

if (options && bson_iter_init_find_case(&iter, options, MONGOC_URI_SOCKETTIMEOUTMS) &&
BSON_ITER_HOLDS_INT32(&iter)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const int32_t fallback = MONGOC_DEFAULT_SOCKETTIMEOUTMS;
const bson_t *const options = mongoc_uri_get_options(uri);
bson_iter_t iter;
if (options && bson_iter_init_find_case(&iter, options, MONGOC_URI_SOCKETTIMEOUTMS) &&
BSON_ITER_HOLDS_INT32(&iter)) {
const int32_t fallback = MONGOC_DEFAULT_SOCKETTIMEOUTMS;
bson_iter_t iter = {0};
if (bson_iter_init_find_case(&iter, mongoc_uri_get_options(uri), MONGOC_URI_SOCKETTIMEOUTMS) &&
BSON_ITER_HOLDS_INT32(&iter)) {

Suggest taking advantage of mongoc_uri_get_options() never returning a null pointer.

if (timeout_msec == MONGOC_SOCKET_TIMEOUT_IMMEDIATE) {
return 0;
} else if (timeout_msec <= 0) {
return -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return -1;
return -1; // Infinite.

Suggest drive-by documenting the meaning of -1 here.

Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation for the various timeout representations is much appreciated. Posting initial comments.


uri = mongoc_uri_new("mongodb://localhost/?" MONGOC_URI_SOCKETTIMEOUTMS "=garbage");
ASSERT(!uri);
ASSERT_CAPTURED_LOG("mongoc_uri_get_socket_timeout_ms_option",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: use name of function expected to warn:

Suggested change
ASSERT_CAPTURED_LOG("mongoc_uri_get_socket_timeout_ms_option",
ASSERT_CAPTURED_LOG("mongoc_uri_new",

Changes an failed assert message from testing mongoc_uri_get_socket_timeout_ms_option didn't log ... to testing mongoc_uri_new didn't log ....

ASSERT(uri);

// CDRIVER-6177: `socketTimeoutMS=0` is treated as unset, so the default is used instead.
ASSERT_CMPINT32(mongoc_uri_get_socket_timeout_ms_option(uri), ==, MONGOC_DEFAULT_SOCKETTIMEOUTMS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest capturing and asserting expected log message:

Suggested change
ASSERT_CMPINT32(mongoc_uri_get_socket_timeout_ms_option(uri), ==, MONGOC_DEFAULT_SOCKETTIMEOUTMS);
capture_logs(true);
ASSERT_CMPINT32(mongoc_uri_get_socket_timeout_ms_option(uri), ==, MONGOC_DEFAULT_SOCKETTIMEOUTMS);
ASSERT_CAPTURED_LOG("mongoc_uri_get_socket",
MONGOC_LOG_LEVEL_WARNING,
"`socketTimeoutMS=0` cannot be used to disable socket timeouts");
capture_logs(false);

ASSERT_CAPTURED_LOG("mongoc_uri_get_socket_timeout_ms_option",
MONGOC_LOG_LEVEL_WARNING,
"Unsupported value for \"sockettimeoutms\": \"garbage\"");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
capture_logs(false);
}

Suggest wrapping log asserts with capture_logs(true); and capture_logs(false); to limit scope of captured logs.

I expect it has no impact on the last case since TestSuite_RunTest calls capture_logs(false); after each test. But it may help avoid an unexpected capture if this test is later added to.


if (value == 0) {
// See CDRIVER-6177.
MONGOC_WARNING("`socketTimeoutMS=0` cannot be used to disable socket timeouts. The default of %" PRId32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mongoc_uri_get_socket_timeout_ms_option is called by mongoc_cluster_reset_sockettimeoutms, which is called each time a client is pushed back to a pool:

uri = mongoc_uri_new("mongodb://localhost/?" MONGOC_URI_SOCKETTIMEOUTMS "=0");
pool = mongoc_client_pool_new(uri);
client = mongoc_client_pool_pop(pool); // Warns.
mongoc_client_pool_push(pool, client); // Warns again.

Suggest moving this warning to mongoc_uri_apply_options to reduce the number of warnings to only when a URI is parsed.

} else if (timeout_msec == 0) {
if (timeout_msec == MONGOC_SOCKET_TIMEOUT_IMMEDIATE) {
return 0;
} else if (timeout_msec <= 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this may change how 0 is interpreted when directly using the mongoc_stream_t API. The mongoc_stream_t API is (unfortunately) public and has known users (e.g. userver here). Example:

mongoc_socket_t *sock = mongoc_socket_new(AF_INET, SOCK_STREAM, 0);

// Build address struct for localhost:27017:
struct sockaddr_in server_addr = {0};
server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(27017);
server_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

ssize_t ok = mongoc_socket_connect(sock, (struct sockaddr *)&server_addr, sizeof(server_addr), -1);
ASSERT_CMPSSIZE_T(ok, ==, 0);

mongoc_stream_t *stream = mongoc_stream_socket_new(sock);
mongoc_iovec_t iov = {.iov_base = "foo", .iov_len = 3};
ok = mongoc_stream_writev(stream, &iov, 1, 0 /* timeout */); // Want to be non-blocking.
ASSERT_CMPSSIZE_T(ok, ==, 3);

mongoc_stream_destroy(stream);

Suggest reverting to preserve the meaning of 0 as non-blocking in the mongoc_stream_t API.

Copy link
Collaborator Author

@connorsmacd connorsmacd Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is quite unfortunate. While I could use the POSIX convention just for the public mongoc_stream_t API, I do not think this is the best approach.

Take mongoc_stream_read with OpenSSL enabled for example:

  • When mongoc_stream_read is called, we convert timeout_msec from the POSIX convention to the spec convention, then pass it to stream->readv.
  • stream->readv is _mongoc_stream_tls_openssl_readv in this case, which eventually calls mongoc_stream_tls_openssl_bio_read.
  • In mongoc_stream_tls_openssl_bio_read, we call mongoc_stream_read again, so we convert the timeout_msec back to the POSIX convention.

If we continue to mix conventions like this, then we are just asking for errors. In my opinion, it would be best to use the POSIX convention as much as possible internally and use the spec convention only where needed for spec compliance. In the case of socketTimeoutMS, we can use the spec convention within the URI, but once we extract the timeout from the URI, we convert to the POSIX convention then stick with it. This would have the added benefit of being more intuitive and less error-prone when dealing with POSIX system calls that accept a timeout.

That said, I recognize that being unaligned with the spec in this regard is less than ideal. This also goes against the recommendation in CDRIVER-5815:

IMO, libmongoc should be changed to have 0 actually mean "unlimited"

Thoughts?

Copy link
Contributor

@eramongodb eramongodb Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mongoc_socket_t and mongoc_stream_t API have been an unfortunate source of codebase and behavior inconsistencies that is difficult to address due to being public API. They are also primarily responsible for blocking further int32 vs. int64 correctness improvements (CDRIVER-4596 and CDRIVER-4599) which were were unable to be scoped for the v2 release, as noted in #1475:

Additionally, an attempt is made to document the unspecified/inconsistent behavior of non-positive timeout values for the URI option documentation and for functions that have a timeout parameter (i.e. mongoc_stream_*()). Scanning through how the timeout values are both represented and interpreted within libmongoc, it does not seem desirable to continue to support non-positive URI timeout values (even 0) despite historical (implicit) support.

Concerning "spec" vs. "POSIX" conventions, I am slightly in favor of limiting the POSIX convention to the mongoc-stream-socket.c component (where get_expiration() converts from spec to POSIX) + implementing specific workarounds for the stream/socket API, since in most contexts except for the socket/stream API, we use the spec convention. If we want to use the POSIX convention throughout the codebase, I expect there will be comparatively more changes needed for that than what this PR currently proposes (e.g. consider also how mlib_timer API treats < 0 as immediate timeout, whereas POSIX convention treats < 0 as infinite timeout).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the important context.

Upon reflection, I think I can avoid some of the mess I was previously concerned about by adding new private mongoc_stream_t functions that mirror the public API, but use the expected timeout convention. All internal uses of mongoc_stream_t can use the private API without changing timeout conventions. The public API can be reimplemented to delegate to the private API by converting timeouts to the correct convention.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants