Skip to content

[storage] Add configurable S3 read timeout (OSMO_S3_READ_TIMEOUT)#1060

Open
xutongNV wants to merge 1 commit into
mainfrom
xutongr/fix-s3-read-timeout
Open

[storage] Add configurable S3 read timeout (OSMO_S3_READ_TIMEOUT)#1060
xutongNV wants to merge 1 commit into
mainfrom
xutongr/fix-s3-read-timeout

Conversation

@xutongNV

@xutongNV xutongNV commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Problem

The S3 client — also used for Swift via its S3-compatible API — never sets a read_timeout, so botocore's 60s default applies (_get_boto_config in src/lib/data/storage/backends/s3.py).

For a large object uploaded as a single PUT (i.e. below multipart_threshold, default 512 MiB), the bytes transfer quickly but the server (e.g. Swift) replicates and checksums the whole object before responding. On a busy/slow backend that finalization exceeds 60s, so botocore raises ReadTimeoutError and the upload is retried in full — looping indefinitely even though the data transferred fine.

Observed symptom (480 MB upload to swift://):

100%|██████████| 480M/480M [00:20<00:00, 130MB/s]  Unknown error: Read timeout on endpoint URL: ".../gear_upload_test_large.bin". Retrying 3 more times.
... (repeats) ...

Fix

Add a configurable read_timeout to the botocore Config, controlled by a new OSMO_S3_READ_TIMEOUT env var (default 5m), mirroring the existing _get_s3_timeout() / OSMO_S3_TIMEOUT pattern. This applies to all S3-family backends (S3, Swift, GS, TOS).

Changes

  • s3.py: new OSMO_S3_READ_TIMEOUT constant, _get_s3_read_timeout() helper, and read_timeout added to the boto Config.
  • test_backends.py: tests for the default (300s) and env override.

Testing

bazel test //src/lib/data/storage/backends/tests:test_backends

Passes, including mypy.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced configurable S3 read timeout via environment variable, enabling custom timeout settings for S3 storage connections (defaults to 5 minutes).
  • Tests

    • Added unit tests verifying default and custom S3 read timeout configurations.

The S3 client (also used for Swift via its S3-compatible API) never set a
read_timeout, so botocore's 60s default applied. For large single-PUT
objects on backends like Swift, the server replicates and checksums the
whole object before responding, which can exceed 60s. The client then
raises ReadTimeoutError and retries the entire upload, looping
indefinitely even though the bytes transferred fine.

Add a configurable read_timeout (env OSMO_S3_READ_TIMEOUT, default 5m)
to the botocore Config, mirroring the existing _get_s3_timeout pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@xutongNV xutongNV requested a review from a team as a code owner June 2, 2026 22:55
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 69915ad3-ef85-463c-b921-29a2527aa9ab

📥 Commits

Reviewing files that changed from the base of the PR and between 2149c59 and e87a1f3.

📒 Files selected for processing (2)
  • src/lib/data/storage/backends/s3.py
  • src/lib/data/storage/backends/tests/test_backends.py

📝 Walkthrough

Walkthrough

This PR adds configurable S3 read timeout support via a new OSMO_S3_READ_TIMEOUT environment variable. A parser function converts the variable (defaulting to 5m) into seconds, which is then integrated into boto3 client configuration. Unit tests validate both default and override behaviors.

Changes

S3 Read Timeout Configuration

Layer / File(s) Summary
Timeout configuration contract and parser
src/lib/data/storage/backends/s3.py
New OSMO_S3_READ_TIMEOUT constant defines the environment variable name. _get_s3_read_timeout() parses the variable (default 5m) using timedelta and returns the value in seconds.
Boto3 integration and test validation
src/lib/data/storage/backends/s3.py, src/lib/data/storage/backends/tests/test_backends.py
_get_boto_config() integrates the parsed timeout into boto3/botocore configuration. Two tests verify the default 300-second timeout when the environment variable is absent and the override to 600 seconds when OSMO_S3_READ_TIMEOUT=10m is set.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A rabbit hops through S3's door,
With timeouts tuned at 5m core,
Now configs bend to env's command,
Boto3 reads with steady hand! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding a configurable S3 read timeout feature via the OSMO_S3_READ_TIMEOUT environment variable.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch xutongr/fix-s3-read-timeout

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.07%. Comparing base (2149c59) to head (e87a1f3).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1060   +/-   ##
=======================================
  Coverage   49.07%   49.07%           
=======================================
  Files         220      220           
  Lines       28664    28667    +3     
  Branches     4288     4288           
=======================================
+ Hits        14066    14069    +3     
  Misses      13935    13935           
  Partials      663      663           
Flag Coverage Δ
backend 50.37% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/lib/data/storage/backends/s3.py 29.77% <100.00%> (+0.47%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant