[FLINK-39603] Optimize NativeS3InputStream via seeks and reduce IOPS#28112

Open
Samrat002 wants to merge 1 commit into apache:master from Samrat002:FLINK-39603

Conversation

@Samrat002
Contributor

What is the purpose of the change

NativeS3InputStream.seek() unconditionally aborts the HTTP connection and opens a new Range GET for every seek, even for small forward seeks that land inside already-buffered data. During state restore, this creates O(N) HTTP round-trips for N state entries, each call costing a TCP handshake, TLS negotiation, and S3 response latency.

This PR makes seek() lazy: all HTTP work is deferred to the next read() call. Forward seeks within max(readBufferSize, bufferedStream.available()) are skipped in-buffer instead of reopening the connection.

Brief change log

  • seek() and skip() now only update nextReadPos without any I/O
  • Added lazySeek() to reconcile nextReadPos vs streamPos on the next read() call
  • Forward seeks within the dynamic threshold max(readBufferSize, bufferedStream.available()) skip in-buffer; backward or large forward seeks reopen via Range GET
  • Multiple seeks between reads coalesce — only the final position matters
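The change log above can be sketched as follows. This is an illustrative model of the lazy-seek scheme, backed by an in-memory "object" rather than S3; the names (nextReadPos, streamPos, lazySeek, reopen, reopenCount) mirror the PR description but are hypothetical, not the actual NativeS3InputStream code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LazySeekStream {
    private final byte[] object;          // stand-in for the S3 object
    private final int readBufferSize;
    private BufferedInputStream bufferedStream;
    private long streamPos;               // position of the live stream
    private long nextReadPos;             // position requested by seek()
    int reopenCount;                      // how many "Range GETs" were issued

    public LazySeekStream(byte[] object, int readBufferSize) {
        this.object = object;
        this.readBufferSize = readBufferSize;
        reopen(0);
    }

    public void seek(long pos) {
        nextReadPos = pos;                // no I/O: repeated seeks coalesce
    }

    private void reopen(long pos) {       // models abort + new Range GET
        reopenCount++;
        InputStream range =
            new ByteArrayInputStream(object, (int) pos, object.length - (int) pos);
        bufferedStream = new BufferedInputStream(range, readBufferSize);
        streamPos = pos;
    }

    private void lazySeek() throws IOException {
        long diff = nextReadPos - streamPos;
        if (diff == 0) return;
        long threshold = Math.max(readBufferSize, bufferedStream.available());
        if (diff > 0 && diff <= threshold) {
            bufferedStream.skip(diff);    // forward seek served without a new GET
            streamPos = nextReadPos;
        } else {
            reopen(nextReadPos);          // backward or large forward seek
        }
    }

    public int read() throws IOException {
        lazySeek();                       // reconcile nextReadPos vs streamPos here
        int b = bufferedStream.read();
        if (b >= 0) { streamPos++; nextReadPos++; }
        return b;
    }

    public static void main(String[] args) throws IOException {
        LazySeekStream s = new LazySeekStream(new byte[100], 16);
        s.read();
        s.seek(10);                       // small forward seek: no new connection
        s.read();
        s.seek(0);                        // backward seek: forces a reopen
        s.read();
        System.out.println(s.reopenCount);
    }
}
```

Because seek() only records the target position, any number of intermediate seeks between two reads collapse into a single reconciliation in lazySeek().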

Verifying this change

This change adds tests and can be verified with the unit tests and an E2E run of a Flink application with large state.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: yes

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?

  • Yes (please specify the tool below)

@Samrat002
Contributor Author

@gaborgsomogyi PTAL

@flinkbot
Collaborator

flinkbot commented May 4, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@gaborgsomogyi
Contributor

I think we need an explicit config for buffer size. There are use-cases where we need to set 16MB buffer for proper IOPS usage.

@gaborgsomogyi
Contributor

Hi @Samrat002, great work reducing IOPS via lazy seek and range requests! While reviewing the implementation I found one remaining efficiency issue worth addressing.

What's working well

A seek that lands within the read-ahead buffer is served entirely from the buffer - no extra GetObject call, no bytes pulled from S3. The attached patch includes seekWithinBuffer_afterSmallRead_doesNotTouchUnderlyingStream which demonstrates this and passes.

Remaining issue: bulk reads bypass the BufferedInputStream buffer

BufferedInputStream.read(byte[], off, len) has an internal fast-path: when len >= bufferSize it reads directly from the underlying stream, skipping the local buffer entirely. After such a read the buffer is empty.

This means a forward seek that follows a bulk read cannot be satisfied from the buffer and falls through to bufferedStream.skip(), which consumes bytes from the live HTTP connection. No new GetObject is issued (IOPS are fine), but unnecessary bytes are downloaded and discarded.
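The fast-path and its consequence can be reproduced in isolation. In this sketch, TrackingStream is a hypothetical stand-in for the reviewer's TrackingInputStream: it counts bytes skipped on the underlying stream, which models bytes pulled (and discarded) from the live HTTP connection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class BufferFastPathDemo {
    // Counts skip() calls that reach the underlying stream, i.e. bytes that
    // would be downloaded from S3 and thrown away.
    static class TrackingStream extends ByteArrayInputStream {
        final AtomicLong skippedFromUnderlying = new AtomicLong();
        TrackingStream(byte[] data) { super(data); }
        @Override public synchronized long skip(long n) {
            skippedFromUnderlying.addAndGet(n);
            return super.skip(n);
        }
    }

    public static long skippedAfterRead(int bufferSize, int readLen, int skipLen)
            throws IOException {
        TrackingStream underlying = new TrackingStream(new byte[1024]);
        BufferedInputStream buffered = new BufferedInputStream(underlying, bufferSize);
        // When readLen >= bufferSize, BufferedInputStream reads straight from
        // the underlying stream and leaves its internal buffer empty.
        buffered.read(new byte[readLen], 0, readLen);
        buffered.skip(skipLen);  // empty buffer -> skip falls through to the wire
        return underlying.skippedFromUnderlying.get();
    }

    public static void main(String[] args) throws IOException {
        // Bulk read (len >= bufferSize): the seek's skip hits the underlying stream.
        System.out.println(skippedAfterRead(64, 64, 10));  // prints 10
        // Small read (len < bufferSize): the skip is served from the local buffer.
        System.out.println(skippedAfterRead(64, 8, 10));   // prints 0
    }
}
```

This matches the reviewer's observation: IOPS stay flat (no new GetObject), but after a bulk read the skipped bytes come off the live connection instead of the local buffer.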

The attached patch includes seekWithinBuffer_afterLargeRead_touchesUnderlyingStream which demonstrates the bug - it currently fails because bytesSkippedFromUnderlying is 10 instead of 0.

Suggested fix

Replace BufferedInputStream with a plain byte[] buffer managed directly, tracking bufferStart (file offset of the first byte in the buffer), bufferOffset (current read position within the array), and bufferLength (valid bytes). Then lazySeek() can check [bufferStart, bufferStart + bufferLength) to decide whether the seek is satisfiable in-buffer, independently of whether the preceding read was a byte-at-a-time or a bulk read.
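A minimal sketch of that suggestion, assuming the field names from the comment (bufferStart, bufferOffset, bufferLength); the class and method names are illustrative, not proposed API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ManagedBufferStream {
    private final InputStream in;   // the live Range GET body (stand-in)
    private final byte[] buffer;
    private long bufferStart;       // file offset of buffer[0]
    private int bufferOffset;       // current read position within the array
    private int bufferLength;       // number of valid bytes in the buffer

    public ManagedBufferStream(InputStream in, int bufferSize, long startOffset) {
        this.in = in;
        this.buffer = new byte[bufferSize];
        this.bufferStart = startOffset;
    }

    // A seek is in-buffer iff the target lands in
    // [bufferStart, bufferStart + bufferLength) -- independent of whether the
    // preceding read was byte-at-a-time or bulk.
    public boolean trySeekInBuffer(long targetPos) {
        if (targetPos >= bufferStart && targetPos < bufferStart + bufferLength) {
            bufferOffset = (int) (targetPos - bufferStart);
            return true;            // no bytes touched on the wire
        }
        return false;               // caller must skip on the stream or reopen
    }

    public int read() throws IOException {
        if (bufferOffset >= bufferLength) {
            bufferStart += bufferLength;           // new window starts where the old ended
            bufferLength = Math.max(in.read(buffer), 0);
            bufferOffset = 0;
            if (bufferLength == 0) return -1;      // EOF
        }
        return buffer[bufferOffset++] & 0xFF;
    }
}
```

With the buffer window tracked explicitly, lazySeek() can consult trySeekInBuffer() first, falling back to skip-or-reopen only when the target lies outside the window.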

The patch file adds the AtomicLong counters to TrackingInputStream and both test methods.

buffer-efficiency-tests.patch

