Skip to content

Stream image data in load() instead of reading into memory#640

Merged
inknos merged 1 commit intocontainers:mainfrom
Honny1:stream-image-load
Apr 8, 2026
Merged

Stream image data in load() instead of reading into memory#640
inknos merged 1 commit intocontainers:mainfrom
Honny1:stream-image-load

Conversation

@Honny1
Copy link
Copy Markdown
Member

@Honny1 Honny1 commented Mar 25, 2026

No description provided.

@Honny1
Copy link
Copy Markdown
Member Author

Honny1 commented Mar 25, 2026

/packit retest-failed

Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Copy link
Copy Markdown
Contributor

@inknos inknos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit. I was playing with cursor locally and didn't realize it uploaded the review, didn't want to do that but here we are.

My notes:

I was going to simply approve the pr since I think it's good. didn't think about the cursor nits to be honest and I prefer concise code as you did much more


Cursor:

Review: Approve with minor suggestions

The change is correct, well-scoped, and CI is fully green. The core benefit — avoiding loading multi-GB tarballs entirely into memory when file_path is used — is real and valuable. The APIClient._Data type alias already includes IO, confirming file-like objects are a supported data type.

Correct and positive

  • Memory efficiency: Path.open("rb") streams data to requests, which reads it incrementally during the HTTP POST. Large tarballs no longer occupy full memory.
  • Context manager usage: Both Path.open() and io.BytesIO support the with protocol. The file handle is properly closed when the with block exits.
  • Generator safety: The returned generator does not depend on the stream. By the time return _generator(response.json()) executes, the HTTP request is complete and the response body is parsed. The generator only calls self.get(item) on image names — no risk of accessing a closed stream.
  • Test update is correct: Mocking pathlib.Path.open to return io.BytesIO(b"mock tarball data") is a proper mock for the new streaming behavior. The import io addition is needed.
  • Type safety: The _Data union type in podman/api/client.py explicitly includes IO, so passing a file object is type-safe.

Minor suggestions (non-blocking)

  1. Unnecessary wrapping for the data path: When data (bytes) is provided, wrapping it in io.BytesIO(data) adds a copy and context manager overhead for no streaming benefit — the bytes are already in memory. The old code passed data directly to post(). Consider keeping a fast path that avoids the wrapper when only data is provided. This is a style/performance nit — the current unified with is more concise and still correct.

  2. PR description is empty: Adding a one-liner like "Avoid loading entire tarball into memory when file_path is used; stream the file data directly to the HTTP request instead" would help reviewers and future archaeology.

  3. Edge case data=b"": Verified that the validation if not data and not file_path correctly catches empty bytes (not b"" is True), raising PodmanError. No issue.

LGTM — nice focused improvement.

@inknos inknos merged commit 52a3036 into containers:main Apr 8, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants