Stream image data in load() instead of reading into memory#640
Stream image data in load() instead of reading into memory#640inknos merged 1 commit intocontainers:mainfrom
Conversation
|
/packit retest-failed |
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
b350e08 to
5395b3c
Compare
There was a problem hiding this comment.
Edit. I was playing with cursor locally and didn't realize it uploaded the review, didn't want to do that but here we are.
My notes:
I was going to simply approve the pr since I think it's good. didn't think about the cursor nits to be honest and I prefer concise code as you did much more
Cursor:
Review: Approve with minor suggestions
The change is correct, well-scoped, and CI is fully green. The core benefit — avoiding loading multi-GB tarballs entirely into memory when file_path is used — is real and valuable. The APIClient._Data type alias already includes IO, confirming file-like objects are a supported data type.
Correct and positive
- Memory efficiency:
Path.open("rb")streams data torequests, which reads it incrementally during the HTTP POST. Large tarballs no longer occupy full memory. - Context manager usage: Both
Path.open()andio.BytesIOsupport thewithprotocol. The file handle is properly closed when thewithblock exits. - Generator safety: The returned generator does not depend on the stream. By the time
return _generator(response.json())executes, the HTTP request is complete and the response body is parsed. The generator only callsself.get(item)on image names — no risk of accessing a closed stream. - Test update is correct: Mocking
pathlib.Path.opento returnio.BytesIO(b"mock tarball data")is a proper mock for the new streaming behavior. Theimport ioaddition is needed. - Type safety: The
_Dataunion type inpodman/api/client.pyexplicitly includesIO, so passing a file object is type-safe.
Minor suggestions (non-blocking)
-
Unnecessary wrapping for the
datapath: Whendata(bytes) is provided, wrapping it inio.BytesIO(data)adds a copy and context manager overhead for no streaming benefit — the bytes are already in memory. The old code passeddatadirectly topost(). Consider keeping a fast path that avoids the wrapper when onlydatais provided. This is a style/performance nit — the current unifiedwithis more concise and still correct. -
PR description is empty: Adding a one-liner like "Avoid loading entire tarball into memory when
file_pathis used; stream the file data directly to the HTTP request instead" would help reviewers and future archaeology. -
Edge case
data=b"": Verified that the validationif not data and not file_pathcorrectly catches empty bytes (not b""isTrue), raisingPodmanError. No issue.
LGTM — nice focused improvement.
No description provided.