fix: support manifests >5 GB via size-aware copy#7047
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
westonpace
left a comment
There was a problem hiding this comment.
A few questions but nothing too worrisome. Go ahead and ping me when you're ready for another review.
| // Note: this does NOT cover task cancellation — `MultipartUpload`'s | ||
| // upstream Drop is documented as a no-op for S3/GCS. Callers that | ||
| // need cancellation cleanliness should run this with a guard or | ||
| // switch to `object_store::WriteMultipart` (planned follow-up). |
There was a problem hiding this comment.
Is there actually a follow-up issue created? Can we specify the issue directly?
There was a problem hiding this comment.
I reworded this from a "deferred to a follow-up" TODO into a plain inline NOTE rather than referencing a tracked issue.
It's a cold path (only triggers for >5 GiB manifests) and the whole read+rewrite helper is a stopgap until object_store exposes UploadPartCopy, so I didn't think a standalone tracking issue was warranted — a self-contained note explaining the tradeoff seemed more useful than an issue link that'd likely sit untouched.
Happy to file one if you'd prefer it tracked.
| if let Err(e) = upload.put_part(Bytes::from(payload).into()).await { | ||
| let _ = upload.abort().await; | ||
| return Err(e); | ||
| } |
There was a problem hiding this comment.
This could maybe be optimized to allow for concurrent calls to put_part but maybe that will be addressed by the previously mentioned follow-up?
There was a problem hiding this comment.
Right — left sequential for now, covered by the reworded note just above. It can be parallelized with a bounded JoinSet (mirroring object_writer.rs's LANCE_UPLOAD_CONCURRENCY) or sidestepped by switching to object_store::WriteMultipart, which handles concurrency and abort-on-drop for free. Deferred since this only runs for >5 GiB manifests and the helper itself is interim until UploadPartCopy exists.
| /// `EntityTooLarge`, so we clamp `LANCE_INITIAL_UPLOAD_SIZE` one byte | ||
| /// below that threshold to keep the buffer-fills-to-clamp single-PUT | ||
| /// path safe. See lance#6750 for the related txn-file write fix. | ||
| const MAX_UPLOAD_PART_SIZE: usize = 1024 * 1024 * 1024 * 5 - 1; |
There was a problem hiding this comment.
Is this limit S3-specific or common across all object stores? Should we limit the slow path to S3?
There was a problem hiding this comment.
It's not actually S3-specific — GCS's single-shot Objects.copy has the same ~5 GiB limit. So rather than gate the slow path to S3, I reframed the threshold as the most-conservative server-side-copy limit across backends and renamed the constant to MAX_SERVER_SIDE_COPY_BYTES.
…nd-agnostic Review feedback on PR lance-format#7047 (westonpace): - Make `copy_size_aware`'s size parameter mandatory (`Option<u64>` -> `u64`) and drop the `head()` fallback. The only caller already knows the size, and the fallback was just an extra IOP the caller can avoid. - Reframe the 5 GiB threshold as backend-agnostic rather than S3-specific. S3's CopyObject and GCS's single-shot Objects.copy both cap at ~5 GiB, so `S3_COPY_OBJECT_CAP_BYTES` -> `MAX_SERVER_SIDE_COPY_BYTES`. Stores without such a cap (e.g. local FS) take the read+rewrite fallback above 5 GiB too; correctness is preserved, only the rare >5 GiB copy is slower. Comments in copy_via_read_rewrite and on COPY_REWRITE_PART_SIZE reworded to "S3/GCS" / "the backend". Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c12c46a to
ab8f3da
Compare
`object_store.copy()` is called unconditionally on the staging→final manifest copy path. This routes to S3's CopyObject API, which has a 5 GB hard cap, so manifests above this fail with `EntityTooLarge` (production case: a ~14 GB manifest with ProposedSize 14961429442). - Add `copy_size_aware`: keeps the cheap server-side `store.copy()` for sources below the limit, falls back to read+rewrite via multipart upload for larger sources. The required `size` argument avoids an extra `head()` round-trip (the only caller already knows the size). - The 5 GiB threshold is backend-agnostic, not S3-specific: S3's CopyObject and GCS's single-shot Objects.copy both cap at ~5 GiB. Stores without such a cap (e.g. local FS) take the read+rewrite fallback above 5 GiB too; correctness is preserved, only the rare large copy is slower. - Tighten `MAX_UPLOAD_PART_SIZE` from 5 GiB to 5 GiB - 1 so `LANCE_INITIAL_UPLOAD_SIZE=5368709120` can't trigger a single PUT of exactly 5 GiB on shutdown, which S3 also rejects. Same bug class as lance-format#6750 (multipart-aware put for txn file writes), different code path. Fixes lance-format#7197. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ab8f3da to
7b59d20
Compare
Thanks @westonpace for reviewing it. Please take another look with the updates. |
Fixes #7197.
Summary
object_store.copy()is called unconditionally on the staging→final manifest copy path. This routes to S3'sCopyObjectAPI, which has a ~5 GB hard cap. Manifests above this fail withEntityTooLarge— production case was a ~14 GB manifest withProposedSize 14961429442on_versions/...manifest.copy_size_aware: keeps the cheap server-sidestore.copy()for sources below the limit, falls back to read+rewrite via a multipart upload for larger sources. The requiredsizeargument lets the caller skip an extrahead()round-trip.CopyObjectand GCS's single-shotObjects.copyboth cap at ~5 GiB, so the constant is namedMAX_SERVER_SIDE_COPY_BYTES. Stores without such a cap (e.g. local FS) take the read+rewrite fallback above 5 GiB too; correctness is preserved, only the rare large copy is slower.MAX_UPLOAD_PART_SIZEfrom5 GiBto5 GiB - 1soLANCE_INITIAL_UPLOAD_SIZE=5368709120can't trigger a single PUT of exactly 5 GiB on shutdown — which S3 also rejects.Same bug class as #6750 (multipart-aware put for txn file writes), different code path.
Test plan
New tests in
rust/lance/src/io/commit/external_manifest.rscovering both the >5 GB read+rewrite fallback and the small-file fast path.Related downstream issue: lance-format/lance-spark#529