
Bug: manifest commit fails with EntityTooLarge for manifests >5 GB (CopyObject hard cap) #7197

Description

@lixmgl

Summary

ExternalManifestStore calls object_store.copy(staging, final) unconditionally on the manifest commit path at external_manifest.rs:128 and external_manifest.rs:250. This routes to S3's CopyObject API, which has a 5 GB hard cap on source object size. Any manifest above this fails with EntityTooLarge and the commit cannot complete.

Reproduction

Production failure observed on a Pinterest CTAS workload:

  • Manifest path: _versions/<version>.manifest
  • Reported ProposedSize: 14961429442 bytes (~13.9 GiB, or ~15 GB)
  • Error: EntityTooLarge from S3 on the staging→final manifest copy step
  • No workaround except shrinking the manifest (which isn't always feasible — manifests grow with table version count and fragment count)

Why this matters

  • Affects any user whose manifest grows past 5 GB. Manifest size scales with table version history and fragment count, so this is reachable on long-lived production tables, not a corner case.
  • Crashes the commit; there is no graceful degradation. The CTAS or write workflow fails entirely.
  • No object_store-layer fallback today — the upstream object_store crate doesn't expose UploadPartCopy, so a workaround inside Lance is needed.

Proposed fix #7047

A copy_size_aware helper that:

  • Keeps the cheap server-side store.copy() for sources <5 GiB (the common case, no regression)
  • Falls back to read+rewrite via multipart upload for sources ≥5 GiB
  • Accepts a size_hint so callers that already know the source size can skip an extra head() round-trip on the small-file fast path
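
The dispatch logic described above can be sketched roughly as follows. This is an illustration only, not the code in #7047 and not the object_store API: `MockStore`, `CopyPath`, `read_rewrite`, and `COPY_OBJECT_LIMIT` are hypothetical names, and the mock tracks object sizes only, so the multipart rewrite is represented by a single call.

```rust
use std::collections::HashMap;

/// Assumed CopyObject cap of 5 GiB, matching the threshold in the list above.
const COPY_OBJECT_LIMIT: u64 = 5 * 1024 * 1024 * 1024;

/// Which path the helper took (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum CopyPath {
    ServerSide,
    ReadRewrite,
}

/// Hypothetical in-memory stand-in for an object store; it stores sizes only.
#[derive(Default)]
struct MockStore {
    sizes: HashMap<String, u64>,
    head_calls: u32,
}

impl MockStore {
    /// Look up an object's size, counting the round-trip.
    fn head(&mut self, path: &str) -> Result<u64, String> {
        self.head_calls += 1;
        self.sizes
            .get(path)
            .copied()
            .ok_or_else(|| format!("not found: {path}"))
    }

    /// Server-side copy; rejects oversized sources the way the issue reports.
    fn copy(&mut self, from: &str, to: &str) -> Result<(), String> {
        let size = *self
            .sizes
            .get(from)
            .ok_or_else(|| format!("not found: {from}"))?;
        if size > COPY_OBJECT_LIMIT {
            return Err("EntityTooLarge".to_string());
        }
        self.sizes.insert(to.to_string(), size);
        Ok(())
    }

    /// Stand-in for streaming the source back out through a multipart upload.
    fn read_rewrite(&mut self, from: &str, to: &str) -> Result<(), String> {
        let size = *self
            .sizes
            .get(from)
            .ok_or_else(|| format!("not found: {from}"))?;
        self.sizes.insert(to.to_string(), size);
        Ok(())
    }
}

/// Use the cheap server-side copy below the limit, otherwise read and rewrite.
/// A `size_hint` lets the caller skip the `head()` call entirely.
fn copy_size_aware(
    store: &mut MockStore,
    from: &str,
    to: &str,
    size_hint: Option<u64>,
) -> Result<CopyPath, String> {
    let size = match size_hint {
        Some(size) => size,
        None => store.head(from)?,
    };
    if size < COPY_OBJECT_LIMIT {
        store.copy(from, to)?;
        Ok(CopyPath::ServerSide)
    } else {
        store.read_rewrite(from, to)?;
        Ok(CopyPath::ReadRewrite)
    }
}
```

With the reported 14961429442-byte manifest as the source, this sketch takes the `ReadRewrite` branch instead of surfacing EntityTooLarge. A small source passed with `Some(size)` takes the `ServerSide` branch without calling `head()`.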

Same bug class as #6750, different code path

#6750 fixed the analogous bug for transaction file writes (write_transaction_file was using inner.put(), hitting S3's 5 GB single-PUT limit). That PR was scoped to txn files and did not touch the manifest commit path.
