feat(blob-v2): configure packed blob file max size with field metadata by jo-migo · Pull Request #7322 · lance-format/lance

jo-migo · 2026-06-17T14:19:29Z

Fixes lance-format#7292

Problem

There is currently no way to persist a maximum size for packed blob sidecar files in a dataset, the way inline and dedicated blob size thresholds can be persisted via `lance-encoding:blob-inline-size-threshold` and `lance-encoding:blob-dedicated-size-threshold`, respectively.

Solution

Add a new field-level metadata key, `lance-encoding:blob-pack-file-size-threshold`, which tells the blob writer to start a new packed file only once the current packed file reaches that threshold.

The value can be overridden by passing the existing `blob_pack_file_size_threshold` parameter to `write_dataset`.
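A minimal sketch of the precedence described above, assuming field metadata is available as a `HashMap<String, String>` and the threshold is stored as a decimal byte count. `resolve_pack_threshold` is a hypothetical helper, not Lance's API; returning `None` stands for "fall back to the writer's default", whose value is not stated in this PR.

```rust
use std::collections::HashMap;

/// Field-level metadata key introduced by this PR.
const PACK_THRESHOLD_KEY: &str = "lance-encoding:blob-pack-file-size-threshold";

/// Hypothetical helper: resolve the pack-file threshold for one blob field.
/// Precedence: explicit write parameter, then persisted field metadata.
/// `Ok(None)` means neither is set and the writer default applies.
/// Unparseable metadata is reported as an error instead of being ignored.
fn resolve_pack_threshold(
    write_param: Option<u64>,
    field_metadata: &HashMap<String, String>,
) -> Result<Option<u64>, String> {
    if let Some(v) = write_param {
        return Ok(Some(v));
    }
    match field_metadata.get(PACK_THRESHOLD_KEY) {
        Some(raw) => raw.trim().parse::<u64>().map(Some).map_err(|e| {
            format!("invalid {} value {:?}: {}", PACK_THRESHOLD_KEY, raw, e)
        }),
        None => Ok(None),
    }
}
```

Treating a malformed value as an error, rather than silently using the default, is a design choice of this sketch; it keeps a typo in the metadata from quietly producing unbounded sidecar files.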

github-actions · 2026-06-17T14:20:09Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

Fixes: lance-format#7292 Problem: There's currently no way to encode a threshold for maximum packed blob sidecar file size into a dataset in the way that you can encode inline blob and dedicated blob size thresholds via `lance-encoding:blob-dedicated-size-threshold` and `lance-encoding:blob-inline-size-threshold` respectively. Solution: Add a new `lance-encoding:blob-pack-file-size-threshold` field-level metadata key which informs the blob writer to only start new packed files when current packed file reaches that threshold. It can be overridden by supplying another value via the existing `blob_pack_file_size_threshold` parameter to the `write_dataset` function.

Xuanwo · 2026-06-20T02:03:14Z

```diff
        if self
            .current_blob_id
-            .map(|_| self.current_size + len > self.max_pack_size)
+            .map(|_| self.current_size + len > max_pack_size)
```


This uses the incoming field's threshold to decide whether to append to the shared pack file. In datasets with multiple blob fields and different thresholds, a file that already contains data from a smaller-threshold field can be extended under a larger threshold, so the persisted field metadata no longer reliably bounds sidecar size.

jo-migo · Jun 22, 2026

Thanks for the review! Good point: I had not considered datasets whose blob columns use different thresholds. That case should now be covered.
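A sketch of one way to satisfy this review comment, not necessarily the approach the PR takes: keep a single effective limit for the open pack file, equal to the smallest threshold of every field that has written into it, including the incoming one. `PackWriter` and its fields are hypothetical; sizes are in bytes.

```rust
/// Hypothetical state of the shared packed-blob sidecar writer.
#[derive(Debug, Default)]
struct PackWriter {
    /// Bytes written to the currently open pack file.
    current_size: u64,
    /// Smallest threshold of any field with data in the open file
    /// (`None` means no pack file has been opened yet).
    effective_limit: Option<u64>,
    /// Number of pack files started so far.
    files_started: u32,
}

impl PackWriter {
    /// Append a blob of `len` bytes for a field whose threshold is
    /// `field_limit`. Returns true if a new pack file was started.
    fn append(&mut self, len: u64, field_limit: u64) -> bool {
        let rolled = match self.effective_limit {
            // A file is open: bound it by the strictest threshold of the
            // fields already in it *and* the incoming field.
            Some(limit) => self.current_size + len > limit.min(field_limit),
            // No file open yet: one must be started.
            None => true,
        };
        if rolled {
            self.current_size = 0;
            self.effective_limit = Some(field_limit);
            self.files_started += 1;
        } else {
            self.effective_limit = self.effective_limit.map(|l| l.min(field_limit));
        }
        // A blob larger than the limit still lands alone in a fresh file.
        self.current_size += len;
        rolled
    }
}
```

With a 100-byte field and a 1000-byte field sharing the writer, a 60-byte blob from each ends up in separate files. Comparing only against the incoming field's threshold would instead have grown the first file to 120 bytes, past the smaller field's persisted limit.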

Xuanwo · 2026-06-20T02:03:14Z

```diff
-                    input_field.name,
-                    dataset_inline_threshold,
-                )));
+        for (key, read_threshold) in [
```


This append validation only checks top-level fields, even though blob threshold metadata is read recursively from nested blob fields. A nested blob append can silently ignore mismatched pack-threshold metadata and write using the persisted schema instead, so nested fields do not get the same explicit-mismatch protection as top-level blob columns.

jo-migo · Jun 22, 2026

I made the whole check recursive.
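A sketch of a recursive version of the append check, under stated assumptions: `Field` is a hypothetical stand-in for a schema field, children are matched by name, threshold values are compared as raw strings, and a key set on the input but absent from the persisted field counts as a mismatch. It illustrates the recursion only and is not the PR's code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field (not Lance's type).
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
    children: Vec<Field>,
}

/// Blob threshold keys whose explicit values must match on append.
const THRESHOLD_KEYS: [&str; 3] = [
    "lance-encoding:blob-inline-size-threshold",
    "lance-encoding:blob-dedicated-size-threshold",
    "lance-encoding:blob-pack-file-size-threshold",
];

/// Recursively check that every threshold key explicitly set on an
/// appended field matches the persisted field at the same path.
/// Keys absent from the input are not checked (the persisted value applies).
fn validate_thresholds(input: &Field, persisted: &Field, path: &str) -> Result<(), String> {
    let path = if path.is_empty() {
        input.name.clone()
    } else {
        format!("{}.{}", path, input.name)
    };
    for key in THRESHOLD_KEYS {
        if let Some(given) = input.metadata.get(key) {
            let stored = persisted.metadata.get(key);
            if stored != Some(given) {
                return Err(format!(
                    "field {}: {} is {:?} but the dataset has {:?}",
                    path, key, given, stored
                ));
            }
        }
    }
    // Descend into nested fields; children missing from the persisted
    // schema are left to ordinary schema validation.
    for child in &input.children {
        if let Some(p) = persisted.children.iter().find(|c| c.name == child.name) {
            validate_thresholds(child, p, &path)?;
        }
    }
    Ok(())
}
```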


github-actions bot added the A-python (Python bindings) and enhancement (New feature or request) labels Jun 17, 2026

jo-migo force-pushed the fix-packed-storage-filesize branch from e6b4c40 to 08c0e06 June 17, 2026 14:28

jo-migo changed the title from ~~feat(blob-v2): Configure Packed Blob File Max Size with Field Metadata~~ to feat(blob-v2): configure packed blob file max size with field metadata Jun 17, 2026

Merge branch 'main' into fix-packed-storage-filesize

b8282c1

Xuanwo reviewed Jun 20, 2026


jo-migo and others added 2 commits June 22, 2026 18:16

Merge branch 'lance-format:main' into fix-packed-storage-filesize

a06b35c

Review remarks: deal with different thresholds on different blob v2 columns and do recursive validation

97c41af

jo-migo wants to merge 4 commits into lance-format:main from jo-migo:fix-packed-storage-filesize

2 participants
