
feat: add RLE v2 run length widths #7376

Open
Xuanwo wants to merge 2 commits into main from xuanwo/rle-v2-run-length-widths

Conversation

@Xuanwo Xuanwo commented Jun 19, 2026

Collaborator

Summary

Adds RLE v2 run-length widths so newly created datasets can write RLE pages with u16 or u32 run lengths instead of splitting every run at 255 values. The capability is recorded as a reader feature flag and is only enabled when a new dataset is created with WriteParams::enable_rle_v2; existing unflagged datasets reject attempts to turn it on mid-stream.
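The effect of the wider run-length field can be sketched as follows. This is a minimal illustration, not the code in this PR: `RunLengthWidth` and `split_run` are hypothetical names, and the only behavior taken from the summary is that a u8 run length caps each stored run at 255 values while u16 and u32 raise that cap.

```rust
/// Hypothetical run-length width selector (illustrative only, not Lance's API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RunLengthWidth {
    U8,
    U16,
    U32,
}

impl RunLengthWidth {
    /// Largest run length a single stored entry can hold at this width.
    fn max_run(self) -> u64 {
        match self {
            RunLengthWidth::U8 => u8::MAX as u64,
            RunLengthWidth::U16 => u16::MAX as u64,
            RunLengthWidth::U32 => u32::MAX as u64,
        }
    }
}

/// Split one logical run of `len` equal values into the run lengths that
/// would have to be stored when each entry is limited by `width`.
fn split_run(len: u64, width: RunLengthWidth) -> Vec<u64> {
    let max = width.max_run();
    let mut stored = Vec::new();
    let mut remaining = len;
    while remaining > 0 {
        let chunk = remaining.min(max);
        stored.push(chunk);
        remaining -= chunk;
    }
    stored
}
```

Under these assumptions, `split_run(600, RunLengthWidth::U8)` yields three entries (255, 255, 90), while `split_run(600, RunLengthWidth::U16)` yields a single entry of 600.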

Closes #7327.

Benchmark

Ran on xuanwo-lance-lazy-metadata-bench with a #6941-style sorted low-cardinality asset_id workload.

| workload | Lance default | Lance RLE2 | reduction |
| --- | --- | --- | --- |
| 150M rows / 5k assets / random5 value | 167.36 MiB | 164.57 MiB | 1.67% |
| 150M rows / 5k assets / by-asset5 value | 7.62 MiB | 2.03 MiB | 73.34% |

The first row keeps the random low-cardinality value column from the issue-like workload, which dominates total size. The second row isolates the long-run case RLE2 targets.
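A rough back-of-the-envelope sketch of why long runs benefit, assuming (this is not stated in the benchmark) that the 150M rows are spread evenly over the 5k assets, so each asset contributes one run of 30,000 equal values. The helper names are hypothetical and the entry counts are illustrative, not measured.

```rust
/// Number of stored run entries needed for one run of `run_len` values
/// when a single entry can hold at most `max_run` values.
fn entries_for_run(run_len: u64, max_run: u64) -> u64 {
    // Ceiling division without relying on newer std helpers.
    (run_len + max_run - 1) / max_run
}

/// Total stored entries for `runs` runs of equal length `run_len`.
fn total_entries(runs: u64, run_len: u64, max_run: u64) -> u64 {
    runs * entries_for_run(run_len, max_run)
}
```

With these assumed numbers, a 255-value cap needs 118 entries per 30,000-value run (590,000 entries across 5,000 runs), whereas a u16 cap of 65,535 needs one entry per run (5,000 in total).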

Validation

Validated with focused RLE2 tests and full Rust clippy before publishing.

@github-actions github-actions Bot added the A-encoding (Encoding, IO, file reader/writer), A-format (On-disk format: protos and format spec docs), and A-namespace (Namespace impls) labels Jun 19, 2026
@github-actions

Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions github-actions Bot added the enhancement (New feature or request) label Jun 19, 2026
@Xuanwo Xuanwo marked this pull request as ready for review June 22, 2026 13:47

Labels

A-encoding (Encoding, IO, file reader/writer), A-format (On-disk format: protos and format spec docs), A-namespace (Namespace impls), enhancement (New feature or request)


Development

Successfully merging this pull request may close these issues.

Add opt-in RLE v2 run-length widths
