Skip to content

Use varint encoding for row count in encoding prefix (#657)#657

Open
tanjialiang wants to merge 1 commit into
facebookincubator:mainfrom
tanjialiang:export-D97174231
Open

Use varint encoding for row count in encoding prefix (#657)#657
tanjialiang wants to merge 1 commit into
facebookincubator:mainfrom
tanjialiang:export-D97174231

Conversation

@tanjialiang
Copy link
Copy Markdown
Contributor

@tanjialiang tanjialiang commented Apr 10, 2026

Summary:

Change the Nimble encoding prefix row count field from a fixed 4-byte uint32_t to varint encoding. For typical small row counts (< 128 rows), this saves 3 bytes per encoding prefix.

The change is gated by a minor version bump (1 → 2) for backward compatibility:

  • Write path: All encodings now use Encoding::Options{.useVarintRowCount = true} when index is enabled
  • Read path: Files with version >= (0, 2) AND cluster index present decode with varint row counts; older files use the fixed 4-byte format

Key changes:

  • Introduce NimbleFeatureVersion.h with NimbleVersion struct and NimbleFeatureVersionRange for centralized version-to-feature mapping. Future features follow the same pattern: add a version range, gate writer on current version, gate reader on file version.
  • Bump kVersionMinor from 1 to 2 in Constants.h
  • Thread useVarintRowCount=true through VeloxWriter, IndexWriter, and VectorizedStatistics encode paths
  • Version-gate decode in VeloxReader, SelectiveNimbleReader, SelectiveNimbleIndexReader, ChunkedDecoder, IndexReader, and ReaderBase (stats deserialization)
  • Fix tools (NimbleDumpLib, NimbleDslLib, EncodingLayoutTrainer, DumpNimbleTablet2DatabaseLib) to use varint-aware encoding traversal and row count reading
  • Add useVarintRowCount parameter to traverseEncodings(), getStreamInputLabel(), and getEncodingLabel() in EncodingUtilities
  • Update test utilities (TestUtils, VeloxWriterTest) to handle varint prefix parsing
  • Update ~80 hardcoded physicalSize values in FieldWriterStatsTest to account for smaller prefixes
  • Add NimbleFeatureVersionTest with version comparison, feature range, and constexpr tests

Differential Revision: D97174231

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 10, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Apr 10, 2026

@tanjialiang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97174231.

tanjialiang added a commit to tanjialiang/nimble that referenced this pull request Apr 10, 2026
…or#657)

Summary:
Pull Request resolved: facebookincubator#657

Change the Nimble encoding prefix row count field from a fixed 4-byte uint32_t to varint encoding. For typical small row counts (< 128 rows), this saves 3 bytes per encoding prefix.

The change is gated by a minor version bump (1 → 2) for backward compatibility:
- Write path: All encodings now use `Encoding::Options{.useVarintRowCount = true}` when index is enabled
- Read path: Files with version >= (0, 2) AND cluster index present decode with varint row counts; older files use the fixed 4-byte format

Key changes:
- Introduce `NimbleFeatureVersion.h` with `NimbleVersion` struct and `NimbleFeatureVersionRange` for centralized version-to-feature mapping. Future features follow the same pattern: add a version range, gate writer on current version, gate reader on file version.
- Bump `kVersionMinor` from 1 to 2 in Constants.h
- Thread `useVarintRowCount=true` through VeloxWriter, IndexWriter, and VectorizedStatistics encode paths
- Version-gate decode in VeloxReader, SelectiveNimbleReader, SelectiveNimbleIndexReader, ChunkedDecoder, IndexReader, and ReaderBase (stats deserialization)
- Fix tools (NimbleDumpLib, NimbleDslLib, EncodingLayoutTrainer, DumpNimbleTablet2DatabaseLib) to use varint-aware encoding traversal and row count reading
- Add `useVarintRowCount` parameter to `traverseEncodings()`, `getStreamInputLabel()`, and `getEncodingLabel()` in EncodingUtilities
- Update test utilities (TestUtils, VeloxWriterTest) to handle varint prefix parsing
- Update ~80 hardcoded physicalSize values in FieldWriterStatsTest to account for smaller prefixes
- Add NimbleFeatureVersionTest with version comparison, feature range, and constexpr tests

Differential Revision: D97174231
@meta-codesync meta-codesync Bot changed the title Use varint encoding for row count in encoding prefix Use varint encoding for row count in encoding prefix (#657) Apr 10, 2026
…or#657)

Summary:
Pull Request resolved: facebookincubator#657

Change the Nimble encoding prefix row count field from a fixed 4-byte uint32_t to varint encoding. For typical small row counts (< 128 rows), this saves 3 bytes per encoding prefix.

The change is gated by a minor version bump (1 → 2) for backward compatibility:
- Write path: All encodings now use `Encoding::Options{.useVarintRowCount = true}` when index is enabled
- Read path: Files with version >= (0, 2) AND cluster index present decode with varint row counts; older files use the fixed 4-byte format

Key changes:
- Introduce `NimbleFeatureVersion.h` with `NimbleVersion` struct and `NimbleFeatureVersionRange` for centralized version-to-feature mapping. Future features follow the same pattern: add a version range, gate writer on current version, gate reader on file version.
- Bump `kVersionMinor` from 1 to 2 in Constants.h
- Thread `useVarintRowCount=true` through VeloxWriter, IndexWriter, and VectorizedStatistics encode paths
- Version-gate decode in VeloxReader, SelectiveNimbleReader, SelectiveNimbleIndexReader, ChunkedDecoder, IndexReader, and ReaderBase (stats deserialization)
- Fix tools (NimbleDumpLib, NimbleDslLib, EncodingLayoutTrainer, DumpNimbleTablet2DatabaseLib) to use varint-aware encoding traversal and row count reading
- Add `useVarintRowCount` parameter to `traverseEncodings()`, `getStreamInputLabel()`, and `getEncodingLabel()` in EncodingUtilities
- Update test utilities (TestUtils, VeloxWriterTest) to handle varint prefix parsing
- Update ~80 hardcoded physicalSize values in FieldWriterStatsTest to account for smaller prefixes
- Add NimbleFeatureVersionTest with version comparison, feature range, and constexpr tests

Differential Revision: D97174231
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant