test: assert dictionary page is listed first in page_encoding_stats#10051
Closed
alamb wants to merge 1 commit into
Closed
test: assert dictionary page is listed first in page_encoding_stats#10051alamb wants to merge 1 commit into
alamb wants to merge 1 commit into
Conversation
Adds a regression test asserting that for a dictionary-encoded column written via `ArrowWriter`, the `page_encoding_stats` list places the DICTIONARY_PAGE entry before the DATA_PAGE entries, matching the on-disk dictionary-first page layout and the order produced by the column-at-a-time `SerializedFileWriter`. The test reads the full (non-mask) encoding stats via `ParquetMetaDataOptions::with_encoding_stats_as_mask(false)`, since the default metadata reader collapses the list to a bitmask and discards the ordering under test. This passes on `main` and is intended to guard the dictionary-first ordering of the emitted metadata. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
|
I am not convinced it si important that the stats remain in the same order |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
While reviewing #10020 I (claude) noticed that the deferred-dictionary-ordering change makes the
ArrowWriteremit data pages before the dictionary page, which reorders thepage_encoding_statslist so theDICTIONARY_PAGEentry lands last instead of first.There is currently no test asserting the order of
page_encoding_stats, so that change is silent. This PR adds one.It passes on
mainand is intended to fail against the #10020 branch, documenting the dictionary-first ordering as an explicit invariant so any future reordering is caught.What changes are included in this PR?
A single new unit test,
dictionary_page_encoding_stats_lists_dictionary_first, inparquet/src/arrow/arrow_writer/mod.rs. No production code changes.Are these changes tested?
The change is a test. Verified it passes on
main.Are there any user-facing changes?
No.
🤖 Generated with Claude Code