test: assert dictionary page is listed first in page_encoding_stats #10051

Closed
alamb wants to merge 1 commit into apache:main from alamb:alamb/test-dict-page-encoding-stats-order
alamb commented Jun 2, 2026

Contributor

Which issue does this PR close?

Rationale for this change

While reviewing #10020 I (claude) noticed that the deferred-dictionary-ordering change makes the ArrowWriter emit data pages before the dictionary page, which reorders the page_encoding_stats list so the DICTIONARY_PAGE entry lands last instead of first.

There is currently no test asserting the order of page_encoding_stats, so that reordering would go unnoticed. This PR adds one.

It passes on main and is intended to fail against the #10020 branch, documenting the dictionary-first ordering as an explicit invariant so any future reordering is caught.

What changes are included in this PR?

A single new unit test, dictionary_page_encoding_stats_lists_dictionary_first, in parquet/src/arrow/arrow_writer/mod.rs. No production code changes.
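
For illustration, the invariant can be sketched independently of the parquet crate. The types `PageKind` and `PageStat` and the helper `dictionary_listed_first` below are hypothetical stand-ins, not the crate's API and not the test added in this PR. They only show the shape of the check, assuming each stats entry records a page type and a page count.

```rust
// Simplified stand-ins for illustration only; these are NOT the parquet crate's types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageKind {
    Dictionary,
    Data,
}

// One entry of a (hypothetical) page encoding stats list.
#[allow(dead_code)] // `count` is carried for realism but not inspected by the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PageStat {
    kind: PageKind,
    count: u32,
}

/// Returns true when no dictionary entry appears after a data entry.
/// A list without any dictionary entry passes trivially.
fn dictionary_listed_first(stats: &[PageStat]) -> bool {
    let mut seen_data = false;
    for stat in stats {
        match stat.kind {
            PageKind::Data => seen_data = true,
            // A dictionary entry after a data entry violates the ordering.
            PageKind::Dictionary if seen_data => return false,
            PageKind::Dictionary => {}
        }
    }
    true
}
```

Under these assumptions, a list ordered dictionary, data satisfies the check, while data, dictionary (the order described for the #10020 branch) does not.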

Are these changes tested?

The change is itself a test. I verified that it passes on main.

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Adds a regression test asserting that for a dictionary-encoded column
written via `ArrowWriter`, the `page_encoding_stats` list places the
DICTIONARY_PAGE entry before the DATA_PAGE entries, matching the on-disk
dictionary-first page layout and the order produced by the
column-at-a-time `SerializedFileWriter`.

The test reads the full (non-mask) encoding stats via
`ParquetMetaDataOptions::with_encoding_stats_as_mask(false)`, since the
default metadata reader collapses the list to a bitmask and discards the
ordering under test.
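
A toy sketch of why a bitmask cannot carry the ordering: two lists that differ
only in order collapse to the same mask. `ToyEncoding`, its bit positions, and
`to_mask` are hypothetical and chosen for illustration; the actual mask layout
used by the parquet crate is not shown in this PR and is not assumed here.

```rust
// Toy model only; NOT the parquet crate's types or its mask layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyEncoding {
    Plain,
    RleDictionary,
}

impl ToyEncoding {
    // Hypothetical bit position assigned to each encoding.
    fn bit(self) -> u32 {
        match self {
            ToyEncoding::Plain => 1 << 0,
            ToyEncoding::RleDictionary => 1 << 1,
        }
    }
}

/// Collapses an ordered list of encodings into a set-like bitmask.
/// OR-ing bits is commutative, so the original order is not recoverable.
fn to_mask(encodings: &[ToyEncoding]) -> u32 {
    encodings.iter().fold(0, |mask, e| mask | e.bit())
}
```

In this model, a dictionary-first list and a dictionary-last list produce
identical masks, which is why the test has to read the full list.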

This passes on `main` and is intended to guard the dictionary-first
ordering of the emitted metadata.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions bot added the parquet (Changes to the parquet crate) label Jun 2, 2026
alamb closed this Jun 2, 2026
alamb reopened this Jun 2, 2026
alamb closed this Jun 2, 2026
alamb commented Jun 2, 2026

Contributor Author

I am not convinced it is important that the stats remain in the same order.
