
[8016]Support writing hydrated REE arrays to Parquet#10064

Open
Rich-T-kid wants to merge 3 commits into apache:main from Rich-T-kid:rich-T-kid/Write-REE-flat

Conversation

@Rich-T-kid
Contributor

@Rich-T-kid Rich-T-kid commented Jun 3, 2026

Which issue does this PR close?

This PR is an initial step toward closing #8016.

Rationale for this change

Currently, arrow_writer does not support writing Run End Encoded (REE) columns to Parquet. This PR works toward solving this by first expanding the REE array to its value type and then writing that out to Parquet. Once it's possible to write REE to Parquet, we can optimize it by keeping the compact encoding intact.
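For readers unfamiliar with the layout, the expansion step can be sketched in plain Rust. This is a minimal illustration, not the code in this PR: `hydrate` is a hypothetical helper, `None` stands in for a null run, and the sketch ignores sliced arrays (logical offsets). It assumes Arrow's REE convention that each run end is the exclusive logical end index of its run.

```rust
/// Hypothetical helper (not part of arrow-rs): expand a run-end encoded
/// column into one value per logical row.
///
/// `run_ends[i]` is the exclusive logical end index of run `i`, so run `i`
/// covers rows `run_ends[i - 1]..run_ends[i]` (with an implicit start of 0).
/// A `None` entry in `values` represents a run of nulls.
fn hydrate<T: Clone>(
    run_ends: &[i32],
    values: &[Option<T>],
) -> Result<Vec<Option<T>>, String> {
    if run_ends.len() != values.len() {
        return Err(format!(
            "{} run ends but {} values",
            run_ends.len(),
            values.len()
        ));
    }

    // The last run end is the logical length of the column.
    let total = run_ends.last().copied().unwrap_or(0);
    let mut out = Vec::with_capacity(usize::try_from(total).unwrap_or(0));

    let mut start = 0i32;
    for (&end, value) in run_ends.iter().zip(values) {
        // Run ends must be positive and strictly increasing.
        if end <= start {
            return Err(format!("invalid run end {} after {}", end, start));
        }
        // Repeat the run's value once per logical row it covers.
        out.extend(std::iter::repeat(value.clone()).take((end - start) as usize));
        start = end;
    }
    Ok(out)
}
```

For example, run ends `[2, 3, 6]` with values `[Some(7), None, Some(9)]` describe six logical rows: two 7s, one null, and three 9s. The flat output is what the existing write path already knows how to encode, at the cost of giving up the compact representation.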

What changes are included in this PR?

arrow_writer() now supports writing Run End Encoded (REE) arrays to Parquet by hydrating them to their underlying value type before encoding. This is an initial, correctness-first implementation. A follow-up should optimize this to preserve the compact run-end structure.

parquet/src/arrow/arrow_writer/mod.rs: generate a value-type arrow-column writer & test
parquet/src/arrow/arrow_writer/levels.rs: core writer logic updated to detect REE columns and expand them to their flat value type before the existing write path.
parquet/src/arrow/schema/mod.rs: schema conversion updated to map RunEndEncodedType to an appropriate Parquet physical type.
parquet/benches/arrow_writer.rs: REE write benchmarks added with low and high null density scenarios, now unblocked by the implementation.
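The schema change listed above can be pictured with a small sketch. The types here are hypothetical, heavily simplified stand-ins rather than the arrow-rs or parquet crate types, and the sketch assumes the mapping simply follows the REE value type, which is what writing hydrated values implies. The actual conversion in parquet/src/arrow/schema/mod.rs is not shown in this conversation.

```rust
/// Hypothetical, simplified stand-in for an Arrow logical type.
#[derive(Debug, Clone, PartialEq)]
enum LogicalType {
    Int32,
    Int64,
    Float64,
    Utf8,
    /// Run-end encoded: (run-end index type, value type).
    RunEndEncoded(Box<LogicalType>, Box<LogicalType>),
}

/// Hypothetical stand-in for a subset of Parquet physical types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Int32,
    Int64,
    Double,
    ByteArray,
}

/// Pick the physical type a column would be written as.
/// Assumption: an REE column is written as if it were its value type,
/// because the run ends disappear once the array is hydrated.
fn physical_type(t: &LogicalType) -> PhysicalType {
    match t {
        LogicalType::Int32 => PhysicalType::Int32,
        LogicalType::Int64 => PhysicalType::Int64,
        LogicalType::Float64 => PhysicalType::Double,
        LogicalType::Utf8 => PhysicalType::ByteArray,
        // Ignore the run-end index type and recurse into the value type.
        LogicalType::RunEndEncoded(_run_ends, values) => physical_type(values),
    }
}
```

Under this assumption, an REE column with Int32 run ends and Utf8 values would be written the same way as a plain Utf8 column, so a reader sees an ordinary string column and needs no knowledge of the original encoding.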

Are these changes tested?

Yes

Are there any user-facing changes?

Users will be able to write their REE columns to Parquet using arrow_writer.

@github-actions github-actions Bot added the parquet Changes to the parquet crate label Jun 3, 2026
Comment thread parquet/src/arrow/arrow_writer/mod.rs Outdated
Comment thread parquet/src/arrow/arrow_writer/mod.rs Outdated
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Write-REE-flat branch from 9171f7b to be83375 Compare June 4, 2026 20:47
@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Write-REE-flat branch 2 times, most recently from 85cc317 to 4fc61f5 Compare June 8, 2026 01:48
@Rich-T-kid
Contributor Author

With this PR, it's now possible to take an REE array and write it out to Parquet. Tests and benchmarks are included in the PR. I also ran a couple of local tests and used parquetReader to validate the output.

@Rich-T-kid
Contributor Author

Rich-T-kid commented Jun 8, 2026

Tried to break the commits into 3 independent pieces (tests & benchmarks | implementation | more tests/edge cases & benchmarks (null density)).
@Jefffrey @albertlockett Do you mind taking a look when you get a chance? Thank you 🚀

@Rich-T-kid Rich-T-kid force-pushed the rich-T-kid/Write-REE-flat branch from 4fc61f5 to 50aa7c3 Compare June 8, 2026 02:06
@Rich-T-kid Rich-T-kid marked this pull request as ready for review June 8, 2026 02:07
@Rich-T-kid Rich-T-kid changed the title [8016][Draft] Support writing hydrated REE arrays to Parquet [8016]Support writing hydrated REE arrays to Parquet Jun 8, 2026


Development

Successfully merging this pull request may close these issues.

Support converting RunEndEncodedType to parquet

1 participant