Skip to content

refactor(parquet): bundle array reader recursion args into ReaderArgs#10089

Merged
alamb merged 1 commit into
apache:mainfrom
HippoBaro:reader_args
Jun 8, 2026
Merged

refactor(parquet): bundle array reader recursion args into ReaderArgs#10089
alamb merged 1 commit into
apache:mainfrom
HippoBaro:reader_args

Conversation

@HippoBaro
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

The recursive build_reader / build_*_reader methods in the array reader builder thread field and mask through every call.

What changes are included in this PR?

Bundle them into a small Copy ReaderArgs struct so the recursive signatures stay compact and there is a single, documented home for per-field reader options added in the future. This is a mechanical, behavior-preserving change: build_array_reader constructs the args at the entry point, group readers recurse with args.with_field(child), and leaf readers read args.field and args.mask.

Are these changes tested?

All tests passing.

Are there any user-facing changes?

No.

The recursive `build_reader` / `build_*_reader` methods in the array
reader builder thread `field` and `mask` through every call. Bundle them
into a small `Copy` `ReaderArgs` struct so the recursive signatures stay
compact and there is a single, documented home for per-field reader
options added in the future.

This is a mechanical, behavior-preserving change: `build_array_reader`
constructs the args at the entry point, group readers recurse with
`args.with_field(child)`, and leaf readers read `args.field` and
`args.mask`. No functional change.
@github-actions github-actions Bot added the parquet Changes to the parquet crate label Jun 7, 2026
Copy link
Copy Markdown
Contributor

@SoimanVasile SoimanVasile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really clean refactor! I pulled this locally and ran cargo test and clippy against the parquet crate—everything compiles cleanly and all tests pass. Bundling these into ReaderArgs definitely makes the recursive builder thread much easier to read.

I just noticed two minor details while reading through:

  1. Consistency Nitpick: In the sub-builder functions (like build_map_reader, build_list_reader), you instantiate a local binding let field = args.field; at the top. However, in the main build_reader function, you match on args.field.field_type directly without the local binding. It might be slightly cleaner to pick one pattern and stick to it uniformly across the file.

  2. unwrap() Safety: In build_map_reader (and the list/struct readers), there is:
    let children = field.children().unwrap();
    Even though the Parquet schema invariants should guarantee that map fields have children, since this function already returns a Result, would it be safer to bubble this up as an Arrow schema error instead of panicking if a malformed file somehow slips through?
    (e.g., field.children().ok_or_else(|| arrow_err!("Map field must have children"))?)

    Aside from those minor points, the logic and the with_field implementation look rock solid. Looks good to me.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @HippoBaro and @SoimanVasile

@alamb alamb merged commit 6fae4ea into apache:main Jun 8, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants