refactor(parquet): bundle array reader recursion args into ReaderArgs#10089
Conversation
The recursive `build_reader` / `build_*_reader` methods in the array reader builder thread `field` and `mask` through every call. Bundle them into a small `Copy` `ReaderArgs` struct so the recursive signatures stay compact and there is a single, documented home for per-field reader options added in the future. This is a mechanical, behavior-preserving change: `build_array_reader` constructs the args at the entry point, group readers recurse with `args.with_field(child)`, and leaf readers read `args.field` and `args.mask`. No functional change.
SoimanVasile
left a comment
There was a problem hiding this comment.
This is a really clean refactor! I pulled this locally and ran cargo test and clippy against the parquet crate—everything compiles cleanly and all tests pass. Bundling these into ReaderArgs definitely makes the recursive builder thread much easier to read.
I just noticed two minor details while reading through:
-
Consistency Nitpick: In the sub-builder functions (like
build_map_reader,build_list_reader), you instantiate a local bindinglet field = args.field;at the top. However, in the mainbuild_readerfunction, you match onargs.field.field_typedirectly without the local binding. It might be slightly cleaner to pick one pattern and stick to it uniformly across the file. -
unwrap() Safety: In build_map_reader (and the list/struct readers), there is:
let children = field.children().unwrap();
Even though the Parquet schema invariants should guarantee that map fields have children, since this function already returns a Result, would it be safer to bubble this up as an Arrow schema error instead of panicking if a malformed file somehow slips through?
(e.g.,field.children().ok_or_else(|| arrow_err!("Map field must have children"))?)Aside from those minor points, the logic and the with_field implementation look rock solid. Looks good to me.
alamb
left a comment
There was a problem hiding this comment.
Thank you @HippoBaro and @SoimanVasile
Which issue does this PR close?
Rationale for this change
The recursive
build_reader/build_*_readermethods in the array reader builder threadfieldandmaskthrough every call.What changes are included in this PR?
Bundle them into a small
CopyReaderArgsstruct so the recursive signatures stay compact and there is a single, documented home for per-field reader options added in the future. This is a mechanical, behavior-preserving change:build_array_readerconstructs the args at the entry point, group readers recurse withargs.with_field(child), and leaf readers readargs.fieldandargs.mask.Are these changes tested?
All tests passing.
Are there any user-facing changes?
No.