ArrowReader enhancements for Apache DataFusion Comet #1749

### What feature are you trying to implement?

Apache DataFusion Comet is an Apache Spark accelerator with Apache Iceberg support. We would like to enhance that support by leveraging Iceberg-Rust. You can find the details of this effort in the POC PR https://github.com/apache/datafusion-comet/pull/2528 and in [slides presented at the 10/9/25 Iceberg-Rust community call](https://github.com/user-attachments/files/22930897/iceberg-rust.pdf).

The short version is that Comet will rely on Apache Iceberg's Java integration with Apache Spark for planning, and then pass the generated `FileScanTask`s to Iceberg-Rust via a new DataFusion `IcebergScan` operator in Comet. We need a number of new (or newly public) APIs in the `ArrowReader`, since we are bypassing the `Table` interface to avoid redundant (and possibly incorrectly partitioned) planning. I will accumulate those efforts here.

One benefit of this approach is that I can run the Iceberg Java tests against Iceberg Rust's reader. There are gaps in features, so I hope to rapidly iterate on improving Iceberg Rust's reader to support them. I am not using Iceberg Rust's table interface or planning, so others will need to fill the gaps there, but I think this will greatly improve and harden Iceberg Rust's reader.

- [x] Make `ArrowReaderBuilder::new` `pub` instead of `pub(crate)`. (#1748)
- [ ] Expose `ArrowReaderOptions` in `ArrowReaderBuilder`. This likely requires a new Iceberg-Rust Cargo feature like in DataFusion to enable the `encryption` feature for the Parquet crate.
- [x] Read Parquet files without field ID metadata (migrated tables) (#1777)
- [x] Read Parquet files with both equality and position deletes (#1778)
- [x] Filter row groups when `FileScanTask` includes byte ranges (#1779)
- [x] Equality deletes with partial schemas (#1782)
- [x] Date32 support in `RecordBatchTransformer` (#1792)
- [x] Date32 default value from days since epoch, not just string (#1803)
- [x] Field ID conflict resolution after addFiles (#1821)
- [ ] Support complex types in pushdown filters
- [ ] Support binary, fixedSizeBinary, and decimal(28+) partition values
- [x] Bugs with position delete files and row group skipping (#1806)
- [x] Failed to deserialize JSON struct with type field (#1822)
- [x] Support struct default values of NULL after schema change, non-NULL is deferred (#1847)
- [x] Support equality deletes with binary type (#1848)
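
To illustrate the byte-range item above: when a planner splits one Parquet file into several tasks, each task should read only the row groups that belong to its byte range. The sketch below uses one common convention, assigning a row group to the split that contains its midpoint, so that every row group is read exactly once when the splits tile the file. This is an assumption for illustration and not necessarily the rule implemented in #1779. `RowGroupSpan` and `row_groups_in_split` are hypothetical names, not Iceberg-Rust APIs.

```rust
/// Hypothetical, simplified stand-in for Parquet row-group metadata.
/// (Not an Iceberg-Rust or parquet crate type.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowGroupSpan {
    /// Byte offset of the row group's first byte within the file.
    start: u64,
    /// Total size of the row group in bytes.
    len: u64,
}

/// Returns the indices of row groups whose midpoint lies in the half-open
/// range `[split_start, split_start + split_len)`.
///
/// Assumption: midpoint-based assignment, so adjacent splits never both
/// claim the same row group.
fn row_groups_in_split(
    groups: &[RowGroupSpan],
    split_start: u64,
    split_len: u64,
) -> Vec<usize> {
    let split_end = split_start + split_len;
    groups
        .iter()
        .enumerate()
        .filter(|(_, g)| {
            let mid = g.start + g.len / 2;
            mid >= split_start && mid < split_end
        })
        .map(|(i, _)| i)
        .collect()
}
```

For example, with three 100-byte row groups starting at offsets 4, 104, and 204, a split covering bytes 0 to 128 selects only the first row group, and a split covering bytes 128 to 256 selects the other two.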

### Willingness to contribute

I can contribute to this feature independently
