Parent: #281
Goal
Validate whether Parquet bloom filters are actually written to files, consulted by the reader, and effective at reducing reads for the current Delta/DataFusion stack.
Writer properties alone are not enough. We need evidence that bloom filters exist in produced files and that the reader uses them in the workloads where we expect a win.
Scope
- Inspect generated Parquet files and confirm bloom filter metadata exists for intended columns.
- Verify whether the pinned DataFusion/Parquet reader uses those bloom filters for trace-id and selective entity predicates.
- Add tests or benchmark checks that can distinguish “bloom filters configured” from “bloom filters reducing reads.”
- Document unsupported or ineffective cases.
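One possible shape for the check in the third bullet, as a minimal sketch only: it assumes a test harness that can already report, per column, whether the writer property was set, whether a bloom filter was found in the file metadata, and how many row groups were read with bloom-filter pruning off versus on. All names here (`BloomObservation`, `BloomStatus`, `classify`) are hypothetical and not part of DataFusion, Delta, or the Parquet crate.

```python
from dataclasses import dataclass
from enum import Enum


class BloomStatus(Enum):
    NOT_CONFIGURED = "not configured"
    CONFIGURED_NOT_WRITTEN = "configured, but no filter found in file"
    WRITTEN_NOT_EFFECTIVE = "written, but reads not reduced"
    EFFECTIVE = "written and reducing reads"


@dataclass(frozen=True)
class BloomObservation:
    """Hypothetical per-column measurement collected by a test or benchmark."""
    column: str
    configured: bool          # writer property was set for this column
    present_in_file: bool     # bloom filter metadata was found in the file
    row_groups_read_off: int  # row groups read with bloom-filter pruning disabled
    row_groups_read_on: int   # row groups read with bloom-filter pruning enabled


def classify(obs: BloomObservation) -> BloomStatus:
    """Separate 'configured' from 'written' from 'actually reducing reads'."""
    if not obs.configured:
        return BloomStatus.NOT_CONFIGURED
    if not obs.present_in_file:
        return BloomStatus.CONFIGURED_NOT_WRITTEN
    if obs.row_groups_read_on < obs.row_groups_read_off:
        return BloomStatus.EFFECTIVE
    # Equal or higher read counts mean the filter gave no measurable benefit.
    return BloomStatus.WRITTEN_NOT_EFFECTIVE
```

A test would then assert that the trace-id column classifies as `EFFECTIVE` for a point lookup, and fail with the specific status otherwise, which makes "configured but not written" visibly different from "written but unused."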
High-level design
Bloom filters are most useful after partition and file pruning have already narrowed the candidate set. They should not be treated as the primary fix for broad scans or small-file pressure.
This issue should produce a clear yes/no answer for each intended bloom-filter column and reader path.
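The yes/no answer per column and reader path could be reported as a small table. The sketch below is one way to render it; the function name `render_answers` and the example column and reader-path labels are placeholders, not names taken from the codebase.

```python
from typing import Dict, Optional, Tuple


def render_answers(results: Dict[Tuple[str, str], Optional[bool]]) -> str:
    """Render {(column, reader_path): True/False/None} as a plain-text table.

    None means the combination has not been measured yet.
    """
    columns = sorted({col for col, _ in results})
    paths = sorted({path for _, path in results})
    width = max(len(name) for name in columns + ["column"])

    lines = [("column".ljust(width) + "  " + "  ".join(paths)).rstrip()]
    for col in columns:
        cells = []
        for path in paths:
            answer = results.get((col, path))
            text = "untested" if answer is None else ("yes" if answer else "no")
            cells.append(text.ljust(len(path)))
        lines.append((col.ljust(width) + "  " + "  ".join(cells)).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder labels and values for illustration only, not measured results.
    print(render_answers({
        ("trace_id", "point_lookup"): True,
        ("entity_id", "point_lookup"): None,
    }))
```

Keeping "untested" distinct from "no" avoids recording an unmeasured combination as an unsupported one.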
Acceptance criteria
- Generated Parquet files are inspected for bloom filter metadata.
- Benchmarks or tests show whether bloom filters reduce row-group/page reads for trace-id lookups.
- Unsupported columns or reader paths are documented in the issue.
- Follow-up tuning is based on measured behavior, not writer-property assumptions.