Trace latency: validate Parquet bloom filters #292

Parent: #281

# Goal

Validate whether Parquet bloom filters are actually written, read, and effective for the current Delta/DataFusion stack.

Writer properties alone are not enough. We need evidence that bloom filters exist in produced files and that the reader uses them in the workloads where we expect a win.

# Scope

- Inspect generated Parquet files and confirm bloom filter metadata exists for intended columns.
- Verify whether the pinned DataFusion/Parquet reader uses those bloom filters for trace-id and selective entity predicates.
- Add tests or benchmark checks that can distinguish “bloom filters configured” from “bloom filters reducing reads.”
- Document unsupported or ineffective cases.
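
The gap between "configured" and "reducing reads" can be illustrated with a toy model. The sketch below is an assumption-laden illustration only: `ToyBloom`, `row_groups_to_read`, and the `trace-<group>-<n>` ids are hypothetical, and the filter is a plain Bloom filter rather than Parquet's split-block format or anything DataFusion exposes. It shows the quantity a test should assert on: how many row groups a point lookup still has to scan.

```python
import hashlib


class ToyBloom:
    """Tiny Bloom filter. Illustrative only, not Parquet's split-block format."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as a bit set

    def _positions(self, value: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def row_groups_to_read(row_groups, filters, trace_id):
    """Return indexes of row groups a reader would still have to scan."""
    if filters is None:
        # Filters missing, or written but ignored by the reader: scan everything.
        return list(range(len(row_groups)))
    return [i for i, f in enumerate(filters) if f.might_contain(trace_id)]


# Hypothetical data: 8 row groups of 50 trace ids each.
row_groups = [[f"trace-{g}-{n}" for n in range(50)] for g in range(8)]
filters = []
for group in row_groups:
    bloom = ToyBloom()
    for trace_id in group:
        bloom.add(trace_id)
    filters.append(bloom)

target = "trace-3-17"
without_bloom = row_groups_to_read(row_groups, None, target)
with_bloom = row_groups_to_read(row_groups, filters, target)
print(f"row groups scanned without filters: {len(without_bloom)}")
print(f"row groups scanned with filters:    {len(with_bloom)}")
```

A reader that never consults the filters behaves like the `filters is None` branch even when the writer produced them, which is why writer properties alone cannot answer the question.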

# High-level design

Bloom filters are most useful after partition and file pruning have already narrowed the candidate set. They should not be treated as the primary fix for broad scans or small-file pressure.

This issue should produce a clear yes/no answer for each intended bloom-filter column and reader path.
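
One possible shape for that yes/no answer is a small record per column and reader path. This is a sketch under stated assumptions: `BloomCheck`, `verdict`, the column names, and the reader-path labels are hypothetical, and the boolean values are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloomCheck:
    column: str          # intended bloom-filter column (hypothetical name)
    reader_path: str     # how the query reaches the Parquet reader (hypothetical label)
    written: bool        # bloom filter metadata found in produced files
    consulted: bool      # evidence that the reader used the filter
    reads_reduced: bool  # measured drop in row-group/page reads


def verdict(check: BloomCheck) -> str:
    """Collapse the three observations into a yes/no answer with a reason."""
    if not check.written:
        return "no: not written"
    if not check.consulted:
        return "no: written but not read"
    if not check.reads_reduced:
        return "no: read but ineffective"
    return "yes"


# Placeholder rows only; real values must come from inspection and benchmarks.
checks = [
    BloomCheck("trace_id", "point-lookup", True, True, True),
    BloomCheck("entity_id", "selective-filter", True, False, False),
]
for check in checks:
    print(f"{check.column} / {check.reader_path}: {verdict(check)}")
```

Ordering the checks from "written" to "reads reduced" keeps each "no" tied to the first stage that failed, which is also what the documentation of unsupported cases needs.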

# Acceptance criteria

- Generated Parquet files are inspected for bloom filter metadata.
- Benchmarks or tests show whether bloom filters reduce row-group/page reads for trace-id lookups.
- Unsupported columns or reader paths are documented in the issue.
- Follow-up tuning is based on measured behavior, not writer-property assumptions.
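
A benchmark check for the read-reduction criterion could compare the same trace-id lookup with and without bloom filters. The sketch below assumes the row-group counts are collected elsewhere; `bloom_reduction`, `is_effective`, and the `0.5` threshold are hypothetical and would need to be chosen from real measurements.

```python
def bloom_reduction(baseline_row_groups: int, bloom_row_groups: int) -> float:
    """Fraction of row-group reads avoided relative to the baseline run."""
    if baseline_row_groups <= 0:
        raise ValueError("baseline must scan at least one row group")
    return 1.0 - bloom_row_groups / baseline_row_groups


def is_effective(baseline_row_groups: int, bloom_row_groups: int,
                 min_reduction: float = 0.5) -> bool:
    """True when the measured reduction meets the (assumed) threshold."""
    return bloom_reduction(baseline_row_groups, bloom_row_groups) >= min_reduction
```

For example, `is_effective(40, 10)` passes with a reduction of 0.75, while `is_effective(40, 38)` fails; the second case is the "configured but not reducing reads" outcome this issue is meant to surface.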

