Parent: #281
Goal
Add bounded metadata warmup so a new pod can populate useful Delta and Parquet metadata before taking normal traffic.
Warmup should reduce first-query cold-path cost, but it must not become an unbounded startup scan.
Scope
- Warm recent Delta table metadata on startup.
- Prefetch bounded Parquet metadata ranges for recent or high-value partitions.
- Include footers and, where their byte ranges can be identified safely, bloom filter and index regions.
- Add configuration for warmup window, concurrency, byte caps, file caps, and timeout.
- Keep readiness blocking configurable. Default behavior should avoid making object-store slowness a hard startup outage.
- Do not depend on PVC-backed persistence.
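A minimal sketch of how the configuration surface listed above could look. Every field name and default here is an assumption made for illustration, not an existing setting; the only points taken from the scope are that each bound exists and that readiness blocking is off by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmupConfig:
    """Hypothetical warmup settings; names and defaults are illustrative."""

    enabled: bool = True
    window_hours: int = 24               # only consider partitions this recent
    max_concurrency: int = 8             # parallel object-store range reads
    max_files: int = 500                 # hard cap on files touched
    max_bytes: int = 64 * 1024 * 1024    # hard cap on total bytes fetched
    timeout_seconds: float = 30.0        # overall warmup deadline
    block_readiness: bool = False        # default: do not gate readiness on warmup

    def __post_init__(self) -> None:
        # Reject zero or negative bounds so a typo cannot silently
        # turn a cap into "unbounded" or "never runs".
        for name in ("window_hours", "max_concurrency", "max_files", "max_bytes"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
```

Validating at construction time means a misconfigured pod fails with a clear message instead of starting with an accidental unbounded scan.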
High-level design
Warmup pays off within a single pod lifetime, including right after a deploy, especially when combined with the in-memory object-store range cache. It is not a substitute for better file layout or bounded queries.
The implementation should prefer “warm the most likely useful metadata” over “touch every active file.” A partition with 100k files should degrade gracefully instead of spending minutes prefetching metadata before the server starts.
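One way to express "warm the most likely useful metadata" is a selection step that ranks candidates and stops at the caps. The sketch below assumes recency is the usefulness signal and that a per-file estimate of metadata bytes is available; `FileCandidate` and `select_warmup_files` are hypothetical names, and a real ranking could weigh query frequency or partition priority instead.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileCandidate:
    path: str
    modified_at: float   # epoch seconds (assumed to come from table metadata)
    metadata_bytes: int  # estimated footer/index bytes to prefetch


def select_warmup_files(
    candidates: Iterable[FileCandidate],
    max_files: int,
    max_bytes: int,
) -> List[FileCandidate]:
    """Pick the most recent files that fit within the file and byte caps."""
    selected: List[FileCandidate] = []
    budget = max_bytes
    for cand in sorted(candidates, key=lambda c: c.modified_at, reverse=True):
        if len(selected) >= max_files:
            break  # file cap reached; older files are left cold
        if cand.metadata_bytes > budget:
            continue  # does not fit; a smaller, older file still might
        selected.append(cand)
        budget -= cand.metadata_bytes
    return selected
```

With this shape, a partition holding 100k files costs one sort over the listing plus at most `max_files` range reads, so startup time is governed by the caps and not by partition size.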
Acceptance criteria
- Warmup has bounded file, byte, concurrency, and time controls.
- Warmup metrics report files attempted, files warmed, bytes read, duration, and errors.
- Readiness behavior is configurable and has a timeout.
- A pod can start and serve traffic if warmup fails or times out, with degraded status logged.
- Benchmarks compare first-query latency with warmup disabled and enabled.
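The criteria above can be sketched as a warmup runner that enforces concurrency and a deadline, records the listed metrics, and never raises into the startup path. This is an illustration under assumptions: `run_warmup` and `WarmupMetrics` are hypothetical names, and `fetch` stands in for whatever coroutine actually reads a file's metadata ranges and returns the number of bytes read.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence


@dataclass
class WarmupMetrics:
    files_attempted: int = 0
    files_warmed: int = 0
    bytes_read: int = 0
    errors: int = 0
    duration_seconds: float = 0.0
    timed_out: bool = False


async def run_warmup(
    paths: Sequence[str],
    fetch: Callable[[str], Awaitable[int]],
    max_concurrency: int,
    timeout_seconds: float,
) -> WarmupMetrics:
    """Warm each path with bounded concurrency; always returns metrics."""
    metrics = WarmupMetrics()
    sem = asyncio.Semaphore(max_concurrency)
    started = time.monotonic()

    async def warm_one(path: str) -> None:
        async with sem:
            metrics.files_attempted += 1
            try:
                metrics.bytes_read += await fetch(path)
                metrics.files_warmed += 1
            except Exception:
                # A single failed read must not abort the rest of warmup.
                metrics.errors += 1

    try:
        await asyncio.wait_for(
            asyncio.gather(*(warm_one(p) for p in paths)),
            timeout_seconds,
        )
    except asyncio.TimeoutError:
        # Deadline hit: in-flight reads are cancelled, the pod still starts.
        metrics.timed_out = True
    metrics.duration_seconds = time.monotonic() - started
    return metrics
```

A caller would log degraded status when `timed_out` is set or `errors` is nonzero, and would wait on this coroutine before reporting ready only when readiness blocking is enabled.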