Trace latency: add bounded metadata warmup #288

Description

@thorrester

Parent: #281

Goal

Add bounded metadata warmup so a new pod can populate useful Delta and Parquet metadata before taking normal traffic.

Warmup should reduce first-query cold-path cost, but it must not become an unbounded startup scan.

Scope

  • Warm recent Delta table metadata on startup.
  • Prefetch bounded Parquet metadata ranges for recent or high-value partitions.
  • Include footers, and bloom filter and index regions where those byte ranges can be identified safely.
  • Add configuration for warmup window, concurrency, byte caps, file caps, and timeout.
  • Keep readiness blocking configurable. Default behavior should avoid making object-store slowness a hard startup outage.
  • Do not depend on PVC-backed persistence.
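
As a rough illustration of the configuration surface listed above, a minimal sketch might look like the following. The class name `WarmupConfig`, every field name, and every default value are assumptions made for this example, not settings that exist in the codebase.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class WarmupConfig:
    """Hypothetical warmup settings; names and defaults are illustrative only."""

    enabled: bool = True
    window: timedelta = timedelta(hours=24)     # how far back "recent" reaches
    max_concurrency: int = 8                    # parallel object-store reads
    max_bytes: int = 64 * 1024 * 1024           # total bytes warmup may read
    max_files: int = 500                        # total files warmup may touch
    timeout: timedelta = timedelta(seconds=30)  # hard stop for the whole warmup
    block_readiness: bool = False               # default: readiness is not gated

    def __post_init__(self) -> None:
        # Reject values that would silently disable a bound.
        if self.window <= timedelta(0):
            raise ValueError("window must be positive")
        if self.timeout <= timedelta(0):
            raise ValueError("timeout must be positive")
        for name in ("max_concurrency", "max_bytes", "max_files"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
```

Defaulting `block_readiness` to `False` reflects the scope item about not turning object-store slowness into a startup outage. Operators who prefer a warm pod over a fast start could opt in, and the timeout would still bound the wait.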

High-level design

Warmup is useful within a pod lifetime and after deploys, especially when combined with the in-memory object-store range cache. It is not a substitute for better file layout or bounded queries.

The implementation should prefer “warm the most likely useful metadata” over “touch every active file.” A partition with 100k files should degrade gracefully instead of spending minutes prefetching metadata before the server starts.
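
One way to read "warm the most likely useful metadata" is to rank files by recency and stop at whichever cap is hit first. The sketch below assumes recency is a good proxy for usefulness and that a fixed-size read from the end of each file (where Parquet stores its footer) is enough to cover the metadata. `FileInfo`, `ByteRange`, `plan_warmup`, and the 64 KiB tail size are hypothetical names and values chosen for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int          # object size in bytes
    modified_ms: int   # last-modified time, epoch milliseconds


@dataclass(frozen=True)
class ByteRange:
    path: str
    offset: int
    length: int


def plan_warmup(
    files: Iterable[FileInfo],
    max_files: int,
    max_bytes: int,
    tail_bytes: int = 64 * 1024,  # assumed tail window, not a measured value
) -> List[ByteRange]:
    """Pick tail ranges for the newest files, stopping at either cap."""
    plan: List[ByteRange] = []
    budget = max_bytes
    # nlargest keeps only max_files candidates, so a huge listing is never
    # fully sorted; the result is ordered newest first.
    newest = heapq.nlargest(max_files, files, key=lambda f: f.modified_ms)
    for info in newest:
        length = min(info.size, tail_bytes)
        if length == 0:
            continue  # empty object, nothing to prefetch
        if length > budget:
            break     # byte cap reached; the remaining files stay cold
        plan.append(ByteRange(info.path, info.size - length, length))
        budget -= length
    return plan
```

With 100k files of 1 MB each, `max_files=500`, and `max_bytes` set to 16 MiB, this plan would cover only the 256 newest files, because the byte cap is reached before the file cap. The rest of the partition is left to the normal cold path.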

Acceptance criteria

  • Warmup has bounded file, byte, concurrency, and time controls.
  • Warmup metrics report files attempted, files warmed, bytes read, duration, and errors.
  • Readiness behavior is configurable and has a timeout.
  • A pod can start and serve traffic if warmup fails or times out, with degraded status logged.
  • Benchmarks compare first-query latency with warmup disabled and enabled.
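
The metrics, timeout, and failure-tolerance criteria above could be sketched as follows. This assumes an async runtime and a caller-supplied `fetch` coroutine that returns the number of bytes it read; `WarmupMetrics`, `run_warmup`, and `fetch` are hypothetical names, and the real service may structure this quite differently.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Sequence


@dataclass
class WarmupMetrics:
    files_attempted: int = 0
    files_warmed: int = 0
    bytes_read: int = 0
    errors: int = 0
    duration_s: float = 0.0
    timed_out: bool = False

    @property
    def degraded(self) -> bool:
        # Degraded means warmup was incomplete, not that the pod is unhealthy.
        return self.timed_out or self.errors > 0


async def run_warmup(
    paths: Sequence[str],
    fetch: Callable[[str], Awaitable[int]],  # hypothetical: returns bytes read
    max_concurrency: int,
    timeout_s: float,
) -> WarmupMetrics:
    metrics = WarmupMetrics()
    gate = asyncio.Semaphore(max_concurrency)

    async def warm_one(path: str) -> None:
        async with gate:
            metrics.files_attempted += 1
            try:
                n = await fetch(path)
            except Exception:
                metrics.errors += 1  # a failed file never fails the pod
                return
            metrics.bytes_read += n
            metrics.files_warmed += 1

    started = time.monotonic()
    try:
        # On timeout, wait_for cancels the gather and all in-flight fetches.
        await asyncio.wait_for(
            asyncio.gather(*(warm_one(p) for p in paths)), timeout_s
        )
    except asyncio.TimeoutError:
        metrics.timed_out = True
    metrics.duration_s = time.monotonic() - started
    return metrics
```

When readiness is not blocking, the caller would start `run_warmup` as a background task and report ready immediately. When it is blocking, the caller would await the result, then report ready regardless of the outcome and log `metrics.degraded` so a failed or timed-out warmup stays visible.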
