Trace latency: move maintenance to leases and add recent Compact #287

Description

@thorrester

Parent: #281

Goal

Move Delta table maintenance under Postgres-coordinated leases and add recent bin-pack compaction where it is supported by the Delta writer stack.

This reduces small-file pressure without letting multiple pods compact, vacuum, or refresh table state in conflicting ways.

Scope

  • Add lease-managed maintenance tasks for DataFusion/Delta tables.
  • Keep nightly Z-ORDER where it already makes sense.
  • Add a separate recent bin-pack Compact task for active partitions if the pinned Delta implementation supports it.
  • Validate support for OptimizeType::Compact and target-size configuration in the pinned Delta crate before assuming the API exists.
  • Do not rely on delta.autoOptimize.optimizeWrite=true unless the pinned Rust Delta implementation actually honors it.
  • Add refresh-origin tracking so background refreshes, write-commit refreshes, maintenance refreshes, and any accidental request-path refreshes can be distinguished.
  • Ensure maintenance covers trace spans, trace summaries, GenAI trace tables, Bifrost datasets, dispatch records if applicable, and eval scenarios.
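The refresh-origin item above could be modeled as a small enum plus a per-origin counter. This is only a sketch: `RefreshOrigin`, `RefreshCounters`, and the label strings are hypothetical names, and the in-memory map stands in for whatever metrics backend the service actually uses.

```rust
use std::collections::HashMap;

/// Where a table-state refresh was triggered from.
/// Variant names are illustrative, taken from the four cases listed in Scope.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RefreshOrigin {
    Background,
    WriteCommit,
    Maintenance,
    /// Should stay at zero if no user request ever triggers a refresh.
    RequestPath,
}

impl RefreshOrigin {
    /// Stable label that could be used as a metric dimension (assumed naming).
    pub fn as_label(self) -> &'static str {
        match self {
            RefreshOrigin::Background => "background",
            RefreshOrigin::WriteCommit => "write_commit",
            RefreshOrigin::Maintenance => "maintenance",
            RefreshOrigin::RequestPath => "request_path",
        }
    }
}

/// In-memory stand-in for a real metrics backend.
#[derive(Debug, Default)]
pub struct RefreshCounters {
    counts: HashMap<RefreshOrigin, u64>,
}

impl RefreshCounters {
    /// Record one refresh attributed to `origin`.
    pub fn record(&mut self, origin: RefreshOrigin) {
        *self.counts.entry(origin).or_insert(0) += 1;
    }

    /// Number of refreshes recorded for `origin` so far.
    pub fn count(&self, origin: RefreshOrigin) -> u64 {
        self.counts.get(&origin).copied().unwrap_or(0)
    }
}
```

Passing the origin explicitly at every refresh call site makes an accidental request-path refresh show up as a non-zero `RequestPath` count rather than being folded into background activity.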

High-level design

Every small file adds its own footer read and bloom/index page checks, so a partition made of many small files multiplies object-store requests per query. Recent bin-pack compaction is useful because daytime writes create small files before nightly Z-ORDER has a chance to cluster them.
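To make the bin-pack idea concrete, here is a minimal planning sketch using first-fit decreasing. It does not call the Delta crate at all: `FileEntry`, `plan_bins`, and the `target_bytes` parameter are hypothetical, and the real optimizer in the pinned crate may choose bins differently.

```rust
/// A data file in one partition (hypothetical shape, not a Delta crate type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileEntry {
    pub path: String,
    pub size_bytes: u64,
}

/// Group files smaller than `target_bytes` into bins whose combined size
/// stays at or below `target_bytes` (first-fit decreasing).
/// Bins that end up with a single file are dropped, because rewriting one
/// file on its own does not reduce the file count.
pub fn plan_bins(files: &[FileEntry], target_bytes: u64) -> Vec<Vec<FileEntry>> {
    // Only files below the target are candidates for merging.
    let mut small: Vec<FileEntry> = files
        .iter()
        .filter(|f| f.size_bytes < target_bytes)
        .cloned()
        .collect();
    // Largest first, so big candidates claim bins before small ones fill gaps.
    small.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    // Each bin tracks its used bytes alongside its members.
    let mut bins: Vec<(u64, Vec<FileEntry>)> = Vec::new();
    for file in small {
        let size = file.size_bytes;
        match bins.iter_mut().find(|bin| bin.0 + size <= target_bytes) {
            Some(bin) => {
                bin.0 += size;
                bin.1.push(file);
            }
            None => bins.push((size, vec![file])),
        }
    }

    bins.into_iter()
        .map(|(_, members)| members)
        .filter(|members| members.len() > 1)
        .collect()
}
```

Under these assumptions, each returned bin corresponds to one rewrite that replaces several small files with a single file near the target size, which is the file-count reduction the benchmarks in the acceptance criteria are meant to measure.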

Maintenance should be lease-driven. In multi-pod deployments, one pod should own a maintenance action at a time, while other pods keep serving from their current snapshots and pick up the new Delta version through background refresh.
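The lease semantics described here can be sketched with an in-memory model. This is an assumption-heavy illustration, not the proposed implementation: `LeaseTable`, `try_acquire`, `release`, and the TTL-based expiry are hypothetical, and in the real system the same check would presumably be a single conditional write against a Postgres table so that it is atomic across pods.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One lease row: who holds the task and until when.
#[derive(Debug, Clone)]
struct Lease {
    owner: String,
    expires_at: Instant,
}

/// In-memory stand-in for a Postgres lease table keyed by task name.
#[derive(Debug, Default)]
pub struct LeaseTable {
    leases: HashMap<String, Lease>,
}

impl LeaseTable {
    /// Acquire or renew the lease for `task`.
    /// Succeeds if the task is unleased, the lease has expired, or `owner`
    /// already holds it. Fails while another owner holds an unexpired lease.
    pub fn try_acquire(&mut self, task: &str, owner: &str, ttl: Duration, now: Instant) -> bool {
        let held_by_other = self
            .leases
            .get(task)
            .map_or(false, |l| l.owner != owner && l.expires_at > now);
        if held_by_other {
            return false;
        }
        self.leases.insert(
            task.to_string(),
            Lease { owner: owner.to_string(), expires_at: now + ttl },
        );
        true
    }

    /// Release the lease early. Only the current owner may release it.
    pub fn release(&mut self, task: &str, owner: &str) -> bool {
        let owned = self.leases.get(task).map_or(false, |l| l.owner == owner);
        if owned {
            self.leases.remove(task);
        }
        owned
    }
}
```

A pod that fails `try_acquire` simply skips the maintenance action and keeps serving from its current snapshot. The expiry means a crashed owner cannot block compaction or vacuum indefinitely, at the cost of needing renewals for long-running tasks.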

Acceptance criteria

  • Maintenance tasks are protected by Postgres leases.
  • Compact support is verified against the pinned Delta crate before implementation.
  • If Compact is unsupported, the issue records the supported alternative instead of silently implementing a no-op.
  • Refresh-origin metrics exist and show user requests are not waiting on maintenance refreshes.
  • Benchmarks show file-count and object-store request-count impact for recent partitions.
