Trace latency: fix writer properties, cache shape, vacuum safety, and eval scenario partitioning #283

Description

@thorrester

Parent: #281

Goal

Fix the correctness and file-layout issues that can invalidate later performance work.

This includes writer-property consistency, result-cache key shape, vacuum safety, table maintenance coverage, and the first layout for eval_scenarios.

Scope

  • Create a shared writer-properties helper for DataFusion/Delta table writes.
  • Make writer profiles explicit per table type: trace spans, trace summaries, GenAI trace tables, Bifrost datasets, dispatch records, control tables where relevant, and eval scenarios.
  • Make sure tables that need bloom filters, compression, row-group sizing, and column encodings actually get those writer properties when written and optimized.
  • Audit trace_dispatch; either add writer properties and maintenance hooks or document why it is intentionally exempt.
  • Fix unsafe vacuum behavior so one pod cannot remove files still needed by another pod’s active snapshot.
  • Fix result-cache key shape and weights before TTL changes. Cache keys need the full query shape, not just a partial set of inputs; otherwise two different queries can share a key and one can be served the other's cached result.
  • Ship eval_scenarios as a partitioned Delta table using a created-date partition derived from created_at. This table has not shipped, so no migration is needed.
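As a rough illustration of the cache-key item, the sketch below hashes every input that can change a result set. It is written in Python only for brevity and is not the project's implementation. The `QueryShape` fields (`table`, `filters`, `start_ts`, `end_ts`, `limit`, `order_by`) are hypothetical stand-ins for whatever the real query shape contains.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryShape:
    # Every field name here is hypothetical; the real query shape may differ.
    table: str
    filters: Tuple[Tuple[str, str], ...] = ()  # (column, predicate) pairs
    start_ts: Optional[int] = None
    end_ts: Optional[int] = None
    limit: Optional[int] = None
    order_by: Tuple[str, ...] = ()


def cache_key(shape: QueryShape) -> str:
    """Hash every field that can change the result set."""
    payload = asdict(shape)
    # Filter order does not change the result, so sort it for a stable key.
    payload["filters"] = sorted(list(f) for f in shape.filters)
    # Ordering columns are order-sensitive, so keep them as given.
    payload["order_by"] = list(shape.order_by)
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, two queries that differ only in `limit` or time range get different keys, while the same filters listed in a different order map to the same key.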

High-level design

The shared writer helper should remove copy/paste drift between engines. It should not force every table into the same physical layout. Different tables have different query patterns, so the helper should expose table profiles rather than a single global set of Parquet knobs.
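One way to picture "table profiles rather than a single global set of Parquet knobs" is a lookup from table kind to an immutable profile. The sketch below is a language-neutral illustration in Python, not the actual helper. `TableKind`, `WriterProfile`, `writer_profile`, the column names, and every numeric value are hypothetical placeholders, not tuned settings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class TableKind(Enum):
    TRACE_SPANS = "trace_spans"
    TRACE_SUMMARIES = "trace_summaries"
    TRACE_DISPATCH = "trace_dispatch"
    EVAL_SCENARIOS = "eval_scenarios"


@dataclass(frozen=True)
class WriterProfile:
    # Hypothetical knobs; the real helper may expose different ones.
    compression: str
    max_row_group_rows: int
    bloom_filter_columns: Tuple[str, ...] = ()
    dictionary_columns: Tuple[str, ...] = ()


# Placeholder values for illustration only.
_PROFILES: Dict[TableKind, WriterProfile] = {
    TableKind.TRACE_SPANS: WriterProfile(
        compression="zstd",
        max_row_group_rows=128_000,
        bloom_filter_columns=("trace_id",),
        dictionary_columns=("service_name",),
    ),
    TableKind.TRACE_SUMMARIES: WriterProfile(
        compression="zstd",
        max_row_group_rows=64_000,
        bloom_filter_columns=("trace_id",),
    ),
    TableKind.EVAL_SCENARIOS: WriterProfile(
        compression="zstd",
        max_row_group_rows=32_000,
    ),
}


def writer_profile(kind: TableKind) -> WriterProfile:
    """Return the profile for a table kind, failing loudly if none exists."""
    try:
        return _PROFILES[kind]
    except KeyError:
        raise ValueError(f"no writer profile registered for {kind.name}") from None
```

Because `TRACE_DISPATCH` has no entry here, a write path that asks for its profile fails instead of silently falling back to defaults, which forces the "add properties or document the exemption" decision to be made explicitly.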

The eval_scenarios layout should be corrected before release, while the table has not shipped and no migration is needed. Use a dedicated date partition column derived from created_at, with a stable type suitable for Hive-style partition pruning.
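A minimal sketch of deriving the partition value, in Python for illustration. The column name `created_date`, the choice to normalize to UTC before truncating, and the ISO `YYYY-MM-DD` string form are all assumptions; the issue only states that the partition is a date derived from created_at.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical partition column name; the issue does not name it.
PARTITION_COLUMN = "created_date"


def created_date_partition(created_at: datetime) -> str:
    """Return the UTC calendar date of created_at as YYYY-MM-DD."""
    if created_at.tzinfo is None:
        # Naive timestamps would make the partition depend on the host zone.
        raise ValueError("created_at must be timezone-aware")
    return created_at.astimezone(timezone.utc).date().isoformat()


def partition_path(created_at: datetime) -> str:
    """Hive-style directory segment for the row's partition."""
    return f"{PARTITION_COLUMN}={created_date_partition(created_at)}"


# A row created late in the evening at UTC-5 lands in the next UTC day.
example = datetime(2024, 5, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(partition_path(example))
```

Normalizing to a single zone before truncating keeps the partition value stable across pods running in different time zones, which is what makes a date filter prune the same directories for every reader.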

Acceptance criteria

  • Writer properties are built through one shared path or explicitly documented as intentionally different.
  • Trace summaries, trace spans, GenAI trace tables, Bifrost tables, dispatch records, and eval scenarios have an explicit maintenance/layout decision.
  • eval_scenarios writes include a created-date partition derived from created_at.
  • Result-cache keys include the full query shape required for correctness.
  • Vacuum is safe under multi-pod readers: no pod removes files that another pod's active snapshot still needs.
  • Tests cover writer-property construction and the eval_scenarios partition column.
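For the vacuum criterion, one possible approach (an assumption, not something this issue prescribes) is to clamp the vacuum cutoff to the oldest snapshot any pod still holds. The Python sketch below shows only that clamping arithmetic. `safe_vacuum_cutoff` is a hypothetical name, and how pods would publish the commit timestamps of their active snapshots is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def safe_vacuum_cutoff(
    now: datetime,
    retention: timedelta,
    active_snapshot_times: Iterable[datetime],
) -> datetime:
    """Return the latest cutoff that keeps every active snapshot readable.

    Files removed from the table before the cutoff may be deleted. Files
    removed at or after the commit time of an active snapshot may still be
    referenced by that snapshot, so the cutoff never moves past the oldest one.
    """
    retention_cutoff = now - retention
    oldest_active = min(active_snapshot_times, default=None)
    if oldest_active is None:
        # No reader has reported a snapshot: plain retention applies.
        return retention_cutoff
    return min(retention_cutoff, oldest_active)
```

If a pod is pinned to a snapshot older than the retention window, the cutoff moves back to that snapshot's commit time; otherwise the ordinary retention cutoff is used unchanged.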
