Rollup → raw segment provenance (closes spec §19.10)#15

Merged
pbudzik merged 1 commit into main from feat/rollup-input-provenance on May 16, 2026
@pbudzik pbudzik commented May 16, 2026

Owner

Summary

Each rollup segment now records the list of raw segment IDs whose events contributed to it. The explain endpoint surfaces this so operators can drill from an invoice line back to the source raw segments. Closes the deferred half of spec §19.10.

What changed

  • SegmentMeta gains input_segment_ids: Vec<String> with #[serde(default)] so existing manifests deserialize unchanged. Empty for raw segments (ground truth) and compacted segments (their lineage already lives in Manifest.compacted_replacements). Populated for rollup segments.
  • RollupWorker.tick now tracks per-(hour, bucket) inputs in a BTreeSet<String> during the scan and stamps the sorted list on the output rollup segment's meta. Sorting via BTreeSet keeps the encoding deterministic.
  • handle_explain adds a rollup_inputs: { rollup_id → [raw_id, ...] } map to the response. The existing flat rollup_segments list of IDs is kept for callers that don't need provenance.
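The per-(hour, bucket) tracking described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Event`, `RawSegment`, `collect_inputs`, the `u32` bucket type, and the hour-alignment arithmetic are all hypothetical stand-ins, since the real `RollupWorker.tick` internals are not shown here. It only demonstrates why a `BTreeSet<String>` gives a deduplicated, sorted, and therefore deterministic input list.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Assumed hour width in milliseconds (hourly rollups).
pub const HOUR_MS: u64 = 3_600_000;

/// Hypothetical, simplified event; the real event type is not shown in this PR.
pub struct Event {
    pub ts_ms: u64,
    pub bucket: u32,
}

/// Hypothetical raw segment: an ID plus the events it holds.
pub struct RawSegment {
    pub id: String,
    pub events: Vec<Event>,
}

/// For each (hour_start_ms, bucket) key, collect the IDs of the raw segments
/// that contributed at least one event to that key.
pub fn collect_inputs(segments: &[RawSegment]) -> BTreeMap<(u64, u32), Vec<String>> {
    let mut inputs: BTreeMap<(u64, u32), BTreeSet<String>> = BTreeMap::new();
    for seg in segments {
        for ev in &seg.events {
            // Align the event timestamp down to the start of its hour.
            let hour_start = ev.ts_ms - ev.ts_ms % HOUR_MS;
            // The set deduplicates: a segment with many events in the same
            // (hour, bucket) is recorded once.
            inputs
                .entry((hour_start, ev.bucket))
                .or_default()
                .insert(seg.id.clone());
        }
    }
    // BTreeSet iterates in ascending order, so each Vec comes out sorted
    // regardless of the order in which segments were scanned.
    inputs
        .into_iter()
        .map(|(key, ids)| (key, ids.into_iter().collect()))
        .collect()
}
```

In this sketch a segment appears under a key only if one of its events actually falls in that hour and bucket, which mirrors the "only segments that actually contributed" behavior the tests below check.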

Example explain response

{
  "account_id": "acc_x",
  "watermark_ms": 1701900000000,
  "lines": [...],
  "rollup_segments": ["rollup_abc..."],
  "rollup_inputs": {
    "rollup_abc...": ["raw_def...", "raw_ghi..."]
  },
  "raw_segments": [...],
  "corrections": [...]
}
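As a rough sketch of how the `rollup_inputs` map in this response could be assembled and then used for drill-down, assuming a trimmed-down metadata type: `RollupMeta`, `rollup_inputs`, and `all_raw_inputs` below are hypothetical names for illustration, not the actual `handle_explain` code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the rollup-relevant fields of SegmentMeta.
pub struct RollupMeta {
    pub id: String,
    pub input_segment_ids: Vec<String>,
}

/// Build the `rollup_id -> [raw_id, ...]` map from the rollup segments
/// that overlap the explained range.
pub fn rollup_inputs(rollups: &[RollupMeta]) -> BTreeMap<String, Vec<String>> {
    rollups
        .iter()
        .map(|m| (m.id.clone(), m.input_segment_ids.clone()))
        .collect()
}

/// Drill-down helper: the sorted, deduplicated set of raw segment IDs
/// behind every rollup in the map.
pub fn all_raw_inputs(map: &BTreeMap<String, Vec<String>>) -> Vec<String> {
    let unique: BTreeSet<&String> = map.values().flatten().collect();
    unique.into_iter().cloned().collect()
}
```

A caller that only needs the rollup IDs can keep reading the flat `rollup_segments` list; the map is additive.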

Spec coverage

| Invariant | Before | After |
| --- | --- | --- |
| §19.10: invoice snapshots reference watermark + source segment set | watermark only | watermark + per-rollup raw input set |

Tests

tests/rollup_provenance.rs — 3 new tests:

  • rollup_segment_records_contributing_raw_segments — two raw segments → rollup lists both
  • rollup_input_only_lists_segments_that_actually_contributed — out-of-hour segment is not listed
  • explain_endpoint_surfaces_rollup_inputs_map — full end-to-end through the HTTP layer

Test plan

  • cargo build --all-targets is clean with -D warnings
  • cargo test --all-targets — 91 tests pass (was 88; +3)
  • CI green

🤖 Generated with Claude Code

Each rollup segment now records the list of raw segment IDs whose
events contributed to it. The explain endpoint surfaces this as a
`rollup_inputs: { rollup_id → [raw_id, ...] }` map so an operator
can drill from an invoice line back to the source raw segments.

This closes the deferred half of spec §19.10 ("invoice snapshots must
reference both a watermark and a source segment set"). The watermark
half was already covered by `Manifest.watermarks.hourly_rollup_ms`;
this is the segment-set half.

Changes:
  - SegmentMeta gains `input_segment_ids: Vec<String>` with
    #[serde(default)] so existing manifests deserialize unchanged.
    Empty for raw segments (they're ground truth) and compacted
    segments (their provenance lives in `compacted_replacements`).
  - RollupWorker.tick tracks per-(hour, bucket) inputs in a
    BTreeSet<String> during the scan, then stamps the sorted list on
    the rollup segment meta. Sorted via BTreeSet so the on-disk
    encoding is deterministic.
  - write_rollup_segment / build_rollup_segment_meta thread the
    Vec<String> through.
  - handle_explain reads each overlapping rollup segment's
    input_segment_ids and adds them to the response as a
    `rollup_inputs` object. Existing `rollup_segments` list of IDs
    is kept for callers that don't need provenance.

Tests (tests/rollup_provenance.rs, 3 tests):
  - rollup_segment_records_contributing_raw_segments — two raw
    segments → rollup lists both as inputs
  - rollup_input_only_lists_segments_that_actually_contributed —
    out-of-hour segment is NOT listed
  - explain_endpoint_surfaces_rollup_inputs_map — full end-to-end
    through the HTTP layer

Total tests: 91 (was 88; +3). Clean under RUSTFLAGS=-D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pbudzik pbudzik merged commit ee75625 into main May 16, 2026
1 check passed