Persistent dedupe sidecar: per-segment .idx for fast recovery (closes last Phase A item) #21

Merged
pbudzik merged 1 commit into main from feat/persistent-dedupe-index
May 16, 2026

@pbudzik pbudzik commented May 16, 2026

Summary

Closes the last open item from Phase A (the external reviewer's "correctness contract"): persistent dedupe index. Each raw / compacted segment now writes a sidecar .idx file with (event_id_hash, payload_hash, ingested_at_ms) triples in segment row order, and recovery uses it to rebuild the hot dedupe cache without re-decoding events.

Non-breaking

The sidecar is an optimization, not a source of truth. Recovery:

  1. Tries raw_<uuid>.idx first → loads triples directly into dedupe.
  2. On missing or corrupt sidecar → falls back to RawSegmentReader::new + per-event hashing (the existing path).

Segments from before this PR have no sidecar; they continue to recover via the fallback. No format version bump.
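
The two recovery steps above can be sketched as follows. This is a minimal illustration, not the code in runtime/recovery: `recover_segment`, `Source`, and the closure parameters are hypothetical names, and the closures stand in for `read_dedupe_index` and the `RawSegmentReader` scan.

```rust
use std::io;

/// (event_id_hash, payload_hash, ingested_at_ms), as described in the summary.
type DedupeEntry = (u128, u128, i64);

/// Which path produced the dedupe entries for one segment (hypothetical type).
#[derive(Debug, PartialEq)]
enum Source {
    Sidecar,
    SegmentScan,
}

/// Hypothetical recovery step for one segment: prefer the sidecar and fall
/// back to the segment scan on `Ok(None)` (missing) or `Err` (corrupt).
fn recover_segment<R, S>(
    read_sidecar: R,
    scan_segment: S,
) -> io::Result<(Source, Vec<DedupeEntry>)>
where
    R: FnOnce() -> io::Result<Option<Vec<DedupeEntry>>>,
    S: FnOnce() -> io::Result<Vec<DedupeEntry>>,
{
    match read_sidecar() {
        Ok(Some(entries)) => Ok((Source::Sidecar, entries)),
        // The segment is the source of truth, so a sidecar problem is never
        // fatal. Only a failing scan propagates an error.
        Ok(None) | Err(_) => Ok((Source::SegmentScan, scan_segment()?)),
    }
}
```

Because the scan closure only runs on the fallback arm, a segment with a healthy sidecar never pays for event decoding in this sketch.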

File format

magic     b"UDBIDX01"  (8 bytes)
count     u32 LE
entries   bincode Vec<(u128, u128, i64)>
checksum  u64 LE       (blake3 of everything above)

Validated on read: magic + checksum + count consistency. Any failure causes the caller to fall back to the segment scan.
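
To make the layout and the three read-time checks concrete, here is a std-only sketch. It is not the code in src/storage/dedupe_index.rs and makes two loud substitutions: entries are hand-packed as fixed-width little-endian fields instead of going through bincode, and a 64-bit FNV-1a hash stands in for the blake3-derived checksum. The function names `encode` and `decode` are hypothetical.

```rust
use std::convert::TryInto;

type DedupeEntry = (u128, u128, i64);

const MAGIC: &[u8; 8] = b"UDBIDX01";
/// Two u128 hashes plus an i64 timestamp, fixed-width (assumption of this sketch).
const ENTRY_LEN: usize = 16 + 16 + 8;

/// Stand-in checksum: FNV-1a, 64-bit. The PR uses blake3; this is NOT blake3.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// magic | count (u32 LE) | entries | checksum (u64 LE) over everything before it.
fn encode(entries: &[DedupeEntry]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 + entries.len() * ENTRY_LEN + 8);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for &(event_id_hash, payload_hash, ingested_at_ms) in entries {
        out.extend_from_slice(&event_id_hash.to_le_bytes());
        out.extend_from_slice(&payload_hash.to_le_bytes());
        out.extend_from_slice(&ingested_at_ms.to_le_bytes());
    }
    let sum = checksum(&out);
    out.extend_from_slice(&sum.to_le_bytes());
    out
}

/// Returns `None` on any validation failure; the caller would then fall back
/// to the segment scan.
fn decode(bytes: &[u8]) -> Option<Vec<DedupeEntry>> {
    // Check 1: magic (and enough bytes for the fixed header and trailer).
    if bytes.len() < 8 + 4 + 8 || bytes[..8] != MAGIC[..] {
        return None;
    }
    // Check 2: checksum over everything except the trailing 8 bytes.
    let (body, tail) = bytes.split_at(bytes.len() - 8);
    if checksum(body) != u64::from_le_bytes(tail.try_into().ok()?) {
        return None;
    }
    // Check 3: declared count must match the number of entry bytes present.
    let count = u32::from_le_bytes(body[8..12].try_into().ok()?) as usize;
    let payload = &body[12..];
    if payload.len() != count.checked_mul(ENTRY_LEN)? {
        return None;
    }
    let entries = payload
        .chunks_exact(ENTRY_LEN)
        .map(|c| {
            (
                u128::from_le_bytes(c[..16].try_into().unwrap()),
                u128::from_le_bytes(c[16..32].try_into().unwrap()),
                i64::from_le_bytes(c[32..].try_into().unwrap()),
            )
        })
        .collect();
    Some(entries)
}
```

Keeping the checksum as the trailing field means a truncated file fails validation without any special casing: either the checksum no longer matches or the count check rejects the shortened payload.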

Win

Currently recovery scans every event in every segment whose max_timestamp_ms is within the dedupe TTL window (7 days). For a 1M-event segment with realistic payloads that's a bincode + zstd decode of every event. The sidecar avoids that: it's a single read_to_end + bincode deserialize of about 40 bytes/event (two u128 hashes plus an i64 timestamp, assuming fixed-width integer encoding), or roughly 40 MB for that segment.

Recovery logs sidecar hit / miss counts so operators can spot incomplete coverage (e.g., after rolling back to a build that didn't write sidecars).
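
One way the hit / miss accounting could look, as a sketch only: `SidecarStats`, its methods, and the wording of the summary line are all made up for illustration and are not the log format this PR emits.

```rust
/// Hypothetical per-recovery counters; the names are not from the PR.
#[derive(Debug, Default, PartialEq)]
struct SidecarStats {
    hits: u64,
    misses: u64,
}

impl SidecarStats {
    /// Record one segment: `true` if its sidecar loaded, `false` if recovery
    /// fell back to the segment scan.
    fn record(&mut self, sidecar_loaded: bool) {
        if sidecar_loaded {
            self.hits += 1;
        } else {
            self.misses += 1;
        }
    }

    /// One summary line for the end of recovery (wording invented here).
    fn summary(&self) -> String {
        format!(
            "dedupe recovery: {} sidecar hit(s), {} miss(es) of {} segment(s)",
            self.hits,
            self.misses,
            self.hits + self.misses
        )
    }
}
```

A miss count that stays non-zero across restarts would point at segments that never received a sidecar, which is the rollback scenario described above.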

Plumbing

  • New src/storage/dedupe_index.rs with write_dedupe_index + read_dedupe_index + index_path helpers.
  • flusher.write_bucket writes the sidecar after the segment, before the manifest commit. Failure is non-fatal — log + remove partial.
  • compact/worker.run_one_plan does the same for compacted outputs.
  • runtime/recovery tries the sidecar first; falls back on Ok(None) or Err.
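
The write ordering and the non-fatal failure handling can be sketched as below. This is an assumption-laden illustration rather than the flusher code: the `index_path` signature, `write_sidecar_best_effort`, and `flush` are hypothetical, the segment and sidecar contents are passed in as opaque bytes, and the manifest commit is only marked by a comment.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical signature: the sidecar sits next to the segment with an
/// `.idx` extension.
fn index_path(segment: &Path) -> PathBuf {
    segment.with_extension("idx")
}

/// Best-effort sidecar write. On failure: log, remove any partial file, and
/// report `false` so the caller can carry on.
fn write_sidecar_best_effort(segment: &Path, encoded: &[u8]) -> bool {
    let idx = index_path(segment);
    match fs::write(&idx, encoded) {
        Ok(()) => true,
        Err(err) => {
            eprintln!("dedupe sidecar write failed for {}: {}", idx.display(), err);
            // Ignore the result: the file may never have been created.
            let _ = fs::remove_file(&idx);
            false
        }
    }
}

/// Ordering sketch: segment first (fatal on error), then the sidecar
/// (non-fatal), then the manifest commit.
fn flush(segment: &Path, segment_bytes: &[u8], sidecar_bytes: &[u8]) -> io::Result<bool> {
    fs::write(segment, segment_bytes)?;
    let sidecar_written = write_sidecar_best_effort(segment, sidecar_bytes);
    // The manifest commit would happen here, whether or not the sidecar exists.
    Ok(sidecar_written)
}
```

Writing the sidecar before the manifest commit means a committed segment either has a complete sidecar or none at all in this sketch, and recovery's fallback covers the second case.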

Tests

8 new tests in tests/dedupe_index.rs:

  • Round-trip with u128::MAX and i64::MAX boundary values
  • Ok(None) on missing file (fall-back signal)
  • Corruption rejection: byte-flip checksum, truncation, missing magic
  • Recovery picks up dedupe via the fast path
  • Recovery falls back to segment scan without a sidecar
  • DedupeEntry type alias compiles to (EventHash, EventHash, i64)

Test plan

  • cargo build --all-targets clean with -D warnings
  • cargo test --all-targets — 122 tests pass (was 114; +8)
  • CI green
  • Bump PR #20 (Bump Cargo version to 0.4.0) merged first (Cargo version is 0.3.0 on this branch; will rebase or auto-conflict on the Cargo.toml version line)

🤖 Generated with Claude Code

Closes the last Phase A item from the external review's correctness
contract: "persistent dedupe index" so recovery doesn't have to scan
every event in the TTL window.

Mechanism. Each raw or compacted segment now writes a sidecar `.idx`
file holding `Vec<(event_id_hash, payload_hash, ingested_at_ms)>` in
segment row order. Recovery prefers the sidecar; without it, it falls
back to the existing segment-scan path. So the change is non-breaking:
older segments without an `.idx` still recover correctly, just slower.

The sidecar avoids the bincode + zstd decode of the event column on
every segment in the dedupe TTL window. For a 1M-event segment with
realistic event payloads, that's the difference between hundreds of
milliseconds and tens.

File format (`raw_<uuid>.idx`):
  magic     b"UDBIDX01"  (8 bytes)
  count     u32 LE
  entries   bincode Vec<(u128, u128, i64)>
  checksum  u64 LE  (blake3 over everything above)

Corruption → caller falls back to the segment scan; the sidecar is
an optimization, not a source of truth. The actual event_id /
payload_hash / ingested_at_ms can always be recomputed from the
segment.

Plumbing:
  - New src/storage/dedupe_index.rs with write/read functions
  - flusher.write_bucket writes the sidecar after the segment + before
    the manifest commit. Failure is non-fatal — log and remove the
    partial sidecar; segment is fine.
  - compact/worker.run_one_plan does the same for compacted outputs.
  - runtime/recovery tries `read_dedupe_index` first; falls back to
    `RawSegmentReader::new` + per-event hashing on Ok(None) or Err.
    Logs sidecar hit/miss counts so operators can spot incomplete
    sidecar coverage.

Tests (tests/dedupe_index.rs, 8 tests):
  - write_then_read_round_trips_entries (incl. u128::MAX, i64::MAX)
  - missing_sidecar_returns_ok_none
  - corrupt_checksum_is_rejected (byte-flip mid-payload)
  - truncated_sidecar_is_rejected
  - missing_magic_is_rejected
  - recovery_uses_sidecar_to_rebuild_dedupe (fast path)
  - recovery_falls_back_to_segment_scan_without_sidecar
  - dedupe_entry_uses_event_hash_type (compile-time sanity)

Total tests: 122 (was 114; +8). Clean under RUSTFLAGS=-D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pbudzik pbudzik merged commit bd8e1f0 into main May 16, 2026
1 check passed