Persistent dedupe sidecar: per-segment .idx for fast recovery (closes last Phase A item) #21
Merged
Closes the last Phase A item from the external review's correctness
contract: "persistent dedupe index" so recovery doesn't have to scan
every event in the TTL window.
Mechanism. Each raw segment now writes a sidecar `.idx` file holding
`Vec<(event_id_hash, payload_hash, ingested_at_ms)>` in segment row
order. Recovery prefers the sidecar; without it, it falls back to
the existing segment-scan path. So the change is non-breaking —
older segments without an .idx still recover correctly, just slower.
The sidecar avoids the bincode + zstd decode of the event column on
every segment in the dedupe TTL window. For a 1M-event segment with
realistic event payloads, that's the difference between hundreds of
milliseconds and tens.
File format (`raw_<uuid>.idx`):
    magic     b"UDBIDX01" (8 bytes)
    count     u32 LE
    entries   bincode Vec<(u128, u128, i64)>
    checksum  u64 LE (blake3 over everything above)
Corruption → caller falls back to the segment scan; the sidecar is
an optimization, not a source of truth. The actual event_id /
payload_hash / ingested_at_ms can always be recomputed from the
segment.
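The layout and the "any failure means fall back" rule can be illustrated with a small, dependency-free sketch. It is not the code from `src/storage/dedupe_index.rs`: the names `encode_index` and `decode_index` are hypothetical, entries are written as plain fixed-width little-endian fields instead of going through bincode, and a 64-bit FNV-1a hash stands in for the blake3-derived checksum.

```rust
use std::convert::TryInto;

/// One sidecar row: (event_id_hash, payload_hash, ingested_at_ms).
pub type DedupeEntry = (u128, u128, i64);

const MAGIC: &[u8; 8] = b"UDBIDX01";
/// 16 + 16 + 8 bytes per row in this sketch's fixed-width encoding.
const ENTRY_LEN: usize = 40;

/// Stand-in checksum (64-bit FNV-1a). The PR uses blake3; this is only
/// here to keep the sketch free of external crates.
fn checksum64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Serialize as: magic | count u32 LE | rows | checksum u64 LE.
/// Assumes fewer than 2^32 entries per segment.
pub fn encode_index(entries: &[DedupeEntry]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + 4 + entries.len() * ENTRY_LEN + 8);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for &(id, payload, ts) in entries {
        out.extend_from_slice(&id.to_le_bytes());
        out.extend_from_slice(&payload.to_le_bytes());
        out.extend_from_slice(&ts.to_le_bytes());
    }
    // The checksum covers everything written so far.
    let sum = checksum64(&out);
    out.extend_from_slice(&sum.to_le_bytes());
    out
}

/// Validate magic, checksum, and count consistency. Every failure is an
/// `Err`, which the caller treats as "fall back to the segment scan".
pub fn decode_index(bytes: &[u8]) -> Result<Vec<DedupeEntry>, String> {
    if bytes.len() < 8 + 4 + 8 {
        return Err("truncated".into());
    }
    if !bytes.starts_with(MAGIC) {
        return Err("bad magic".into());
    }
    let (body, tail) = bytes.split_at(bytes.len() - 8);
    let stored = u64::from_le_bytes(tail.try_into().unwrap());
    if checksum64(body) != stored {
        return Err("checksum mismatch".into());
    }
    let count = u32::from_le_bytes(body[8..12].try_into().unwrap()) as usize;
    let rows = &body[12..];
    if count.checked_mul(ENTRY_LEN) != Some(rows.len()) {
        return Err("count mismatch".into());
    }
    Ok(rows
        .chunks_exact(ENTRY_LEN)
        .map(|c| {
            (
                u128::from_le_bytes(c[0..16].try_into().unwrap()),
                u128::from_le_bytes(c[16..32].try_into().unwrap()),
                i64::from_le_bytes(c[32..40].try_into().unwrap()),
            )
        })
        .collect())
}
```

Because the checksum is verified before the count is trusted, a flipped byte, a truncated file, and a wrong magic all surface as the same kind of error, which matches the single fallback path described above.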
Plumbing:
- New src/storage/dedupe_index.rs with write/read functions
- flusher.write_bucket writes the sidecar after the segment + before
the manifest commit. Failure is non-fatal — log and remove the
partial sidecar; segment is fine.
- compact/worker.run_one_plan does the same for compacted outputs.
- runtime/recovery tries `read_dedupe_index` first; falls back to
`RawSegmentReader::new` + per-event hashing on Ok(None) or Err.
Logs sidecar hit/miss counts so operators can spot incomplete
sidecar coverage.
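The write ordering in the flusher and compactor bullets (segment first, best-effort sidecar second, manifest commit last) might look roughly like the sketch below. `flush_with_sidecar` and its closure parameters are hypothetical stand-ins for `flusher.write_bucket` internals, and the `index_path` shown here only assumes the sidecar shares the segment's file stem.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Sidecar path for a segment: same stem, `.idx` extension (assumed convention).
pub fn index_path(segment: &Path) -> PathBuf {
    segment.with_extension("idx")
}

/// Write the segment, then a best-effort sidecar, then commit the manifest.
/// Returns whether the sidecar was written; a sidecar failure never fails
/// the flush itself.
pub fn flush_with_sidecar<W, C>(
    segment: &Path,
    segment_bytes: &[u8],
    write_sidecar: W,
    commit_manifest: C,
) -> io::Result<bool>
where
    W: FnOnce(&Path) -> io::Result<()>,
    C: FnOnce(&Path) -> io::Result<()>,
{
    // 1. The segment is the source of truth, so its failure is fatal.
    fs::write(segment, segment_bytes)?;

    // 2. The sidecar is optional: on error, log and remove any partial file.
    let idx = index_path(segment);
    let sidecar_ok = match write_sidecar(&idx) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("sidecar write failed for {}: {}", idx.display(), e);
            let _ = fs::remove_file(&idx);
            false
        }
    };

    // 3. Only after both steps does the segment become visible.
    commit_manifest(segment)?;
    Ok(sidecar_ok)
}
```

Removing the partial file matters because recovery treats a missing sidecar as a clean miss, whereas a half-written one would have to be rejected by the checksum on every restart.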
Tests (tests/dedupe_index.rs, 8 tests):
- write_then_read_round_trips_entries (incl. u128::MAX, i64::MAX)
- missing_sidecar_returns_ok_none
- corrupt_checksum_is_rejected (byte-flip mid-payload)
- truncated_sidecar_is_rejected
- missing_magic_is_rejected
- recovery_uses_sidecar_to_rebuild_dedupe (fast path)
- recovery_falls_back_to_segment_scan_without_sidecar
- dedupe_entry_uses_event_hash_type (compile-time sanity)
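As an illustration of what the compile-time sanity test guards, here is a minimal sketch. It assumes `EventHash` is an alias for `u128` (inferred from the `Vec<(u128, u128, i64)>` format line), and `entry_parts` is a hypothetical helper, not a function from the PR.

```rust
/// Assumed alias: hashes are 128-bit, per the sidecar format.
pub type EventHash = u128;

/// One sidecar row: (event_id_hash, payload_hash, ingested_at_ms).
pub type DedupeEntry = (EventHash, EventHash, i64);

/// Type-checks only while `DedupeEntry` is exactly
/// `(EventHash, EventHash, i64)`; changing either alias breaks the build.
pub fn entry_parts(e: DedupeEntry) -> (EventHash, EventHash, i64) {
    e
}
```

A check like this costs nothing at runtime and fails the build, not recovery, if the hash width ever drifts from the on-disk format.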
Total tests: 122 (was 114; +8). Clean under RUSTFLAGS=-D warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes the last open item from Phase A (the external reviewer's "correctness contract"): persistent dedupe index. Each raw / compacted segment now writes a sidecar `.idx` file with `(event_id_hash, payload_hash, ingested_at_ms)` triples in segment row order, and recovery uses it to rebuild the hot dedupe cache without re-decoding events.
Non-breaking
The sidecar is an optimization, not a source of truth. Recovery:
- Tries `raw_<uuid>.idx` first → loads triples directly into dedupe.
- Otherwise falls back to `RawSegmentReader::new` + per-event hashing (the existing path).
Segments from before this PR have no sidecar; they continue to recover via the fallback. No format version bump.
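The sidecar-first recovery loop, including the hit / miss counters that recovery logs, can be sketched as follows. This is a simplified model, not the code in `runtime/recovery`: `rebuild_dedupe`, `RecoveryStats`, and both closure parameters are hypothetical, and the slow path is assumed infallible to keep the example short.

```rust
/// One sidecar row: (event_id_hash, payload_hash, ingested_at_ms).
pub type DedupeEntry = (u128, u128, i64);

#[derive(Debug, Default, PartialEq)]
pub struct RecoveryStats {
    pub sidecar_hits: usize,
    pub sidecar_misses: usize,
}

/// Rebuild dedupe entries for a set of segments.
///
/// `read_index` follows the contract described above: `Ok(Some(rows))` on a
/// valid sidecar, `Ok(None)` when the file is missing, `Err` when it is
/// corrupt. `scan_segment` stands in for the slow segment-scan path.
pub fn rebuild_dedupe<S, R, F>(
    segments: &[S],
    read_index: R,
    scan_segment: F,
) -> (Vec<DedupeEntry>, RecoveryStats)
where
    R: Fn(&S) -> Result<Option<Vec<DedupeEntry>>, String>,
    F: Fn(&S) -> Vec<DedupeEntry>,
{
    let mut entries = Vec::new();
    let mut stats = RecoveryStats::default();
    for seg in segments {
        match read_index(seg) {
            // Fast path: use the triples as stored.
            Ok(Some(rows)) => {
                stats.sidecar_hits += 1;
                entries.extend(rows);
            }
            // Missing or corrupt sidecar: recompute from the segment.
            Ok(None) | Err(_) => {
                stats.sidecar_misses += 1;
                entries.extend(scan_segment(seg));
            }
        }
    }
    (entries, stats)
}
```

Treating `Ok(None)` and `Err` identically is what keeps the change non-breaking: a pre-sidecar segment and a damaged sidecar both end up on the existing scan path.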
File format
Validated on read: magic + checksum + count consistency. Any failure causes the caller to fall back to the segment scan.
Win
Currently recovery scans every event in every segment whose `max_timestamp_ms` is within the dedupe TTL window (7 days). For a 1M-event segment with realistic payloads that's a bincode + zstd decode of every event. The sidecar avoids that: it's a single `read_to_end` + bincode deserialize of about 40 bytes/event under fixed-width encoding (two 16-byte hashes plus an 8-byte timestamp).
Recovery logs sidecar hit / miss counts so operators can spot incomplete coverage (e.g., after rolling back to a build that didn't write sidecars).
Plumbing
- New `src/storage/dedupe_index.rs` with `write_dedupe_index` + `read_dedupe_index` + `index_path` helpers.
- `flusher.write_bucket` writes the sidecar after the segment, before the manifest commit. Failure is non-fatal: log + remove partial.
- `compact/worker.run_one_plan` does the same for compacted outputs.
- `runtime/recovery` tries the sidecar first; falls back on `Ok(None)` or `Err`.
Tests
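8 new tests in `tests/dedupe_index.rs`:
- Write-then-read round trip, including `u128::MAX` and `i64::MAX` boundary values
- `Ok(None)` on missing file (fall-back signal)
- Corrupt checksum is rejected (byte flip mid-payload)
- Truncated sidecar is rejected
- Missing magic is rejected
- Recovery uses the sidecar to rebuild dedupe (fast path)
- Recovery falls back to the segment scan without a sidecar
- `DedupeEntry` type alias compiles to `(EventHash, EventHash, i64)`
Test plan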
- `cargo build --all-targets` clean with `-D warnings`
- `cargo test --all-targets`: 122 tests pass (was 114; +8)
🤖 Generated with Claude Code