
perf: stream notes to fast-import during commit loop#856

Draft
svarlamov wants to merge 16 commits into main from perf/streaming-fast-import

Conversation

@svarlamov
Member

Summary

Stacked on #854. Experimental optimization that streams note blob entries to fast-import incrementally during the commit processing loop, rather than batching at the end.

  • Realistic benchmark: 798ms → 702ms overhead (12% reduction)
  • Heavy benchmark: 1462ms → 1337ms overhead (8.5% reduction)

Adds ~200 lines for the StreamingNotesWriter struct. Tradeoff: moderate complexity for a modest gain.

🤖 Generated with Claude Code

svarlamov and others added 16 commits March 29, 2026 02:07
Replace per-file content diffing (imara_diff Myers) with hunk-based
attribution transfer using git diff-tree -p -U0 hunk headers. For
commits after first appearance, adjusts attribution line numbers via
O(attrs+hunks) arithmetic instead of O(file_length) Myers diff.

Key changes:
- Combined diff-tree call extracts both metadata and hunks in one subprocess
- Only read blob contents for first-seen files (99% reduction in blob I/O)
- Hunk-based transfer for all subsequent commits (no diff algorithm needed)

Benchmark (100 commits x 30 files x 200 lines):
- Authorship overhead: 1450ms -> 413ms (3.5x faster)
- Commit processing loop: 1247ms -> 28ms (44x faster)
- Blob reads: 3000 -> 30 (99% reduction)
- 3056 integration tests pass, 0 regressions
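
The O(attrs+hunks) arithmetic described above can be sketched as a single merge-style pass over sorted attributions and hunk headers. This is a minimal illustration, not the PR's actual code; the `Hunk`/`transfer_attributions` names and the exact `-U0` edge-case handling (pure insertions report `old_len = 0`) are assumptions:

```rust
/// One `-U0` hunk header `@@ -old_start,old_len +new_start,new_len @@`:
/// old lines [old_start, old_start+old_len) were replaced by
/// new lines [new_start, new_start+new_len).
struct Hunk { old_start: u32, old_len: u32, new_start: u32, new_len: u32 }

/// Shift attribution line numbers from old-file to new-file coordinates
/// in O(attrs + hunks). Both slices must be sorted by old line number.
/// Lines inside a replaced range are dropped; lines past a hunk shift by
/// the cumulative (new_len - old_len) delta.
fn transfer_attributions(attrs: &[u32], hunks: &[Hunk]) -> Vec<u32> {
    let mut out = Vec::with_capacity(attrs.len());
    let (mut h, mut delta) = (0usize, 0i64);
    for &line in attrs {
        // Fold in every hunk that ends strictly before this line.
        while h < hunks.len() {
            let hk = &hunks[h];
            let passed = if hk.old_len == 0 {
                line > hk.old_start // a pure insertion sits just after old_start
            } else {
                line >= hk.old_start + hk.old_len
            };
            if !passed { break; }
            delta += hk.new_len as i64 - hk.old_len as i64;
            h += 1;
        }
        // Drop attributions whose lines were removed/replaced by a hunk.
        if let Some(hk) = hunks.get(h) {
            if hk.old_len > 0 && line >= hk.old_start { continue; }
        }
        out.push((line as i64 + delta) as u32);
    }
    out
}
```

Because both inputs are already sorted, each attribution and each hunk is visited once, which is what makes the pass O(attrs+hunks) rather than O(file_length).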

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ed var)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ching

The attribution_reconstruction phase was the dominant bottleneck (51% of
total authorship overhead) in realistic rebases. Root cause: reading
file contents for ALL pathspecs × ALL commits via cat-file, even when
most commits' notes only reference a few files.

Key changes:
- Selective cat-file batch: only read files referenced in each commit's
  note attestations, not all pathspecs for every commit (~12x fewer
  reads in the realistic benchmark)
- HEAD note uses already-loaded HEAD file contents (no extra git call)
- Always perform full content-based scan across ALL commits' notes
  (no HEAD-only shortcut — each note only covers its commit's changes)
- Avoid unnecessary subtract+add metrics cycle for deleted/skipped files
- Use literal "human" comparison instead of allocating via to_str()

Realistic benchmark (150 commits × 50 files, 10-2000+ lines):
- attribution_reconstruction: 821ms → 192ms (4.3x faster)
- Total overhead: 1617ms → 1008ms (1.6x faster)

Adds 3 regression tests proving non-HEAD attribution survives rebase:
- test_rebase_preserves_attribution_from_non_head_commits
- test_rebase_preserves_multi_commit_attribution_same_file
- test_rebase_non_head_attribution_survives_slow_path

All 79 rebase + 21 cherry-pick tests pass with zero regressions.
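
The selective cat-file batch can be sketched as follows: build the `git cat-file --batch` request list from only the paths each commit's note actually attests, rather than the full pathspec set for every commit. A hypothetical sketch; the function name and input shape are assumptions:

```rust
use std::collections::BTreeSet;

/// Build a `git cat-file --batch` request list that reads only the files
/// referenced by each commit's note attestations, instead of every
/// pathspec for every commit. Returns one `<commit>:<path>` request per
/// unique read, deduplicated and in deterministic order.
fn selective_batch_requests(
    note_attestations: &[(&str, Vec<&str>)], // (commit sha, paths its note references)
) -> Vec<String> {
    let mut requests = BTreeSet::new();
    for (commit, paths) in note_attestations {
        for path in paths {
            requests.insert(format!("{commit}:{path}"));
        }
    }
    requests.into_iter().collect()
}
```

Feeding these lines to a single long-running `cat-file --batch` process keeps the read count proportional to what the notes reference, matching the ~12x reduction described above.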

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arser

Addresses Devin review findings:

1. Rename/copy entries (R/C status) in --raw format without -z produce
   "old_path\tnew_path" — use the destination path for attribution tracking.

2. First-appearance blob optimization now only marks files as "seen" when
   they have a non-None blob OID. Deletions no longer prevent later
   re-creations from having their blob content read.
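
The rename/copy fix in point 1 can be sketched as a small parser over one `--raw` entry. The format without `-z` is `:mode mode old-oid new-oid status\tpath[\tnew-path]`, where R/C entries carry two tab-separated paths. The function name is hypothetical:

```rust
/// Extract the path to attribute from one `git diff-tree --raw` line
/// (without -z). Rename/copy entries (status R/C, e.g. "R100") carry
/// "old_path\tnew_path"; attribution should follow the destination path.
fn attribution_path(raw_line: &str) -> Option<&str> {
    // ":100644 100644 <old-oid> <new-oid> <status>\t<path>[\t<new-path>]"
    let (meta, paths) = raw_line.split_once('\t')?;
    let status = meta.rsplit(' ').next()?;
    match status.chars().next()? {
        // Rename/copy: the second path is the destination.
        'R' | 'C' => paths.split('\t').nth(1),
        _ => Some(paths),
    }
}
```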

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The metadata template is frozen before the commit loop, so the
per-commit subtract→add→apply_to_prompts cycle was computing
metrics that never appeared in the output. Remove the dead
computation to save ~120ms in the commit processing loop.

Also fix timing instrumentation: loop:serialize and loop:metrics
were accumulating as_millis() per iteration, which truncates every
sub-millisecond sample to 0; they now accumulate as_micros() for
accurate sub-millisecond totals.
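
The truncation bug above is easy to reproduce in isolation (a self-contained sketch, not the PR's instrumentation code):

```rust
use std::time::Duration;

fn main() {
    // Per-iteration work in the commit loop is sub-millisecond, so calling
    // as_millis() on each sample truncates every reading to 0.
    let per_iter = Duration::from_micros(800);
    assert_eq!(per_iter.as_millis(), 0); // each sample rounds down to 0 ms

    // Accumulate microseconds instead and convert once when reporting.
    let total_us: u128 = (0..100).map(|_| per_iter.as_micros()).sum();
    assert_eq!(total_us / 1000, 80); // 100 iterations × 800 µs = 80 ms
}
```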

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The shutil.rmtree call on the benchmark repo directory intermittently
fails with "Directory not empty: 'objects'" on CI runners. This is a
cleanup step after all benchmarks have already completed successfully,
so using ignore_errors=True prevents spurious CI failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous run had Windows jobs stuck in pending for hours due to
GitHub Actions runner availability issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two bugs found by Devin review:

1. first_appearance_blobs: seen_files was not cleared on file deletion,
   so when a file was deleted and recreated with different content, the
   new blob OID was never read (seen_files.insert returned false).

2. files_with_synced_state: was not cleared on file deletion, so
   recreation incorrectly used the hunk-based fast path with stale
   pre-deletion attributions instead of content-diff.

Adds regression test for delete-then-recreate with different content.
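
The deletion-clearing behavior for bug 1 can be sketched with a small wrapper around a HashSet; the `SeenFiles` name and method shape are assumptions, not the PR's actual types:

```rust
use std::collections::HashSet;

/// Tracks which paths have already had their blob content read, so the
/// first-appearance optimization can skip re-reads. Deletions must clear
/// the "seen" state: a later re-creation carries new content whose blob
/// has to be read again rather than served from stale state.
#[derive(Default)]
struct SeenFiles(HashSet<String>);

impl SeenFiles {
    /// Returns true if this is the path's first appearance since its last
    /// deletion, i.e. the blob content should be read.
    fn should_read_blob(&mut self, path: &str, blob_oid: Option<&str>) -> bool {
        match blob_oid {
            // Deletion: forget the path so a re-creation reads fresh content.
            None => {
                self.0.remove(path);
                false
            }
            // insert() returns false for already-seen paths, skipping the read.
            Some(_) => self.0.insert(path.to_string()),
        }
    }
}
```

The same clear-on-deletion rule would apply to files_with_synced_state, forcing a recreated file back onto the content-diff path instead of the hunk-based fast path.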

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of spawning separate git diff-tree processes for new commits and
original commits, concatenate both commit lists into a single --stdin
call and partition the results afterward. This eliminates ~500ms of
subprocess overhead (git process startup + repo loading) on large rebases.
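
The partition step can be sketched as a single pass over the combined output, assuming (as a simplification) that each commit's block begins with a line containing exactly its 40-hex id, as `git diff-tree --stdin` echoes input commits before their diffs:

```rust
use std::collections::HashSet;

/// Split the combined output of a single `git diff-tree --stdin` call
/// (fed both commit lists) back into per-list results. Lines following a
/// commit-id line belong to that commit until the next id line.
fn partition_diff_tree<'a>(
    output: &'a str,
    new_commits: &HashSet<&str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let (mut new_lines, mut orig_lines) = (Vec::new(), Vec::new());
    let mut current_is_new = false;
    for line in output.lines() {
        // A 40-hex line switches which bucket subsequent lines go to.
        if line.len() == 40 && line.bytes().all(|b| b.is_ascii_hexdigit()) {
            current_is_new = new_commits.contains(line);
        }
        if current_is_new {
            new_lines.push(line);
        } else {
            orig_lines.push(line);
        }
    }
    (new_lines, orig_lines)
}
```

One subprocess spawn plus one repo load amortized over both lists is where the ~500ms saving comes from; the partition itself is linear in the output size.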

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Start the fast-import subprocess early (overlapping process startup with
diff-tree) and write blob entries incrementally as commits are processed,
rather than batching all notes at the end. This overlaps fast-import's
blob ingestion with our commit processing computation.

Realistic benchmark: 798ms → 702ms overhead (12% reduction)
Heavy benchmark: 1462ms → 1337ms overhead (8.5% reduction)
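
A minimal sketch of the streaming writer, assuming the standard fast-import stream format (`blob` / `mark :<n>` / `data <len>`); the struct shape is an illustration, not the PR's ~200-line implementation. In production `W` would be the early-spawned fast-import child's stdin; a `Vec<u8>` works for testing:

```rust
use std::io::{self, Write};

/// Streams note blobs to `git fast-import` while the commit loop runs,
/// instead of batching all notes at the end, so fast-import's blob
/// ingestion overlaps with commit-processing computation.
struct StreamingNotesWriter<W: Write> {
    sink: W,
    next_mark: u64,
}

impl<W: Write> StreamingNotesWriter<W> {
    fn new(sink: W) -> Self {
        Self { sink, next_mark: 1 }
    }

    /// Emit one blob in fast-import's stream format and return its mark,
    /// for later reference from the notes-tree commit.
    fn write_note_blob(&mut self, content: &[u8]) -> io::Result<u64> {
        let mark = self.next_mark;
        self.next_mark += 1;
        write!(self.sink, "blob\nmark :{mark}\ndata {}\n", content.len())?;
        self.sink.write_all(content)?;
        self.sink.write_all(b"\n")?; // optional trailing LF after data
        Ok(mark)
    }
}
```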

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@svarlamov svarlamov force-pushed the perf/rebase-reconstruction-opt branch from 0f72020 to fb6091f on March 29, 2026 22:45
Base automatically changed from perf/rebase-reconstruction-opt to main March 30, 2026 03:08
