perf(client/extract): cut hot-path copies + a per-file lock (R1/R2/R3)#65
Merged
Conversation
Behavior-preserving allocation/lock reductions on the download+extract hot path. The full test suite + clippy pass, but the *speedup* is unverified in this environment (the only perf tooling is benchmark/*.sh against a live clone target). Run `benchmark/baseline.sh` before/after to confirm before merging. - R3: the per-file "has any thread failed?" check took a Mutex once per file. It's now a cheap AtomicBool load; the Mutex is kept only to hold the actual error value, written once on the rare failure path. - R2: the per-frame fragment-pair list was `.cloned()` out of the shared map every frame; borrow it instead (the loop only reads it). - R1 (pack path): `fetch_artifact*` now return `bytes::Bytes` (a refcounted buffer) instead of `.to_vec()`-ing the response body, so a downloaded pack/idx flows to disk and the index reader with no second per-artifact copy. The idx-bundle slice is now a zero-copy `Bytes::slice`. The archive streaming channel still carries `Vec<u8>` (one copy at the producer, same as before — no regression); carrying `Bytes` through that pipeline is the separate R4 change and is deferred (it entangles with the streaming decompress channels). Refs: ADVERSARIAL_REVIEW_2026-06-25.md R1/R2/R3 (R4 deferred) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Behavior-preserving allocation/lock reductions on the download + extract hot path, from the adversarial review (
ADVERSARIAL_REVIEW_2026-06-25.md).R3 — per-file
Mutex→AtomicBoolThe "has any thread failed?" check in the parallel pack extractor took a
Mutexlock once per file just to poll. Now a cheapAtomicBoolload; theMutexis kept only to carry the actual error value, written once on the rare failure path.R2 — borrow the fragment-pair map
The per-frame fragment-pair list was
.cloned()out of the shared map on every frame; the loop only reads it, so borrow it instead.R1 —
fetch_artifact*returnbytes::Bytes(pack path)The fetch boundary
.to_vec()-ed every response body. Now it returns refcountedBytes, so a downloaded pack/idx flows to disk + the index reader with no second per-artifact copy; the idx-bundle slice is a zero-copyBytes::slice.Scope note: the archive streaming channel still carries
Vec<u8>(one copy at the producer — same as before, no regression). CarryingBytesthrough that pipeline is the separate R4 change and is deferred, since it entangles with the streaming-decompress channels.Adds a
bytesdependency (already transitive via reqwest).🤖 Generated with Claude Code