Skip to content

perf(client/extract): cut hot-path copies + a per-file lock (R1/R2/R3)#65

Merged
russellromney merged 1 commit into
mainfrom
perf/hot-path-copies
Jun 26, 2026
Merged

perf(client/extract): cut hot-path copies + a per-file lock (R1/R2/R3)#65
russellromney merged 1 commit into
mainfrom
perf/hot-path-copies

Conversation

@russellromney

Copy link
Copy Markdown
Owner

Behavior-preserving allocation/lock reductions on the download + extract hot path, from the adversarial review (ADVERSARIAL_REVIEW_2026-06-25.md).

⚠️ Benchmark before merge. Correctness is verified (full cargo test + clippy -D warnings + fmt), but the speedup is not measured — this sandbox has no live clone target, and the repo's only perf tooling is benchmark/*.sh (real clones against a running server). Please run benchmark/baseline.sh before/after to confirm the win before merging. None of these change behavior, so worst case is "no measurable gain," not a regression.

R3 — per-file MutexAtomicBool

The "has any thread failed?" check in the parallel pack extractor took a Mutex lock once per file just to poll. Now a cheap AtomicBool load; the Mutex is kept only to carry the actual error value, written once on the rare failure path.

R2 — borrow the fragment-pair map

The per-frame fragment-pair list was .cloned() out of the shared map on every frame; the loop only reads it, so borrow it instead.

R1 — fetch_artifact* return bytes::Bytes (pack path)

The fetch boundary .to_vec()-ed every response body. Now it returns refcounted Bytes, so a downloaded pack/idx flows to disk + the index reader with no second per-artifact copy; the idx-bundle slice is a zero-copy Bytes::slice.

Scope note: the archive streaming channel still carries Vec<u8> (one copy at the producer — same as before, no regression). Carrying Bytes through that pipeline is the separate R4 change and is deferred, since it entangles with the streaming-decompress channels.

Adds a bytes dependency (already transitive via reqwest).

🤖 Generated with Claude Code

Behavior-preserving allocation/lock reductions on the download+extract hot path.
The full test suite + clippy pass, but the *speedup* is unverified in this
environment (the only perf tooling is benchmark/*.sh against a live clone
target). Run `benchmark/baseline.sh` before/after to confirm before merging.

- R3: the per-file "has any thread failed?" check took a Mutex once per file.
  It's now a cheap AtomicBool load; the Mutex is kept only to hold the actual
  error value, written once on the rare failure path.
- R2: the per-frame fragment-pair list was `.cloned()` out of the shared map
  every frame; borrow it instead (the loop only reads it).
- R1 (pack path): `fetch_artifact*` now return `bytes::Bytes` (a refcounted
  buffer) instead of `.to_vec()`-ing the response body, so a downloaded
  pack/idx flows to disk and the index reader with no second per-artifact copy.
  The idx-bundle slice is now a zero-copy `Bytes::slice`. The archive streaming
  channel still carries `Vec<u8>` (one copy at the producer, same as before — no
  regression); carrying `Bytes` through that pipeline is the separate R4 change
  and is deferred (it entangles with the streaming decompress channels).

Refs: ADVERSARIAL_REVIEW_2026-06-25.md R1/R2/R3 (R4 deferred)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@russellromney russellromney merged commit 5f6dc50 into main Jun 26, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant