Fix file upload retry reusing an already-consumed stream body#2897
Conversation
`fetchWithRetry` retried by re-calling `fetch` with the same `init`. When the request body is a Node stream (as in `uploadUsingPutObject` and the multipart part upload), the stream is consumed on the first attempt. On a retry undici was handed an already-disturbed/locked stream and threw the misleading `Response body object should not be disturbed or locked`, masking the underlying network error. `fetchWithRetry` now accepts an optional body factory that produces a fresh body for every attempt, and `uploadUsingPutObject` passes one. When a stream body is supplied without a factory, the request is attempted only once so the real error surfaces instead of the confusing undici message. Co-Authored-By: Claude
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
This PR fixes a retry failure mode in src/lib/client-file-uploader where upload requests that stream files from disk would retry using an already-consumed stream body, causing undici to throw a misleading “disturbed or locked” error and preventing retries from ever succeeding.
Changes:
- Updated
fetchWithRetry()to support an optional body factory (createBody) that generates a fresh request body for each attempt, and to avoid retrying one-shot stream bodies when no factory is provided. - Updated
uploadUsingPutObject()to pass a body factory that recreates the file read stream (and progress passthrough) per retry. - Added unit tests covering stream vs non-stream retry behavior and body recreation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/lib/client-file-uploader.ts |
Adds stream-safe retry support via a body factory and updates PutObject uploads to recreate stream bodies per attempt. |
__tests__/lib/client-file-uploader.js |
Adds coverage for the new fetchWithRetry() behavior (body recreation, stream no-retry safety net, and error propagation). |
|
Thanks for tracking this down — the root cause analysis and the single-PUT fix are correct. I reproduced the exact failure locally (undici 7.25.0, the version shipped in One scope concern before this closes out PLTFRM-2491: the customer's reported case looks like the multipart path, which this PR doesn't convert.
So for the customer's 86 MB deploy on a flaky link, this PR likely surfaces a clearer Two asks:
For context, this stream-reuse defect predates the undici migration: I reproduced the same pattern on the old |
|
Follow-up: I pushed commit 6cd163a to this branch to close the multipart scope gap I flagged above. What it does: Progress handling: the per-part factory recreates
Tests: added a
Verification: This matters because the reported customer archive (~86.6 MB |
Purpose and Context: Extend the disturbed/locked stream-body fix to the multipart upload path. `uploadPart` previously built its request body once (`createReadStream(...).pipe(progressPassThrough)`) and called `fetchWithRetry` without a `createBody` factory, so multipart part uploads were attempted only once and could not recover from a transient network failure. The reported customer archive exceeds MULTIPART_THRESHOLD, so it uses this path; without this change the PR only converts the misleading "should not be disturbed or locked" error into the real network error for that case, rather than restoring retry/recovery. Key Changes: - `uploadPart` now forwards a per-attempt `createBody` factory to `fetchWithRetry(url, opts, 3, createBody)`, mirroring the single-PUT `uploadUsingPutObject` path. - The per-part factory (built in `uploadParts`) recreates the ranged read stream and its progress PassThrough per attempt and resets that part's counters, so retries re-stream the part without reusing a consumed stream and without double-counting bytes. - Aggregate progress is now summed from a per-part `partBytesRead` array, keeping totals correct across concurrent parts and retries. - Export `BodyFactory` for the public `UploadPartArgs` type. - Import `setInterval`/`clearInterval` from `node:timers` instead of relying on test-sandbox globals; under Node 26 the jest environment does not expose them as globals, which broke the new tests. Impact and Considerations: Production behaviour is unchanged for successful uploads. The presigned request is still fetched once per part (outside `fetchWithRetry`), so the URL is reused across attempts, as on the single-PUT path. No config or migration changes are required. Testing and Validation: Added a `uploadParts()` test suite (mocking `undici` fetch and `../lib/api/http`): multi-part progress aggregation stays within bounds and finishes at 100%, and a transient ECONNRESET on the first attempt recovers on retry (fetch twice, http once) with final progress exactly 100%, proving the per-attempt counter reset prevents double-counting. Verified with check-types, eslint, babel build, and the `__tests__/lib` suite on Node 20, 22, 24, and 26. Refs: PLTFRM-2491 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6cd163a to
aaeec79
Compare
|



Description
File uploads (
vip app deploy, media/SQL imports) stream the file from disk as thefetchrequest body.fetchWithRetry()retried failed requests by re-callingfetchwith the sameinitobject. Because a Node stream can only be consumedonce, the retry handed undici an already-consumed (disturbed/locked) stream, which
threw:
This masked the real, underlying network error (e.g. a connection reset partway
through the upload) and made the retry useless — the retry could never succeed.
This change makes
fetchWithRetry()accept an optional body factory that produces afresh body for each attempt, and
uploadUsingPutObject()now passes one (recreatingthe read stream + progress pass-through per attempt). As a safety net, if a stream
body is supplied without a factory, the request is attempted only once so the true
error surfaces instead of the misleading undici message.
Observed in CI with VIP-CLI v4.0.5 on Node v24.16.0, where the upload stalled at 90%
and failed with the message above.
Changelog Description
Fixed
Pull request checklist
Steps to Test
npm ci.npm test -- __tests__/lib/client-file-uploader.js— the newfetchWithRetry()suite passes (retry recreates the body; a stream body with nofactory is attempted once; non-stream bodies still retry; last error propagates).
vip app deploy <file>against a flaky connection; atransient failure now retries the upload from a fresh stream instead of throwing
"Response body object should not be disturbed or locked".