Skip to content

Fix file upload retry reusing an already-consumed stream body#2897

Merged
sjinks merged 2 commits into
trunkfrom
fix/upload-retry-disturbed-stream-body
Jun 22, 2026
Merged

Fix file upload retry reusing an already-consumed stream body#2897
sjinks merged 2 commits into
trunkfrom
fix/upload-retry-disturbed-stream-body

Conversation

@mchanDev

Copy link
Copy Markdown

Description

File uploads (vip app deploy, media/SQL imports) stream the file from disk as the
fetch request body. fetchWithRetry() retried failed requests by re-calling
fetch with the same init object. Because a Node stream can only be consumed
once, the retry handed undici an already-consumed (disturbed/locked) stream, which
threw:

`Error: Response body object should not be disturbed or locked`

This masked the real, underlying network error (e.g. a connection reset partway
through the upload) and made the retry useless — the retry could never succeed.

This change makes fetchWithRetry() accept an optional body factory that produces a
fresh body for each attempt, and uploadUsingPutObject() now passes one (recreating
the read stream + progress pass-through per attempt). As a safety net, if a stream
body is supplied without a factory, the request is attempted only once so the true
error surfaces instead of the misleading undici message.

Observed in CI with VIP-CLI v4.0.5 on Node v24.16.0, where the upload stalled at 90%
and failed with the message above.

Changelog Description

Fixed

  • Fixed file uploads failing with "Response body object should not be disturbed or locked" when an upload request was retried, and ensured retries re-stream the file instead of reusing a consumed stream.

Pull request checklist

  • Update SETUP.md with any new environmental variables. (n/a — no new env vars)
  • Update the documentation. (n/a)
  • Manually test the relevant changes.
  • Follow the pull request checklist.
  • Add/update automated tests as needed.

Steps to Test

  1. Check out PR.
  2. Run npm ci.
  3. Run npm test -- __tests__/lib/client-file-uploader.js — the new
    fetchWithRetry() suite passes (retry recreates the body; a stream body with no
    factory is attempted once; non-stream bodies still retry; last error propagates).
  4. (Optional end-to-end) Run vip app deploy <file> against a flaky connection; a
    transient failure now retries the upload from a fresh stream instead of throwing
    "Response body object should not be disturbed or locked".

`fetchWithRetry` retried by re-calling `fetch` with the same `init`. When
the request body is a Node stream (as in `uploadUsingPutObject` and the
multipart part upload), the stream is consumed on the first attempt. On a
retry undici was handed an already-disturbed/locked stream and threw the
misleading `Response body object should not be disturbed or locked`,
masking the underlying network error.

`fetchWithRetry` now accepts an optional body factory that produces a fresh
body for every attempt, and `uploadUsingPutObject` passes one. When a stream
body is supplied without a factory, the request is attempted only once so the
real error surfaces instead of the confusing undici message.

Co-Authored-By: Claude
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a retry failure mode in src/lib/client-file-uploader where upload requests that stream files from disk would retry using an already-consumed stream body, causing undici to throw a misleading “disturbed or locked” error and preventing retries from ever succeeding.

Changes:

  • Updated fetchWithRetry() to support an optional body factory (createBody) that generates a fresh request body for each attempt, and to avoid retrying one-shot stream bodies when no factory is provided.
  • Updated uploadUsingPutObject() to pass a body factory that recreates the file read stream (and progress passthrough) per retry.
  • Added unit tests covering stream vs non-stream retry behavior and body recreation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/lib/client-file-uploader.ts Adds stream-safe retry support via a body factory and updates PutObject uploads to recreate stream bodies per attempt.
__tests__/lib/client-file-uploader.js Adds coverage for the new fetchWithRetry() behavior (body recreation, stream no-retry safety net, and error propagation).

Comment thread src/lib/client-file-uploader.ts
Comment thread __tests__/lib/client-file-uploader.js
@rinatkhaziev rinatkhaziev requested a review from rebeccahum June 22, 2026 15:44
@sjinks

sjinks commented Jun 22, 2026

Copy link
Copy Markdown
Member

Thanks for tracking this down — the root cause analysis and the single-PUT fix are correct. I reproduced the exact failure locally (undici 7.25.0, the version shipped in @automattic/vip@4.0.5): a transient mid-upload reset throws ECONNRESET on attempt 0, and the retry with the same consumed stream throws Response body object should not be disturbed or locked — character-for-character the reported error. The createBody factory recovers correctly (attempt 0 ECONNRESET → attempt 1 fresh stream → 200). 👍

One scope concern before this closes out PLTFRM-2491: the customer's reported case looks like the multipart path, which this PR doesn't convert.

  • MULTIPART_THRESHOLD = 32 * MB. The attached archive develop-built-files.zip is 90,757,361 bytes (~86.6 MB), and it's already a .zip (isCompressed = true), so the gzip step is skipped and the size stays ~86.6 MB.
  • 86.6 MB ≥ 32 MB → multipart (~6 parts) via uploadUsingMultipartuploadPartsuploadPart, not uploadUsingPutObject.
  • uploadPart still does fetchOptions.body = createReadStream(fileName, { start, end }).pipe(progressPassThrough) and calls fetchWithRetry(url, fetchOptions) without a createBody factory. With this PR, that path hits maxAttempts = 0 (stream body + no factory) → it's attempted once. That's a real improvement (the misleading error is replaced by the true network error), but it does not restore retry/recovery for the multipart case — which is the one the ~86 MB archive actually exercises.

So for the customer's 86 MB deploy on a flaky link, this PR likely surfaces a clearer ECONNRESET but the upload would still fail without retrying.

Two asks:

  1. Could you confirm whether the manual repro ("stalled at 90%") used a <32 MB file (single-PUT) or the provided ~86 MB archive (multipart)? That determines whether the verified path matches the customer's.
  2. Consider extending the same body-factory treatment to uploadPart. It's a bit more involved because the per-part progressPassThrough is created in uploadParts' parts.map(async part => …) and passed in, so the factory needs to recreate both the ranged read stream and the per-part PassThrough (and rewire progress aggregation).

For context, this stream-reuse defect predates the undici migration: I reproduced the same pattern on the old node-fetch@2 + fetch-retry@5 stack and the retries sent truncated (~1.9 MB of 2 MB) then 0-byte bodies and failed with a generic FetchError: network timeout. node-fetch just failed quietly where undici (stricter on Node 24) now fails loudly — so "works on 4.0.0" reflects a first attempt that didn't need a retry, not a working retry.

@sjinks

sjinks commented Jun 22, 2026

Copy link
Copy Markdown
Member

Follow-up: I pushed commit 6cd163a to this branch to close the multipart scope gap I flagged above.

What it does: uploadPart now forwards a per-attempt createBody factory to fetchWithRetry(url, opts, 3, createBody) instead of building fetchOptions.body once — mirroring the single-PUT uploadUsingPutObject path. So multipart part uploads now actually retry (3 attempts) with a fresh body, rather than maxAttempts = 0 (single attempt).

Progress handling: the per-part factory recreates createReadStream(fileName, { start, end }).pipe(new PassThrough()) and resets that part's counters on each attempt, so a retry re-streams the part without reusing a consumed stream and without double-counting bytes. Aggregate progress is now summed from a per-part partBytesRead[] array, which stays correct across concurrent parts and retries.

getSignedUploadRequestData is still called once per part (outside fetchWithRetry), so the presigned URL is reused across attempts — same as the single-PUT path.

Tests: added a uploadParts() suite (mocking undici fetch and ../lib/api/http):

  • multi-part upload aggregates progress and never exceeds 100%;
  • a transient ECONNRESET on attempt 0 recovers on retry (fetch ×2, http ×1) with final progress exactly 100% — proving the per-attempt counter reset prevents 200% double-counting.

Verification: check-types (tsc) clean, eslint clean, npm run build (babel) OK, and the __tests__/lib/ suite is green (33 suites / 357 tests, including the new cases).

This matters because the reported customer archive (~86.6 MB .zip) exceeds MULTIPART_THRESHOLD (32 MB), so it goes through uploadPart, not uploadUsingPutObject — without this, the PR would only have converted the misleading error into the real one for that case, not restored retry/recovery.

Purpose and Context:
Extend the disturbed/locked stream-body fix to the multipart upload
path. `uploadPart` previously built its request body once
(`createReadStream(...).pipe(progressPassThrough)`) and called
`fetchWithRetry` without a `createBody` factory, so multipart part
uploads were attempted only once and could not recover from a transient
network failure. The reported customer archive exceeds
MULTIPART_THRESHOLD, so it uses this path; without this change the PR
only converts the misleading "should not be disturbed or locked" error
into the real network error for that case, rather than restoring
retry/recovery.

Key Changes:
- `uploadPart` now forwards a per-attempt `createBody` factory to
  `fetchWithRetry(url, opts, 3, createBody)`, mirroring the single-PUT
  `uploadUsingPutObject` path.
- The per-part factory (built in `uploadParts`) recreates the ranged
  read stream and its progress PassThrough per attempt and resets that
  part's counters, so retries re-stream the part without reusing a
  consumed stream and without double-counting bytes.
- Aggregate progress is now summed from a per-part `partBytesRead`
  array, keeping totals correct across concurrent parts and retries.
- Export `BodyFactory` for the public `UploadPartArgs` type.
- Import `setInterval`/`clearInterval` from `node:timers` instead of
  relying on test-sandbox globals; under Node 26 the jest environment
  does not expose them as globals, which broke the new tests.

Impact and Considerations:
Production behaviour is unchanged for successful uploads. The presigned
request is still fetched once per part (outside `fetchWithRetry`), so
the URL is reused across attempts, as on the single-PUT path. No config
or migration changes are required.

Testing and Validation:
Added a `uploadParts()` test suite (mocking `undici` fetch and
`../lib/api/http`): multi-part progress aggregation stays within bounds
and finishes at 100%, and a transient ECONNRESET on the first attempt
recovers on retry (fetch twice, http once) with final progress exactly
100%, proving the per-attempt counter reset prevents double-counting.
Verified with check-types, eslint, babel build, and the `__tests__/lib`
suite on Node 20, 22, 24, and 26.

Refs: PLTFRM-2491

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sjinks sjinks force-pushed the fix/upload-retry-disturbed-stream-body branch from 6cd163a to aaeec79 Compare June 22, 2026 19:37
@sjinks sjinks self-assigned this Jun 22, 2026
@sjinks sjinks merged commit 1d565db into trunk Jun 22, 2026
19 checks passed
@sjinks sjinks deleted the fix/upload-retry-disturbed-stream-body branch June 22, 2026 20:10
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants