fix(aws/s3): store object bodies as base64 so binary uploads survive #168
Open
andthezhang wants to merge 1 commit into
PutObject read the request body with `.text()` (and the presigned-POST path with `file.text()`), which decodes bytes as UTF-8 and replaces every non-UTF-8 byte with U+FFFD (EF BF BD). Any binary object (audio, images, gzip, protobuf) came back corrupted on GetObject, and Content-Length was inflated because each replacement char re-encodes to 3 bytes.

Read the raw bytes via `arrayBuffer()` instead and persist them base64. ETag is now md5 of the raw bytes, Content-Length is the true byte count, and GetObject decodes the base64 back to a byte-exact Uint8Array. Copy already round-trips the stored body verbatim, so it inherits the fix. `md5()` now accepts Buffer/Uint8Array so it can hash raw bytes directly.

Test: round-trips a binary object (NUL, 0x80/0xff/0xfe, PNG magic) byte-for-byte and asserts Content-Length is the raw byte count.
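The corruption described above can be reproduced outside the emulator. The following is a minimal sketch, assuming that `.text()` behaves like a default `TextDecoder("utf-8")` followed by re-encoding on the way out; `roundTripThroughText` is a hypothetical helper written for illustration, not code from this PR.

```typescript
// Hypothetical helper: mimics reading a body with `.text()` and later
// sending the resulting string back as bytes.
function roundTripThroughText(bytes: Uint8Array): Uint8Array {
  // Non-fatal UTF-8 decoding replaces invalid sequences with U+FFFD.
  const asText = new TextDecoder("utf-8").decode(bytes);
  // Re-encoding turns each U+FFFD into the three bytes EF BF BD.
  return new TextEncoder().encode(asText);
}

// 0x89 (start of the PNG magic), 0x80, 0xff and 0xfe are not valid UTF-8
// on their own; 'P', 'N', 'G' and NUL are plain ASCII and survive.
const raw = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x00, 0x80, 0xff, 0xfe]);
const corrupted = roundTripThroughText(raw);

console.log(raw.length);       // 8
console.log(corrupted.length); // 16: four ASCII bytes plus 4 x 3 replacement bytes
```

Four of the eight input bytes are invalid, so the output grows by two bytes per invalid byte, which is the inflated Content-Length the message refers to.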
@andthezhang is attempting to deploy a commit to the Vercel Labs Team on Vercel. A member of the Team first needs to authorize it.
Problem
The S3 emulator corrupts any non-text upload.
`PutObject` reads the request body with `c.req.text()` (and the presigned-POST path with `file.text()`), which decodes the bytes as UTF-8 and replaces every byte that isn't valid UTF-8 with the replacement character `U+FFFD` (`EF BF BD`).
As a result, any binary object (audio, images, gzip, protobuf, etc.) comes back corrupted on `GetObject`, and `Content-Length` is inflated (each `U+FFFD` re-encodes to 3 bytes).
Fix
- `PutObject` / presigned `POST` now read the raw bytes via `arrayBuffer()` and persist them base64-encoded. `ETag` is computed from the raw bytes, and `Content-Length` is the true byte count.
- `GetObject` decodes the stored base64 back to a byte-exact `Uint8Array` (sent via `c.body(...)`).
- `CopyObject` already round-trips the stored body verbatim, so it inherits the fix.
- `md5()` now accepts `Buffer`/`Uint8Array` so it can hash raw bytes directly.

No new dependencies; no behavioral change for text objects.
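The storage scheme described in the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the emulator's code: the `StoredObject` shape and the `putObject` / `getObject` function names are hypothetical, and the real handlers work on request and response objects rather than bare byte arrays. It assumes a Node.js runtime for `Buffer` and `node:crypto`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the emulator's actual store may differ.
interface StoredObject {
  bodyBase64: string;    // raw bytes, base64-encoded so they survive as a string
  etag: string;          // quoted hex MD5 of the raw bytes
  contentLength: number; // true byte count, not the length of a decoded string
}

// Hypothetical stand-in for the PutObject path: takes the bytes obtained
// from arrayBuffer() and builds the record to persist.
function putObject(raw: Uint8Array): StoredObject {
  return {
    bodyBase64: Buffer.from(raw).toString("base64"),
    etag: `"${createHash("md5").update(raw).digest("hex")}"`,
    contentLength: raw.byteLength,
  };
}

// Hypothetical stand-in for the GetObject path: decodes the stored base64
// back into the exact bytes that were uploaded.
function getObject(stored: StoredObject): Uint8Array {
  return new Uint8Array(Buffer.from(stored.bodyBase64, "base64"));
}

const uploaded = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x00, 0x80, 0xff, 0xfe]);
const stored = putObject(uploaded);
const downloaded = getObject(stored);

console.log(stored.contentLength);                         // 8
console.log(downloaded.every((b, i) => b === uploaded[i])); // true
```

Base64 keeps the stored value a plain string, which is presumably why a copy operation that passes the stored body through unchanged needs no further work.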
Test
Adds a round-trip test that uploads a binary object (NUL, `0x80`/`0xff`/`0xfe`, PNG magic), reads it back, and asserts it is byte-for-byte identical and that `Content-Length` equals the raw byte count.
`pnpm --filter @emulators/aws test` → all green; `type-check` clean.