Skip to content

fix(log-ingestor): URL-decode S3 object keys from SQS event notifications (fixes #2225).#2299

Open
LinZhihao-723 wants to merge 3 commits into
y-scope:mainfrom
LinZhihao-723:fix-2225
Open

fix(log-ingestor): URL-decode S3 object keys from SQS event notifications (fixes #2225).#2299
LinZhihao-723 wants to merge 3 commits into
y-scope:mainfrom
LinZhihao-723:fix-2225

Conversation

@LinZhihao-723
Copy link
Copy Markdown
Member

@LinZhihao-723 LinZhihao-723 commented May 29, 2026

Description

This PR fixes #2225.

SQS forwards S3 notifications by encoding the key using application/x-www-form-urlencoded (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-content-structure.html?utm_source=chatgpt.com). That means the SQS listener needs to unconditionally decode the received S3 key. This PR adds the decoding logic into the existing listener framework.

This PR also adds test cases for S3 key encoding, by introducing , &, and = characters in the submitted mock S3 objects. The submitted objects also contain keys that don't require any form URL encoding. The overall test cases make sure that:

  • SQS listener can decode S3 keys into their original form.
  • S3 scanners, which use ListObjectV2 without S3 key encoding, can also handle special characters inside the keys.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Add test cases to cover encoded characters inside the keys (check the details in the description).

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • S3 object keys containing special characters are now properly processed during log ingestion.

Review Change Stack

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner May 29, 2026 20:44
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Walkthrough

This PR fixes a bug where the log-ingestor stored percent-encoded S3 object keys from SQS events without decoding them. The fix adds the url crate, implements URL-decoding in extract_object_metadata before prefix matching and storage, and updates tests to cover URL-encoded characters in keys.

Changes

S3 Object Key URL Decoding

Layer / File(s) Summary
URL crate dependency
components/log-ingestor/Cargo.toml
The url = "2.5.8" dependency is added to support form-urlencoded decoding.
URL decoding in extract_object_metadata
components/log-ingestor/src/ingestion_job/sqs_listener.rs
The NonEmptyString import is added; the function now decodes the raw S3 object key via url::form_urlencoded, converts the result to a non-empty string, evaluates prefix relevance against the decoded key, and stores the decoded key in ObjectMetadata. Documentation is updated to describe the decoding behaviour.
Test object key generation with URL-encoded characters
components/log-ingestor/tests/test_utils.rs
Test objects are now generated with keys in the format {prefix}/idx{idx:05}{placeholder}.log, where placeholder cycles through URL-encoding test cases ("", &, =, space) to exercise the decoding logic.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main fix: URL-decoding S3 object keys from SQS notifications to resolve issue #2225.
Linked Issues check ✅ Passed The PR implementation directly addresses issue #2225 by adding URL-decoding of S3 object keys, fixing the 403 errors and compression failures caused by encoded keys.
Out of Scope Changes check ✅ Passed All changes are scoped to URL-decoding S3 keys: dependency addition (url crate), decoding logic in SQS listener, and test coverage for encoded characters.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/log-ingestor/src/ingestion_job/sqs_listener.rs`:
- Around line 210-213: In extract_object_metadata, avoid panicking on an empty
decoded S3 key and stop using the form-name/value parser: replace the
url::form_urlencoded::parse(record.s3.object.key.as_bytes()) + expect(...)
pattern with a percent-decoding call (e.g. percent_decode_str or equivalent) on
record.s3.object.key to decode the whole key as a single value, then if the
decoded string is empty return None instead of calling
NonEmptyString::new(...).expect(...); construct the NonEmptyString only when
decoded_key is non-empty and propagate None from extract_object_metadata for
empty or invalid keys.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e2bcaa90-2b1e-4db0-9ffc-f3a442faa2f5

📥 Commits

Reviewing files that changed from the base of the PR and between dee475f and 34d40b0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • components/log-ingestor/Cargo.toml
  • components/log-ingestor/src/ingestion_job/sqs_listener.rs
  • components/log-ingestor/tests/test_utils.rs

Comment thread components/log-ingestor/src/ingestion_job/sqs_listener.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

log-ingestor stores URL-encoded S3 object keys from SQS events, causing compression failures

1 participant