FEAT: Support for S3 object tagging in file task #59
Merged: Divyanshu Tiwari (divyanshu-tiwari) merged 3 commits into main from DATA-8036_s3_tag_support on Apr 22, 2026

@@ -31,6 +31,7 @@ In read mode, the sanitized base filename is stored in the record context under
| `path` | string | `/tmp/caterpillar.txt` | File path or S3 URL (`s3://bucket/key`); glob patterns supported in read mode |
| `region` | string | `us-west-2` | AWS region for S3 operations |
| `storage_class` | string | `STANDARD` | S3 **write** only: storage class set on `PutObject`. Ignored for local paths. See [S3 storage class](#s3-storage-class). |
| `tags` | map[string]string | - | S3 **write** only: object tags applied on `PutObject`. Ignored for local paths. Values support macros and context templates. See [S3 object tags](#s3-object-tags). |
| `delimiter` | string | `\n` | Delimiter used to separate records when reading |
| `success_file` | bool | `false` | Whether to create a success file after writing |
| `success_file_name` | string | `_SUCCESS` | Name of the success file |

@@ -61,6 +62,45 @@ AWS may add or adjust classes in newer SDK releases; if a value is rejected as u

Read mode does not set storage class (objects are read as-is).

## S3 object tags

When the write `path` is an S3 URI (`s3://...`), each object is uploaded with the configured `tags` applied as the `x-amz-tagging` header on `PutObject`. The same tags are applied to the optional `success_file` marker.
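
The `x-amz-tagging` header carries the whole tag set as a URL query string, e.g. `env=prod&pipeline=events`. A minimal sketch of building that value from already-resolved tags, assuming Go and the standard `net/url` package; `encodeTagging` is a hypothetical helper, not this task's actual code:

```go
package main

import "net/url"

// encodeTagging turns resolved tags into the query-string form used by the
// x-amz-tagging header. Hypothetical helper, for illustration only.
// url.Values.Encode sorts by key and percent-encodes keys and values, so an
// '&' or '=' inside a tag value cannot break the key=value pairing.
func encodeTagging(tags map[string]string) string {
	v := url.Values{}
	for key, value := range tags {
		v.Set(key, value)
	}
	return v.Encode()
}
```

With the AWS SDK for Go v2, a string like this goes in the `Tagging` field of `s3.PutObjectInput`. `Encode` escapes spaces as `+`; if tag values can contain spaces, check that this matches what your S3 client expects.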

Tag values are evaluated per record, so macros and context templates (e.g. `{{ macro "timestamp" }}`, `{{ context "user_id" }}`) are resolved against the record being written.

### Limits

S3 enforces the following constraints ([docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html)):

- At most **10 tags** per object.
- Tag keys must be unique (enforced by the YAML map).
- Tag **keys** up to **128 UTF-16 code units**.
- Tag **values** up to **256 UTF-16 code units**.

Tag count, key length, and resolved value length are validated on every S3 write. The count and keys do not change per record, but the checks are cheap and run alongside the per-record value validation. In UTF-16, most characters take 1 code unit, while supplementary characters (e.g. many emoji) take 2. Validation runs only when actually writing to S3; local-path and read-mode runs are not affected by tag configuration.

### `success_file` marker

The success marker (`_SUCCESS` by default, see `success_file_name`) is not tied to any record, so its tag values can use only static strings or startup-time templates (`env`, `secret`, `macro`). A tag that references `{{ context "..." }}` fails at the success-marker write with `context keys were not set: ...`, because there is no record context to resolve against.
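
Because one `tags` map is applied to both the data objects and the success marker, a context reference anywhere in `tags` breaks the marker write. If you need record-derived tag values, disable `success_file`; if you need the marker, remove the context references from `tags`.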

Contributor review comment: "Nit pick: Same tags would be applied for both files, right?"
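
Since the success-marker write always fails when any tag reads the record context, a config check could flag that combination before the pipeline starts. A minimal Go sketch under that assumption: `tagsBlockingSuccessMarker` and its regular-expression heuristic are hypothetical, not part of the task, which per this section fails only at the marker write.

```go
package main

import (
	"regexp"
	"sort"
)

// contextCall matches a template action that mentions `context`, e.g.
// {{ context "user_id" }}. Heuristic: it would also flag the word
// "context" used as a string literal inside an action.
var contextCall = regexp.MustCompile(`\{\{[^}]*\bcontext\b[^}]*\}\}`)

// tagsBlockingSuccessMarker returns, sorted, the tag keys whose values
// reference the record context and would fail on the success-marker write.
// Hypothetical pre-flight check, for illustration only.
func tagsBlockingSuccessMarker(tags map[string]string, successFile bool) []string {
	if !successFile {
		return nil
	}
	var keys []string
	for key, value := range tags {
		if contextCall.MatchString(value) {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys)
	return keys
}
```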

Read mode does not apply tags (objects are read as-is).

### Example

```yaml
tasks:
  - name: write_to_s3_tagged
    type: file
    path: s3://my-bucket/events/{{ macro "timestamp" }}.jsonl
    region: us-east-1
    tags:
      env: prod
      pipeline: events
      user_id: '{{ context "user_id" }}'
```

## Path Schemes

The task supports different path schemes: