docs: CLAUDE.md updates for 5 repos (based on recent merged PRs) #8
Open
shrivastavakapil2000 wants to merge 2 commits into
Conversation
Reviews merged PRs from SayaliPat, shrivastavakapil2000, JoeVsVolcano, mike-brant, and nathan-resonate across the top 5 most active repos and generates updated (or new) CLAUDE.md files for each:
- step-function-workflow-orchestrator: EMR 7.12 migration tracker, decommissioned pipelines, CheckSourceFreshness Lambda, Experian bucket
- batch-audience-delivery-syndication (new): BlockGraph T06/T07/T08 Lambdas, OpenX path layout fix, testing patterns
- identity-graph (new): PRISM Scala/Spark pipeline jobs + prism_dbt v1.0 (4 macros, 3 service models, NAME_ADDRESS_HASH UDF)
- batch-expression-modeling: BlockGraph vendor config keys (stitch_columns, audience_bitmap_path), rate-limiting docs
- core-data-pipelines-spark: app inventory, Sovrn Spark 3 fixes, deprecated CookieJarSampler, security note

https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA
Pull request overview
This PR stages updated/new CLAUDE.md reference documents (plus an index README) for five active repositories, consolidated under claude-md-updates/ in resonate/.github due to scoped write access.
Changes:
- Added reference `CLAUDE.md` content for: `step-function-workflow-orchestrator`, `batch-audience-delivery-syndication`, `identity-graph`, `batch-expression-modeling`, and `core-data-pipelines-spark`.
- Added `claude-md-updates/README.md` to summarize covered repos, intended actions, and how to apply the files in follow-up PRs.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| claude-md-updates/README.md | Index of the staged CLAUDE.md updates and application instructions. |
| claude-md-updates/step-function-workflow-orchestrator/CLAUDE.md | Reference doc for Step Functions/Lambdas layout, workflows, EMR migration notes, and operational gotchas. |
| claude-md-updates/identity-graph/CLAUDE.md | Reference doc for PRISM Scala/Spark jobs and prism_dbt package structure/workflows. |
| claude-md-updates/core-data-pipelines-spark/CLAUDE.md | Reference doc for Spark 3 upgrade repo build/testing and notable app notes. |
| claude-md-updates/batch-expression-modeling/CLAUDE.md | Reference doc for BEM system architecture plus Step Functions JSONPath→JSONata guidance. |
| claude-md-updates/batch-audience-delivery-syndication/CLAUDE.md | Reference doc for delivery Lambdas, BlockGraph chain, and formatter path layout notes. |
Comment on lines +301 to +305

    // JSONPath
    "ResultPath": null

    // JSONata
    // Simply omit the Output field - input passes through by default
Comment on lines +471 to +474

    **Pass through input unchanged:**
    - Omit the Output field entirely, OR explicitly use `"Output": "{% $states.input %}"`

    **IMPORTANT:** Any Task state that needs to pass data to subsequent states should preserve the input explicitly. Without an Output field, the Task's result replaces the entire input, losing all previous data.
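The pass-through rule quoted above can be illustrated with a minimal sketch that builds a JSONata-mode Task state as a Python dict and prints it as Amazon States Language JSON. The helper name `task_state`, the Lambda ARN, and the `NextStep` target are hypothetical placeholders for illustration and are not taken from the reviewed repos.

```python
import json


def task_state(resource_arn, next_state, keep_input=True):
    """Build a JSONata-mode Task state (illustrative helper, not from the repos)."""
    state = {
        "Type": "Task",
        "QueryLanguage": "JSONata",
        "Resource": resource_arn,
        "Next": next_state,
    }
    if keep_input:
        # Explicitly forward the original state input; the Task result is discarded.
        state["Output"] = "{% $states.input %}"
    # When no Output field is set, the Task result becomes the state output.
    return state


# Hypothetical ARN and state name, used only for illustration.
state = task_state(
    "arn:aws:lambda:us-east-1:123456789012:function:example-fn",
    "NextStep",
)
print(json.dumps(state, indent=2))
```

Calling `task_state(..., keep_input=False)` yields a state with no `Output` field, which is the case the **IMPORTANT** note warns about for Task states.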

    **Infrastructure accounts:**
    - Non-prod: default AWS account (no `--profile` flag needed)
    - Prod: `arn:aws:iam::694585954309:role/ProdTerraform` (assumed via `role_arn` in `step_function.hcl`)

    |---|---|---|
    | `blockgraph-create-taxonomy-file` | CDP-118915 (T06) | Generates metadata CSV(s): 13-field (initial/net-new) or 8-field (refresh/known) per BlockGraph spec. Reads audience set from ADS (syndicated) or event `audience_key_list` (custom). Routes by delivery state (known PSIDs). |
    | `blockgraph-rename-files` | CDP-118916 (T07) | Concatenates per-audience Spark output parts into a single `resonate_<akey>_<ts>.csv.gz`. Uses S3 multipart copy for large files, download-concat-upload fallback for small parts. |
    | `blockgraph-publish-files` | CDP-118917 (T08) | Uploads renamed segment files and metadata CSVs to BlockGraph's S3 (`auto/segment/upload/`, `auto/segment/metadata/`). Uses BG-issued SSM creds (`/resonate/cdp-118203/blockgraph/aws-*`) for the BG bucket; writes delivery-state delta (net-new PSIDs) to our own bucket. |
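The `blockgraph-rename-files` row describes joining Spark output parts into one `.csv.gz`. A byte-level join of that kind only produces a readable file because consecutive gzip members form a valid gzip stream. The sketch below demonstrates that property with the standard library alone. It assumes the Spark parts are individually gzip-compressed and carry no per-part CSV header, which the excerpt does not state; the sample column names and the helper `concat_gzip_parts` are made up for illustration.

```python
import gzip


def concat_gzip_parts(parts):
    """Byte-concatenate gzip members; the result is itself a valid gzip stream."""
    return b"".join(parts)


# Hypothetical part contents standing in for Spark output files.
part_a = gzip.compress(b"psid,segment\n1,a\n")
part_b = gzip.compress(b"2,b\n")

merged = concat_gzip_parts([part_a, part_b])

# gzip.decompress reads all members in sequence.
assert gzip.decompress(merged) == b"psid,segment\n1,a\n2,b\n"
```

The same reasoning would apply whether the bytes are joined locally or server-side, as long as no part is altered in transit.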

    - Taxonomy metadata paths: `<prefix>/batch-delivery-payload/metadata/resonate_metadata_{initial,refresh}_<ts>.csv`
    - State file path: `<prefix>/state/known-segments/run_date=YYYYMMDD/run_<ts>.csv`
    - Two delivery modes: `blockgraph_syndicated` (ADS-sourced) and `blockgraph_custom` (event `audience_key_list`)
    - SSM keys: `aws-access-key-id` / `aws-secret-access-key` under `/resonate/cdp-118203/blockgraph/`
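The two path templates in the excerpt above could be expressed as small helpers along these lines. This is a sketch only: the function names are hypothetical, and because the excerpt does not define the `<ts>` format, the timestamp is passed through as an opaque string.

```python
from datetime import date


def metadata_path(prefix, kind, ts):
    """Taxonomy metadata key; kind must be 'initial' or 'refresh'."""
    if kind not in ("initial", "refresh"):
        raise ValueError(f"unexpected metadata kind: {kind!r}")
    return f"{prefix}/batch-delivery-payload/metadata/resonate_metadata_{kind}_{ts}.csv"


def state_path(prefix, run_date, ts):
    """Known-segments state key, partitioned by run_date=YYYYMMDD."""
    return f"{prefix}/state/known-segments/run_date={run_date:%Y%m%d}/run_{ts}.csv"
```

For example, `state_path("p", date(2024, 1, 2), "T")` gives `p/state/known-segments/run_date=20240102/run_T.csv`.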
- batch-expression-modeling: fix JSONata 'discard result' example to use Output: $states.input (omitting Output replaces input with Task result)
- batch-expression-modeling: remove misleading 'omit Output' guidance for Task states in Common Patterns; Task states always need explicit Output
- step-function-workflow-orchestrator: replace hardcoded prod account ID with <PROD_ACCOUNT_ID> placeholder in ARN
- batch-audience-delivery-syndication: generalize exact SSM path /resonate/cdp-118203/blockgraph/aws-* to a description + IaC pointer
- batch-audience-delivery-syndication: generalize exact SSM key names (aws-access-key-id / aws-secret-access-key) to a description + IaC pointer

https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA
Summary
This PR contains updated (or new) `CLAUDE.md` files for the 5 most active repos based on recent merged PRs from the team (SayaliPat, shrivastavakapil2000, JoeVsVolcano, mike-brant, nathan-resonate).
Repos Covered
- `step-function-workflow-orchestrator`
- `batch-audience-delivery-syndication`
- `identity-graph`
- `batch-expression-modeling`
- `core-data-pipelines-spark`
PR Authors Whose Merged Work Is Reflected
How to Apply
Copy each `claude-md-updates/<repo>/CLAUDE.md` to the root of the corresponding repo and open a PR there.
https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA
Generated by Claude Code
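The copy step under "How to Apply" could be scripted roughly as follows. This is a sketch under assumptions: it presumes each target repo is already checked out as a sibling directory under a local workspace folder, and the function name and directory layout are hypothetical. Opening the follow-up PRs is left as a manual step.

```python
import shutil
from pathlib import Path


def apply_claude_md(staging_root, workspace_root):
    """Copy <staging_root>/<repo>/CLAUDE.md into <workspace_root>/<repo>/CLAUDE.md."""
    copied = []
    for src in sorted(Path(staging_root).glob("*/CLAUDE.md")):
        repo_dir = Path(workspace_root) / src.parent.name
        if not repo_dir.is_dir():
            # Skip repos that are not checked out locally.
            continue
        dest = repo_dir / "CLAUDE.md"
        shutil.copyfile(src, dest)
        copied.append(dest)
    return copied
```

A call such as `apply_claude_md("claude-md-updates", "..")` would return the list of files written, which can then be committed in each repo.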