docs: CLAUDE.md updates for 5 repos (based on recent merged PRs) #8

Open
shrivastavakapil2000 wants to merge 2 commits into main from claude/serene-davinci-9xD8b

@shrivastavakapil2000

Summary

This PR contains updated (or new) CLAUDE.md files for the 5 most active repos based on recent merged PRs from the team (SayaliPat, shrivastavakapil2000, JoeVsVolcano, mike-brant, nathan-resonate).

Note: Because this session's GitHub write access is scoped to resonate/.github, the files are staged here as references under claude-md-updates/. Each file needs to be applied to its corresponding repo as a follow-up PR.

Repos Covered

| Repo | Action | Key Changes Documented |
|---|---|---|
| step-function-workflow-orchestrator | Update | EMR 7.12 migration tracker, decommissioned pipelines (fusion-behavior-preprocess, cookiejar-sample-export), CheckSourceFreshness Lambda, Experian bucket change |
| batch-audience-delivery-syndication | New CLAUDE.md | BlockGraph delivery Lambdas T06/T07/T08 (taxonomy file, rename files, publish files), OpenX formatter path layout fix |
| identity-graph | New CLAUDE.md | PRISM Scala/Spark pipeline jobs (11 jobs), prism_dbt v1.0 (4 macros, 3 service models, NAME_ADDRESS_HASH UDF, 2 SPs) |
| batch-expression-modeling | Update | BlockGraph vendor config (stitch_columns, audience_bitmap_path keys), batch-stitch rate-limiting (MaxConcurrency=2 + stagger), formatter path layout |
| core-data-pipelines-spark | Update | Full app inventory, Sovrn Spark 3 fixes (meanDensities), deprecated CookieJarSampler, security fix note |

PR Authors Whose Merged Work Is Reflected

  • SayaliPat — identity-graph (prism_dbt, Scala jobs, Experian pipeline), step-function-workflow-orchestrator
  • shrivastavakapil2000 — step-function-workflow-orchestrator (EMR 7.12 migrations, decommissions), core-data-pipelines-spark
  • JoeVsVolcano — step-function-workflow-orchestrator (shuffle partitions, lost files)
  • mike-brant — batch-expression-modeling, step-function-workflow-orchestrator (geo-location), batch-audience-delivery-syndication
  • nathan-resonate — batch-audience-delivery-syndication (BlockGraph Lambdas), batch-expression-modeling (BlockGraph config)

How to Apply

Copy each claude-md-updates/<repo>/CLAUDE.md to the root of the corresponding repo and open a PR there.
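
The copy step could be scripted roughly as follows. This is a minimal sketch, assuming each target repo is already cloned into a directory named after the repo; the function name `apply_claude_md` and the `checkouts_root` parameter are hypothetical, and opening the follow-up PR in each repo remains a manual step.

```python
import shutil
from pathlib import Path


def apply_claude_md(staging_dir: Path, checkouts_root: Path) -> list[Path]:
    """Copy staging_dir/<repo>/CLAUDE.md to checkouts_root/<repo>/CLAUDE.md.

    Repos without a local checkout are skipped. Returns the files written.
    """
    written = []
    for staged in sorted(staging_dir.glob("*/CLAUDE.md")):
        repo_dir = checkouts_root / staged.parent.name
        if not repo_dir.is_dir():
            continue  # assumption: skip repos that are not cloned locally
        target = repo_dir / "CLAUDE.md"
        shutil.copyfile(staged, target)
        written.append(target)
    return written
```

Calling `apply_claude_md(Path("claude-md-updates"), Path(".."))` from a checkout of resonate/.github would cover the case where the five repos are cloned next to it; the returned list shows which repos were actually updated.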

https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA


Generated by Claude Code

Reviews merged PRs from SayaliPat, shrivastavakapil2000, JoeVsVolcano,
mike-brant, and nathan-resonate across the top 5 most active repos and
generates updated (or new) CLAUDE.md files for each:

- step-function-workflow-orchestrator: EMR 7.12 migration tracker,
  decommissioned pipelines, CheckSourceFreshness Lambda, Experian bucket
- batch-audience-delivery-syndication: new — BlockGraph T06/T07/T08
  Lambdas, OpenX path layout fix, testing patterns
- identity-graph: new — PRISM Scala/Spark pipeline jobs + prism_dbt v1.0
  (4 macros, 3 service models, NAME_ADDRESS_HASH UDF)
- batch-expression-modeling: BlockGraph vendor config keys
  (stitch_columns, audience_bitmap_path), rate-limiting docs
- core-data-pipelines-spark: app inventory, Sovrn Spark 3 fixes,
  deprecated CookieJarSampler, security note

https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA
Copilot AI review requested due to automatic review settings June 6, 2026 13:13

Copilot AI left a comment

Pull request overview

This PR stages updated/new CLAUDE.md reference documents (plus an index README) for five active repositories, consolidated under claude-md-updates/ in resonate/.github due to scoped write access.

Changes:

  • Added reference CLAUDE.md content for: step-function-workflow-orchestrator, batch-audience-delivery-syndication, identity-graph, batch-expression-modeling, and core-data-pipelines-spark.
  • Added claude-md-updates/README.md to summarize covered repos, intended actions, and how to apply the files in follow-up PRs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| claude-md-updates/README.md | Index of the staged CLAUDE.md updates and application instructions. |
| claude-md-updates/step-function-workflow-orchestrator/CLAUDE.md | Reference doc for Step Functions/Lambdas layout, workflows, EMR migration notes, and operational gotchas. |
| claude-md-updates/identity-graph/CLAUDE.md | Reference doc for PRISM Scala/Spark jobs and prism_dbt package structure/workflows. |
| claude-md-updates/core-data-pipelines-spark/CLAUDE.md | Reference doc for Spark 3 upgrade repo build/testing and notable app notes. |
| claude-md-updates/batch-expression-modeling/CLAUDE.md | Reference doc for BEM system architecture plus Step Functions JSONPath→JSONata guidance. |
| claude-md-updates/batch-audience-delivery-syndication/CLAUDE.md | Reference doc for delivery Lambdas, BlockGraph chain, and formatter path layout notes. |

Comment on lines +301 to +305
// JSONPath
"ResultPath": null

// JSONata
// Simply omit the Output field - input passes through by default
Comment on lines +471 to +474
**Pass through input unchanged:**
- Omit the Output field entirely, OR explicitly use `"Output": "{% $states.input %}"`

**IMPORTANT:** Any Task state that needs to pass data to subsequent states should preserve the input explicitly. Without an Output field, the Task's result replaces the entire input, losing all previous data.
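
To make the IMPORTANT note concrete, the sketch below models the rule in plain Python rather than calling Step Functions. `task_state_output` is a hypothetical helper, not an AWS API, and it only understands the single expression `{% $states.input %}`; the state definitions and sample payloads are invented for illustration.

```python
PASS_THROUGH = "{% $states.input %}"


def task_state_output(state: dict, state_input: dict, task_result: dict) -> dict:
    """Toy model of a JSONata Task state's output (not a real evaluator)."""
    if "Output" not in state:
        # No Output field: the Task result becomes the state output.
        return task_result
    if state["Output"] == PASS_THROUGH:
        # Explicit pass-through: the original input is preserved.
        return state_input
    raise NotImplementedError("only the pass-through expression is modelled")


# Hypothetical payloads, for illustration only.
state_input = {"run_date": "20260606", "vendor": "blockgraph"}
task_result = {"StatusCode": 200}

without_output = {"Type": "Task", "QueryLanguage": "JSONata"}
with_output = {**without_output, "Output": PASS_THROUGH}

print(task_state_output(without_output, state_input, task_result))
print(task_state_output(with_output, state_input, task_result))
```

The first call returns the Task result and drops `run_date` and `vendor`; the second returns the original input unchanged, which is the behaviour a downstream state usually needs.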

**Infrastructure accounts:**
- Non-prod: default AWS account (no `--profile` flag needed)
- Prod: `arn:aws:iam::694585954309:role/ProdTerraform` (assumed via `role_arn` in `step_function.hcl`)
|---|---|---|
| `blockgraph-create-taxonomy-file` | CDP-118915 (T06) | Generates metadata CSV(s): 13-field (initial/net-new) or 8-field (refresh/known) per BlockGraph spec. Reads audience set from ADS (syndicated) or event `audience_key_list` (custom). Routes by delivery state (known PSIDs). |
| `blockgraph-rename-files` | CDP-118916 (T07) | Concatenates per-audience Spark output parts into a single `resonate_<akey>_<ts>.csv.gz`. Uses S3 multipart copy for large files, download-concat-upload fallback for small parts. |
| `blockgraph-publish-files` | CDP-118917 (T08) | Uploads renamed segment files and metadata CSVs to BlockGraph's S3 (`auto/segment/upload/`, `auto/segment/metadata/`). Uses BG-issued SSM creds (`/resonate/cdp-118203/blockgraph/aws-*`) for the BG bucket; writes delivery-state delta (net-new PSIDs) to our own bucket. |
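
A byte-level concatenation such as the one described for `blockgraph-rename-files` can work because several gzip members appended back to back still form a valid gzip stream. The sketch below demonstrates that property locally with Python's standard `gzip` module. It is not the Lambda's code: `concat_gzip_parts` and the sample rows are hypothetical, and it assumes the Spark parts carry no header row (otherwise the header would repeat inside the merged file).

```python
import gzip


def concat_gzip_parts(parts: list[bytes]) -> bytes:
    """Join already-compressed gzip parts without recompressing them."""
    return b"".join(parts)


# Two stand-ins for per-audience Spark output parts (invented rows).
part_a = gzip.compress(b"psid-1,seg-a\n")
part_b = gzip.compress(b"psid-2,seg-b\n")
merged = concat_gzip_parts([part_a, part_b])

# gzip.decompress reads every member in the stream, in order.
print(gzip.decompress(merged).decode())
```

Because no part is decompressed or recompressed, the same idea carries over to a server-side copy, where the parts never need to pass through the Lambda's memory.
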
- Taxonomy metadata paths: `<prefix>/batch-delivery-payload/metadata/resonate_metadata_{initial,refresh}_<ts>.csv`
- State file path: `<prefix>/state/known-segments/run_date=YYYYMMDD/run_<ts>.csv`
- Two delivery modes: `blockgraph_syndicated` (ADS-sourced) and `blockgraph_custom` (event `audience_key_list`)
- SSM keys: `aws-access-key-id` / `aws-secret-access-key` under `/resonate/cdp-118203/blockgraph/`
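
The path conventions in the list above can be captured in small helpers. This is a sketch under assumptions: the function names are hypothetical, `ts` is passed through as an opaque string because its format is not specified here, and `run_date` is rendered as YYYYMMDD.

```python
from datetime import date


def metadata_path(prefix: str, kind: str, ts: str) -> str:
    """Taxonomy metadata CSV path; kind is 'initial' or 'refresh'."""
    if kind not in ("initial", "refresh"):
        raise ValueError(f"unexpected metadata kind: {kind!r}")
    return f"{prefix}/batch-delivery-payload/metadata/resonate_metadata_{kind}_{ts}.csv"


def state_file_path(prefix: str, run_date: date, ts: str) -> str:
    """Known-segments state file path, partitioned by run_date=YYYYMMDD."""
    return f"{prefix}/state/known-segments/run_date={run_date:%Y%m%d}/run_{ts}.csv"
```

Keeping the layout in one place like this makes it easier for tests to assert on exact keys instead of rebuilding the strings by hand.
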
- batch-expression-modeling: fix JSONata 'discard result' example to use
  Output: $states.input (omitting Output replaces input with Task result)
- batch-expression-modeling: remove misleading 'omit Output' guidance for
  Task states in Common Patterns; Task states always need explicit Output
- step-function-workflow-orchestrator: replace hardcoded prod account ID
  with <PROD_ACCOUNT_ID> placeholder in ARN
- batch-audience-delivery-syndication: generalize exact SSM path
  /resonate/cdp-118203/blockgraph/aws-* to a description + IaC pointer
- batch-audience-delivery-syndication: generalize exact SSM key names
  (aws-access-key-id / aws-secret-access-key) to a description + IaC pointer

https://claude.ai/code/session_01VK2cGbtEwNsoboikzLouoA
3 participants