docs: CLAUDE.md proposals for 5 repos (EMR migration, identity-graph, geo-pipeline, agent-v3, batch-delivery)#7

Open
shrivastavakapil2000 wants to merge 2 commits into main from claude/serene-davinci-LdzOD

@shrivastavakapil2000

Summary

CLAUDE.md maintenance pass triggered by recent merged PR activity from: SayaliPat, shrivastavakapil2000, JoeVsVolcano, mike-brant, nathan-resonate.

This PR adds proposed CLAUDE.md content for 5 repos under claude-md-proposals/. Because this session only has write access to resonate/.github, the proposals are stored here for manual application to each repo.


Proposals

1. step-function-workflow-orchestrator → append to existing CLAUDE.md

File: claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md

New sections to append:

  • Decommissioned Pipelines: fusion-behavior-preprocess (PR #734) and cookiejar-sample-export (PR #748) — removed from Terraform, AWS resources still need cleanup
  • EMR 7.12.0 / Spark 3 Migration (CDP-118269): 7 pipelines migrated (sovrn-weekly, maid-onboarder, damlam-preparation, idsync-overlap, behavior-stitch, marketops-overlap, topic-aggregation); jar naming convention; YARN memory guidance for r5.xlarge
  • Geo-Location Pipeline Recent Changes: TapAd HEM→RCID stitch, 4-file zip→district CSV shape, backfill "Should Run Full" bug fix

2. batch-expression-modeling → append to existing CLAUDE.md

File: claude-md-proposals/batch-expression-modeling.CLAUDE.md

New sections to append:

  • BlockGraph delivery (CDP-118913): stitch_columns (plural), audience_bitmap_path = person_jar, stitch_table_name = person_identity_graph_beta
  • Formatter output path layout change (CDP-118857/CDP-118937): flipped to date=*/vendor=*/method=*/akey=*
  • Formatter metrics lambda: InfluxDB metrics (format.count, format.aggregate) via batch-delivery-formatter-publish-metrics
  • Batch stitch throttling (CDP-118972): MaxConcurrency=2, stagger by Map.Item.Index
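
The path layout and throttling bullets above can be illustrated with a small sketch. Only the partition key order (date, vendor, method, akey) and the idea of staggering by Map.Item.Index come from the proposal. The function names, the base path, and the 30-second step are hypothetical, and a delay proportional to the index is just one plausible reading of "stagger".

```python
# Partition key order taken from the proposal: date=*/vendor=*/method=*/akey=*
PARTITION_ORDER = ("date", "vendor", "method", "akey")


def formatter_output_path(base: str, **partitions: str) -> str:
    """Build a key=value partition path in the documented key order.

    `base` and the partition values are illustrative assumptions.
    """
    missing = [key for key in PARTITION_ORDER if key not in partitions]
    if missing:
        raise ValueError(f"missing partition keys: {missing}")
    parts = [f"{key}={partitions[key]}" for key in PARTITION_ORDER]
    return "/".join([base.rstrip("/"), *parts])


def stagger_delay_seconds(item_index: int, step_seconds: int = 30) -> int:
    """Return a start delay proportional to the Map item index.

    `item_index` stands in for Map.Item.Index (assumed zero-based);
    `step_seconds` is a hypothetical value, not one stated in the proposal.
    """
    if item_index < 0:
        raise ValueError("item_index must be non-negative")
    return item_index * step_seconds
```

With these assumptions, the first Map item starts immediately and each later item waits one more step, which spreads the start times of the batch stitch runs.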

3. identity-graph → create new CLAUDE.md

File: claude-md-proposals/identity-graph.CLAUDE.md

Complete new CLAUDE.md covering:

  • 11 Scala/Spark ETL jobs (PRISM identity system)
  • prism_dbt v1.0 — 4 primitives: waterfall_match, identifier_expand, persons_project, lookup
  • Snowflake UDFs/SPs, GitHub Actions deployment workflows
  • Key rules: all Snowflake deploys via GitHub Actions, person_id ≠ RID, ZIP11 routing

4. kshrivastava → create new CLAUDE.md

File: claude-md-proposals/kshrivastava.CLAUDE.md

Complete new CLAUDE.md covering:

  • V3 multi-agent orchestration framework (Temporal-based)
  • Roles: planner (claude-opus-4), coder/reviewer/specialist (claude-sonnet-4), retro (claude-haiku-4)
  • Contract 4 consult system: consultations_needed → specialist → SpecialistOutput threaded back
  • Consult-by-default: available_specialists catalog injected into every role
  • Skills auto-discovery (discover_skills(), SKILL.md routing key)
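
The role assignments and consult flow listed above might look roughly like the sketch below. The role-to-model mapping and the names consultations_needed, SpecialistOutput, and available_specialists come from the summary. Every field shape, the RoleResult class, and the run_consultations helper are hypothetical and do not describe the framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Role-to-model assignments as listed in the proposal summary.
ROLE_MODELS: Dict[str, str] = {
    "planner": "claude-opus-4",
    "coder": "claude-sonnet-4",
    "reviewer": "claude-sonnet-4",
    "specialist": "claude-sonnet-4",
    "retro": "claude-haiku-4",
}


@dataclass
class SpecialistOutput:
    # Hypothetical shape: which specialist answered, and what it said.
    specialist: str
    answer: str


@dataclass
class RoleResult:
    # Hypothetical container for one role's output.
    role: str
    consultations_needed: List[str] = field(default_factory=list)
    specialist_outputs: List[SpecialistOutput] = field(default_factory=list)


Specialist = Callable[[RoleResult], str]


def run_consultations(
    result: RoleResult, available_specialists: Dict[str, Specialist]
) -> RoleResult:
    """Resolve each requested consultation and thread the output back."""
    for name in result.consultations_needed:
        if name not in available_specialists:
            raise KeyError(f"unknown specialist: {name}")
        answer = available_specialists[name](result)
        result.specialist_outputs.append(SpecialistOutput(name, answer))
    result.consultations_needed = []  # all requests have been answered
    return result
```

In this sketch the catalog is a plain dictionary from specialist name to a callable, which keeps the "consult-by-default" idea visible: every role receives the same catalog and only names the specialists it needs.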

5. dos-data-pipeline → create new CLAUDE.md

File: claude-md-proposals/dos-data-pipeline.CLAUDE.md

Complete new CLAUDE.md covering:

  • Scala 2.11.8 / Spark 2.4.3 (NOT Spark 3 — do not apply Spark 3 optimizations)
  • district_source provenance column: L2_CONFIRMED / L2_UNCONFIRMED / IP_INFERRED / null
  • ToBitmap gates L2-confirmed marker bit only on L2_CONFIRMED (CDP-118947 fix)
  • 4-file zip→district CSV shape passed as zipDistrictMappingsBasePath
  • GeoLocationFullBackfill rewrite as daily-on-multi-day-pixels (CDP-118946)
  • Backfill "Should Run Full" state machine bug fix (CDP-118512)
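
The district_source gating rule above can be sketched as follows. The pipeline itself is Scala; Python is used here only for illustration. The four district_source values come from the proposal. The function name, the bit position, and the use of a plain integer in place of the real bitmap are assumptions.

```python
from typing import Optional

L2_CONFIRMED = "L2_CONFIRMED"
L2_UNCONFIRMED = "L2_UNCONFIRMED"
IP_INFERRED = "IP_INFERRED"

# Allowed provenance values listed in the proposal (None models a null column).
DISTRICT_SOURCES = {L2_CONFIRMED, L2_UNCONFIRMED, IP_INFERRED, None}

# Hypothetical bit position for the L2-confirmed marker.
L2_CONFIRMED_MARKER_BIT = 0


def with_l2_marker(bits: int, district_source: Optional[str]) -> int:
    """Set the marker bit only when the row is L2_CONFIRMED.

    `bits` is an integer standing in for the real bitmap structure.
    """
    if district_source not in DISTRICT_SOURCES:
        raise ValueError(f"unexpected district_source: {district_source!r}")
    if district_source == L2_CONFIRMED:
        return bits | (1 << L2_CONFIRMED_MARKER_BIT)
    return bits
```

The point of the sketch is the single equality check: L2_UNCONFIRMED, IP_INFERRED, and null rows all leave the marker bit untouched.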

How to Apply

For each proposal file, copy the content to the corresponding repo. For step-function-workflow-orchestrator and batch-expression-modeling, append the content to the existing CLAUDE.md. For identity-graph, kshrivastava, and dos-data-pipeline, create a new CLAUDE.md at the repo root (strip the header comment line # This is a NEW file...).
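
The manual steps above could be scripted along these lines. This is a minimal sketch that assumes each target repo is checked out locally in a directory named after the repo. The apply_proposal helper is hypothetical, and the header line is matched only by the prefix quoted above.

```python
from pathlib import Path

# Repos whose existing CLAUDE.md is appended to, per the instructions above.
APPEND_REPOS = {"step-function-workflow-orchestrator", "batch-expression-modeling"}
# Repos that receive a brand-new CLAUDE.md at the repo root.
CREATE_REPOS = {"identity-graph", "kshrivastava", "dos-data-pipeline"}
# Prefix of the header comment line to strip from new files.
NEW_FILE_MARKER = "# This is a NEW file"


def apply_proposal(proposals_dir: Path, repo_root: Path) -> Path:
    """Append or create CLAUDE.md in `repo_root` from its proposal file."""
    repo = repo_root.name  # assumes the checkout directory matches the repo name
    content = (proposals_dir / f"{repo}.CLAUDE.md").read_text(encoding="utf-8")
    target = repo_root / "CLAUDE.md"

    if repo in APPEND_REPOS:
        existing = target.read_text(encoding="utf-8")
        # Keep exactly one blank line between old and new content.
        if existing.endswith("\n\n"):
            separator = ""
        elif existing.endswith("\n"):
            separator = "\n"
        else:
            separator = "\n\n"
        target.write_text(existing + separator + content, encoding="utf-8")
    elif repo in CREATE_REPOS:
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        lines = content.splitlines(keepends=True)
        if lines and lines[0].startswith(NEW_FILE_MARKER):
            lines = lines[1:]  # drop the header comment line
        target.write_text("".join(lines).lstrip("\n"), encoding="utf-8")
    else:
        raise ValueError(f"no proposal rule for repo: {repo}")
    return target
```

A call such as apply_proposal(Path("claude-md-proposals"), Path("../identity-graph")) would then write the new file, and the result should still be reviewed before committing.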

https://claude.ai/code/session_014wxhfqnUnnc6kcDYEgWDx3


Generated by Claude Code

Proposals generated from review of merged PRs by SayaliPat, shrivastavakapil2000,
JoeVsVolcano, mike-brant, and nathan-resonate. Each file documents what to add or
create as CLAUDE.md in the corresponding repo:

- step-function-workflow-orchestrator: decommissioned pipelines (fusion-behavior-preprocess,
  cookiejar-sample-export), EMR 7.12/Spark 3 migration (7 pipelines), geo-location changes
- batch-expression-modeling: BlockGraph delivery config, formatter path layout change,
  formatter metrics lambda, batch stitch throttling (CDP-118913/118857/118972)
- identity-graph: new CLAUDE.md for PRISM identity system (11 Spark jobs + prism_dbt v1.0)
- kshrivastava: new CLAUDE.md for V3 multi-agent orchestration framework (Temporal + roles + skills)
- dos-data-pipeline: new CLAUDE.md for geo-location ETL (district_source, 4-CSV mapping, backfill rewrite)

https://claude.ai/code/session_014wxhfqnUnnc6kcDYEgWDx3
Copilot AI review requested due to automatic review settings June 5, 2026 13:19

Copilot AI left a comment

Pull request overview

This PR adds draft CLAUDE.md content under claude-md-proposals/ for manual copy/application into five separate repositories, based on recent merged PR activity and operational learnings (EMR/Spark migrations, identity-graph, geo-location pipeline, multi-agent framework, and batch delivery).

Changes:

  • Adds proposed CLAUDE.md append-only updates for step-function-workflow-orchestrator and batch-expression-modeling.
  • Adds proposed new CLAUDE.md files for identity-graph, kshrivastava, and dos-data-pipeline.
  • Captures repo-specific operational guidance (Spark/EMR versions, deployment rules, config keys, and recent bug-fix notes).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md | Proposed CLAUDE.md append sections covering decommissioned pipelines, EMR 7/Spark 3 migration notes, and geo-location recent changes. |
| claude-md-proposals/batch-expression-modeling.CLAUDE.md | Proposed CLAUDE.md append section documenting vendor-config additions and batch delivery operational changes. |
| claude-md-proposals/identity-graph.CLAUDE.md | Proposed new CLAUDE.md describing PRISM architecture, repo layout, and Snowflake/dbt deployment rules. |
| claude-md-proposals/kshrivastava.CLAUDE.md | Proposed new CLAUDE.md describing the Temporal-based V3 multi-agent orchestration framework and conventions. |
| claude-md-proposals/dos-data-pipeline.CLAUDE.md | Proposed new CLAUDE.md describing DOS geo-location pipeline architecture and Spark 2 constraints. |

Comment thread claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md Outdated
Comment thread claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md Outdated
…trator proposal

- Correct header: "two sections" → "three sections" (Decommissioned Pipelines,
  EMR 7.12.0/Spark 3 Migration, and Geo-Location Pipeline Recent Changes)
- Fix arg name: myZipDistrictMappingsBasePath → zipDistrictMappingsBasePath
  (consistent with dos-data-pipeline.CLAUDE.md and PR description)

https://claude.ai/code/session_014wxhfqnUnnc6kcDYEgWDx3