From 4a278ad17b465a44378d25aba548d9ff22df9d3b Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 5 Jun 2026 13:18:53 +0000
Subject: [PATCH 1/2] docs: add CLAUDE.md proposals for 5 repos with recent
 merged PRs

Proposals generated from review of merged PRs by SayaliPat, shrivastavakapil2000,
JoeVsVolcano, mike-brant, and nathan-resonate. Each file documents what to add or
create as CLAUDE.md in the corresponding repo:

- step-function-workflow-orchestrator: decommissioned pipelines (fusion-behavior-preprocess,
  cookiejar-sample-export), EMR 7.12/Spark 3 migration (7 pipelines), geo-location changes
- batch-expression-modeling: BlockGraph delivery config, formatter path layout change,
  formatter metrics lambda, batch stitch throttling (CDP-118913/118857/118972)
- identity-graph: new CLAUDE.md for PRISM identity system (11 Spark jobs + prism_dbt v1.0)
- kshrivastava: new CLAUDE.md for V3 multi-agent orchestration framework (Temporal + roles + skills)
- dos-data-pipeline: new CLAUDE.md for geo-location ETL (district_source, 4-CSV mapping, backfill rewrite)

https://claude.ai/code/session_014wxhfqnUnnc6kcDYEgWDx3
---
 .../batch-expression-modeling.CLAUDE.md       |  50 +++++
 .../dos-data-pipeline.CLAUDE.md               | 112 +++++++++++
 claude-md-proposals/identity-graph.CLAUDE.md  | 181 ++++++++++++++++++
 claude-md-proposals/kshrivastava.CLAUDE.md    | 117 +++++++++++
 ...p-function-workflow-orchestrator.CLAUDE.md |  51 +++++
 5 files changed, 511 insertions(+)
 create mode 100644 claude-md-proposals/batch-expression-modeling.CLAUDE.md
 create mode 100644 claude-md-proposals/dos-data-pipeline.CLAUDE.md
 create mode 100644 claude-md-proposals/identity-graph.CLAUDE.md
 create mode 100644 claude-md-proposals/kshrivastava.CLAUDE.md
 create mode 100644 claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md

diff --git a/claude-md-proposals/batch-expression-modeling.CLAUDE.md b/claude-md-proposals/batch-expression-modeling.CLAUDE.md
new file mode 100644
index 0000000..b5019df
--- /dev/null
+++ b/claude-md-proposals/batch-expression-modeling.CLAUDE.md
@@ -0,0 +1,50 @@
+# CLAUDE.md update — resonate/batch-expression-modeling
+# Append the following section to the END of the existing CLAUDE.md
+
+---
+
+## Vendor Configuration (`vendor-config.ini`) — Recent Additions
+
+File: `workflows/lambdas/batch-audience-delivery-config/vendor-config.ini`
+
+### New Config Keys
+
+| Key | Default | Purpose |
+|---|---|---|
+| `stitch_columns` | — | Comma-separated multi-column stitch (use instead of `stitch_column` for vendors that join on multiple fields, e.g. BlockGraph address). |
+| `audience_bitmap_path` | `tagav` | Bitmap source for audience evaluation. Set to `person_jar` for person-keyed (RID) vendors like BlockGraph. |
+
+All existing single-column vendors continue to use `stitch_column` (singular). `get_audience_bitmap_path(delivery_type)` returns the configured source (default `tagav`).
+
+### BlockGraph Delivery (CDP-118913)
+
+BlockGraph is **person-keyed (RID)**, not cookie-keyed (RCID). Two new sections in `vendor-config.ini`:
+
+- `[blockgraph_syndicated]` — syndication delivery via normalized RID→ADDRESS table
+- `[blockgraph_custom]` — custom delivery, same stitch shape
+
+Key differences from cookie-keyed vendors:
+- `stitch_columns = norm_address_line,norm_city,norm_state,norm_zip,zip_plus4` — normalized address stitch (5 columns)
+- `stitch_table_name = person_identity_graph_beta` — beta RID→ADDRESS table
+- `audience_bitmap_path = person_jar` — evaluates audiences against the personJar bitmap (not tagav)
+- `person_jar_path` is supplied per-run in the delivery event (`delivery_config.person_jar_path`)
+
+### Formatter Output Path Layout Change (CDP-118857 / CDP-118937)
+
+The batch-delivery-formatter flipped partition order from `date=*/method=av/vendor=*/akey=*` to `date=*/vendor=*/method=*/akey=*`. All downstream consumer state machines were updated accordingly:
+- `batch-audience-custom-delivery` (#256)
+- `batch-audience-delivery-syndication` (yahoo, experian, openx/viant variants)
+
+If you see `No files were able to be copied` errors after a formatter run, check that the consumer ASL's `source_prefix` uses `vendor=*/method=*` order.
+
+### Formatter Metrics Lambda (CDP-118857)
+
+`batch-delivery-formatter-pipeline` now invokes `batch-delivery-formatter-publish-metrics` Lambda after each successful EMR run. Lambda ARN injected via `PublishMetricsFunctionArn` in each pipeline's `terragrunt.hcl`.
+
+Metrics published to InfluxDB:
+- `com.resonate.delivery.format.count` — per-audience row counts
+- `com.resonate.delivery.format.aggregate` — aggregate metrics (vendor + method + delivery_type tags, no audience_key)
+
+### Batch Stitch Throttling (CDP-118972)
+
+`batch-stitch-pipeline` uses `MaxConcurrency=2` on the stitch Map state to prevent bursting `AddJobFlowSteps` API calls. A stagger wait of `(Map.Item.Index × wait_seconds)` is inserted before each step submission.
diff --git a/claude-md-proposals/dos-data-pipeline.CLAUDE.md b/claude-md-proposals/dos-data-pipeline.CLAUDE.md
new file mode 100644
index 0000000..09b42f0
--- /dev/null
+++ b/claude-md-proposals/dos-data-pipeline.CLAUDE.md
@@ -0,0 +1,112 @@
+# CLAUDE.md — resonate/dos-data-pipeline
+# This is a NEW file to be created at the root of the dos-data-pipeline repo
+
+# dos-data-pipeline
+
+## Project Purpose and Architecture Overview
+
+This repo contains the **DOS (Data Operations Systems) data pipeline** — Scala/Spark ETL jobs and configuration for geo-location enrichment, segment aggregation, and political district bitmap preparation for Resonate's audience platform.
+
+**Key pipelines:**
+- **geo-location** — Daily enrichment of RCIDs with geographic + political district data (congress, state-senate, state-house); full backfill for historical correction. Feeds `ToBitmap` which sets marker bits.
+- **segment-aggregator** — Aggregates behavioral segment data.
+
+**Deployment:** GitHub Actions. Config in this repo deploys directly. Other pipelines are activated by `datapipeline-trigger` in `lambda-modeling`; their config comes from `services-api-configuration`.
+
+---
+
+## Repository Structure
+
+```
+src/main/scala/com/resonate/dos/
+  GeoLocationDaily.scala             # Daily geo enrichment: IP → zip → district + district_source
+  GeoLocationFullBackfill.scala      # Backfills historical dates (daily-on-multi-day-pixels approach)
+  GeoLocationFull.scala              # Merges daily+backfill into rolling-full
+  ToBitmap.scala                     # Converts geo-location output to bitmap; gates L2-confirmed bit
+  DistrictResolver.scala             # Centralizes L2-vs-IP district coalescing + district_source derivation
+
+src/test/scala/com/resonate/dos/
+  GeoLocationDailyTest.scala
+  GeoLocationFullBackfillTest.scala
+  ToBitmapTest.scala
+  ...
+
+config/geo-location/
+  zip-district-mappings/             # 4 CSV files (one per district type)
+    zip-congress-mapping.csv          # 34,785 rows — L2 mode-per-zip, all 51 states
+    zip-proposed-congress-mapping.csv # 8,362 rows — 2026 redistricted (CA/MO/NC/OH/TX/UT/VA)
+    zip-state-senate-mapping.csv      # 33,527 rows
+    zip-state-house-mapping.csv       # 32,620 rows
+```
+
+**Scala version:** 2.11.8 | **Spark version:** 2.4.3 (Spark 2 — not yet migrated to Spark 3)
+
+---
+
+## Key Commands
+
+```bash
+# Run all unit tests
+sbt test
+
+# Run a specific test suite
+sbt "testOnly com.resonate.dos.GeoLocationDailyTest"
+sbt "testOnly com.resonate.dos.ToBitmapTest"
+
+# Build assembly JAR
+sbt assembly
+
+# Deploy — push to main triggers GitHub Actions deployment pipeline
+```
+
+---
+
+## Key Concepts
+
+### district_source Provenance (CDP-118946 / CDP-118947)
+
+Every RCID in geo-location output carries a `district_source` column set by `DistrictResolver`:
+
+| Value | Meaning |
+|---|---|
+| `L2_CONFIRMED` | L2 has a district AND `l2_party_confirmed = true`. **Only this value sets the L2-confirmed marker bit in ToBitmap.** |
+| `L2_UNCONFIRMED` | L2 has a district AND `l2_party_confirmed != true`. |
+| `IP_INFERRED` | No L2 district; NetAcuity IP → zip → 4-CSV-mapping found a district. |
+| `null` | No district data from any source. |
+
+**Observed distribution (dev 2026-05-12):** IP_INFERRED ~86.8%, L2_CONFIRMED ~9.9%, L2_UNCONFIRMED ~2.9%, null ~0.4%.
+
+**ToBitmap gating (CDP-118947):** The L2-confirmed marker bit is set **only** when `district_source = "L2_CONFIRMED"`. Before this fix (CDP-118947) the bit was set unconditionally (broken for L2_UNCONFIRMED and IP_INFERRED rows).
+
+### Zip→District Mappings (4-file Shape, CDP-118946)
+
+The district lookup uses 4 separate CSV files (one per district type) rather than a single consolidated CSV. Schema: `zip, state, <district-type>`. Files version together as a snapshot.
+
+Passed to Spark jobs as a single `zipDistrictMappingsBasePath` argument pointing at the directory. `DistrictResolver` reads all 4 files by convention from that path.
+
+Deployed to S3 via terragrunt `configurations` entries in the geo-location pipeline. After updating CSVs, deploy via the geo-location GitHub Actions workflow.
+
+**IP-inferred fallback flow:** `GeoLocationDaily` calls NetAcuity to resolve the RCID's IP to a zip, then joins the 4 CSVs for the district. State-leg mappings are many-to-many and collapsed to 1:1 using rank-based dedup with `sort_array` for deterministic tiebreaking.
+
+### GeoLocationFullBackfill Rewrite (CDP-118946)
+
+Rewritten as **daily-on-multi-day-pixels**: applies daily geo enrichment logic to each date in the backfill range rather than re-reading the prior rolling-full. Key changes:
+
+- No longer takes `DistrictColumnList` arg — always re-derives all 4 districts (prevents stale labels)
+- Reads existing `zip` from `fullDf`; applies same IP-fallback as Daily (no second NetAcuity call)
+- `GeoLocationFull.expectedCols` extended with `zip` and `district_source`
+- Orchestrator passes the date range as a rendered glob to backfill
+
+### Backfill `Should Run Full` Bug Fix (CDP-118512)
+
+A bug in the geo-location state machine (`geo_location.asl.json` in `step-function-workflow-orchestrator`) caused a pre-existing geo full for today to override backfill output. Fixed by adding an `IsPresent` check on `$.FullInputPath` before the path-equality choice. If backfill output is not propagating to the full-build stage, check the `Should Run Full` choice state.
+
+---
+
+## Project-Specific Rules and Gotchas
+
+- **Spark 2.4.3**: This repo is still on Spark 2. Do not apply Spark 3-specific optimizations (EMR 7 migration for this repo is not yet underway).
+- **State-leg mappings are many-to-many**: Do not assume 1:1 zip→district. `DistrictResolver` handles collapsing with `sort_array` for deterministic hash-based tiebreaking.
+- **CSV versioning**: The 4 zip-district CSVs version together as a snapshot. Updating one requires coordinated update of all 4 if the underlying data changes.
+- **`GeoLocationFull.expectedCols`**: If you add a new column to `GeoLocationDaily`, also add it to `expectedCols` in `GeoLocationFull.scala` so it flows through rolling-full and backfill.
+- **NetAcuity in Backfill**: `GeoLocationFullBackfill` does NOT call NetAcuity. It reads the `zip` column already resolved by `GeoLocationDaily` and applies the CSV-based district lookup.
diff --git a/claude-md-proposals/identity-graph.CLAUDE.md b/claude-md-proposals/identity-graph.CLAUDE.md
new file mode 100644
index 0000000..db17627
--- /dev/null
+++ b/claude-md-proposals/identity-graph.CLAUDE.md
@@ -0,0 +1,181 @@
+# CLAUDE.md — resonate/identity-graph
+# This is a NEW file to be created at the root of the identity-graph repo
+
+# identity-graph
+
+## Project Purpose and Architecture Overview
+
+This repo contains the **Identity Graph** — Resonate's Person Resolution and Identity System (PRISM). It has two layers:
+
+1. **Scala/Spark Jobs** (`src/main/scala/`) — 11 batch ETL jobs that build the underlying identity tables (identifier index, persons) from raw sources (L2, Experian, TapAd, LiveIntent, etc.)
+2. **prism_dbt** (`dbt/prism_dbt/`) — PRISM's consumer-facing dbt package v1.0. Exposes four primitives (`waterfall_match`, `identifier_expand`, `persons_project`, `lookup`) as Lambda-invokable Snowflake service models.
+
+**Production JAR path:** `s3://resonate-core-applications/identity-graph/jars/identity-graph-latest.jar`
+
+**Design docs (published to S3):**
+- `https://s3.nonprod.aws.resonatedigital.net/person-identity-graph/index.html` — hub
+- `docs/design/design_and_production_readiness.html` — main architecture doc
+- `docs/design/consumer_interface_design.html` — consumer integration (Append/BlockGraph/GeoFix/Cortex)
+- `docs/design/project_plan.html` — 6 tracks, Jira ticket DAG
+
+---
+
+## Repository Structure
+
+```
+src/main/scala/com/resonate/spark/apps/
+  ExperianPreprocessJob.scala         # Normalizes Experian ConsumerView + digital graph
+  L2ConsumerPreprocessJob.scala       # Preprocesses L2 consumer tab-delimited files
+  TapadIndexJob.scala                 # Experian/TapAd → identifier_index
+  L2IndexJob.scala                    # L2 voter → identifier_index
+  L2ConsumerIndexJob.scala            # L2 consumer → identifier_index
+  OnboardingMatchJob.scala            # 4-phase matching (blocking, corroboration, scoring, fallback)
+  OnboardingHouseholdLinksJob.scala   # Household CTV/TTD attribution via digital graph
+  OnboardingPersonsJob.scala          # TapAd-only person records (deprecated: BL-106)
+  MergeOnboardingLinksJob.scala       # Multi-provider collapse + confidence scoring
+  PersonIdentityJob.scala             # Main orchestrator: persons + identifier_index
+  CustomerMatchJob.scala              # Ephemeral customer file matching (7 fallback tiers)
+
+src/main/scala/com/resonate/spark/utils/
+  HashUtils.scala                     # Deterministic SHA-256 person_id + household_id
+  StagingWriter.scala                 # Cache/count/write/unpersist pattern
+  AddressNormalizer.scala             # USPS Pub 28 normalization + nickname resolution (max 5 hops)
+  IpFilter.scala                      # RFC1918 + cloud IP blocklist
+  ScoringConfig.scala                 # Centralized confidence scoring
+
+dbt/prism_dbt/                        # Consumer-facing dbt package (see dbt/prism_dbt/README.md)
+  macros/                             # waterfall_match, identifier_expand, persons_project, lookup
+  models/service/                     # Lambda-invokable: waterfall.sql, identifier_expand.sql, persons_project.sql
+  snowflake/
+    function/name_address_hash/       # NAME_ADDRESS_HASH Snowflake UDF
+    stored_procedure/prism_lookup/    # PRISM_LOOKUP SP
+    stored_procedure/name_address_lookup/ # NAME_ADDRESS_LOOKUP SP
+
+docs/design/                          # HTML design artifacts (published to S3)
+docs/scripts/                         # build_columns_page.py, sample_parquet.py
+
+.github/workflows/
+  snowflake_dbt.yml                   # Deploy prism_dbt via snow dbt deploy
+  snowflake_stored_procedure.yml      # Deploy PRISM_LOOKUP / NAME_ADDRESS_LOOKUP
+  snowflake_function.yml              # Deploy NAME_ADDRESS_HASH UDF
+```
+
+---
+
+## Key Commands
+
+### Scala/Spark Jobs (SBT)
+
+```bash
+# Run all unit tests (443 tests across 20 suites)
+sbt test
+
+# Run a specific test suite
+sbt "testOnly com.resonate.spark.apps.PersonIdentityJobSpec"
+sbt "testOnly com.resonate.spark.apps.CustomerMatchJobSpec"
+
+# Build assembly fat JAR for EMR deployment
+sbt assembly
+
+# Upload to S3 (prod)
+aws s3 cp target/identity-graph-latest.jar \
+  s3://resonate-core-applications/identity-graph/jars/identity-graph-latest.jar
+```
+
+See the [deployment wiki](https://resonate-jira.atlassian.net/wiki/spaces/ASD/pages/3683057840/Pipeline+Scala+SBT+GitOps) for the GitOps deploy process.
+
+### prism_dbt (Snowflake Native dbt Package)
+
+```bash
+cd dbt/prism_dbt
+
+# Install dbt dependencies
+dbt deps
+
+# Local dev (requires ~/.dbt/profiles.yml — copy from profiles.yml.example)
+dbt seed --profiles-dir <your-profiles-dir>
+dbt run  --select waterfall --profiles-dir <your-profiles-dir>
+dbt test --profiles-dir <your-profiles-dir>
+
+# Offline unit tests only (no Snowflake needed — fast)
+dbt test --select test_type:unit      # 16 tests
+
+# All tests (includes live Snowflake singular tests against resonate-dev)
+dbt test                              # 25 tests total
+```
+
+### Snowflake Deployments (GitHub Actions)
+
+All Snowflake deployments go through GitHub Actions. Do not run `snow dbt deploy` manually from the CLI.
+
+| Workflow | What it deploys |
+|---|---|
+| `snowflake_dbt.yml` | `prism_dbt` package via `snow dbt deploy` (runs `dbt parse` as fast-fail gate) |
+| `snowflake_stored_procedure.yml` | `PRISM_LOOKUP` or `NAME_ADDRESS_LOOKUP` stored procedures |
+| `snowflake_function.yml` | `NAME_ADDRESS_HASH` UDF |
+
+---
+
+## PRISM dbt Package (prism_dbt v1.0)
+
+### The Four Primitives
+
+| Macro | Direction | Purpose |
+|---|---|---|
+| `waterfall_match` | identifiers → 1 person_id per row | Resolve mixed-identifier customer rows to a canonical `person_id`. First-match-wins by priority. |
+| `identifier_expand` | person_id → identifiers of ONE type | Fan out person_ids to deliverable identifiers. Call once per identifier type. |
+| `persons_project` | person_id → persons attributes | Project selected `persons` columns (name, address, lalvoterid, …) onto person_ids. |
+| `lookup` | one identifier → one person + linked IDs | Single-row debug. Also deployed as `PRISM_LOOKUP` stored procedure. |
+
+### v1.0 Data Sources
+- `PERSONS_LATEST` + `IDENTIFIER_INDEX_LATEST` (filtered to `identifier_rank = 1`)
+- `CROSSWALK_LATEST` exists but is **NOT used** in v1.0 (missing metadata for confidence filtering)
+- `ZIP11` routes automatically to `persons.zip11` (not `identifier_index`) via `_internal/lookup_strategy.sql`
+
+### Service Model Invocation
+
+```sql
+EXECUTE DBT PROJECT RESONATE.PRISM.prism_dbt
+  ARGS = $$run --select waterfall --vars '{
+    "input_relation":  "MY_DB.MY_SCHEMA.my_input",
+    "consumer":        "append",
+    "job_run_id":      "run_001",
+    "waterfall_order": [
+      {"identifier_type": "HEM_MD5",    "stitch_label": "hem"},
+      {"identifier_type": "HEM_SHA256", "stitch_label": "hem"},
+      {"identifier_type": "MAID",       "stitch_label": "maid"}
+    ]
+  }'$$;
+```
+
+### NAME_ADDRESS_HASH UDF
+
+Normalizes raw PII to a SHA-256 hash for name+address waterfall steps:
+
+```sql
+SELECT NAME_ADDRESS_HASH(first_name, last_name, address_line, zip) AS NAME_ADDRESS FROM my_input;
+```
+
+v1.0: `LOWER + TRIM + strip non-alphanumeric + LPAD zip + pipe-concat + SHA-256`. Does not yet do canonical-first-name resolution (Bill ≠ William — planned v1.1).
+
+---
+
+## Key Concepts and Bug Fixes in v1.0
+
+- **ZIP11 routing**: `waterfall_match` auto-routes `ZIP11` steps to `persons.zip11`. Just include `{"identifier_type": "ZIP11", "stitch_label": "zip11"}` in `waterfall_order`.
+- **Fan-out tiebreak**: Always deterministic. Order: confidence DESC → source_count DESC → max_effective_weight DESC → person_id ASC.
+- **`L2IndexJob` HOUSEHOLD_ID null guard**: `filterNulls = true` — prevents phantom household collapse.
+- **`OnboardingMatchJob` Phase 3 tiebreak**: `ORDER BY person_id ASC` as final tiebreak — fixes non-determinism.
+- **`ExperianPreprocessJob` aggregation**: `min(id_value)` not `first(id_value)` — deterministic.
+- **Null guards**: `identifier_value` filtered before `groupBy` in `PersonIdentityJob` and `MergeOnboardingLinksJob`.
+
+---
+
+## Project-Specific Rules
+
+- **All Snowflake deployments via GitHub Actions** — never `snow dbt deploy` from the CLI.
+- **`snowflake_profiles/profiles.yml` is a deploy stub** (account/user = `'_'`). Do not put real credentials there.
+- **dbt unit tests are offline** — run `--select test_type:unit` first for fast feedback.
+- **Singular tests require `resonate-dev` Snowflake** — live mock tables in `RESONATE.PRISM.*` (provisioned from `.context/cdp-119018-mock-tables.sql`).
+- **person_id ≠ RID**: PRISM produces `person_id`. `RID` (Resonate-internal) is owned by Append's downstream pipeline. `waterfall_match` output column is `prism_matched_person_id`.
+- **SemVer**: v1.0 is the Append cutover baseline. Release tagging tracked in CDP-119015.
diff --git a/claude-md-proposals/kshrivastava.CLAUDE.md b/claude-md-proposals/kshrivastava.CLAUDE.md
new file mode 100644
index 0000000..f473d21
--- /dev/null
+++ b/claude-md-proposals/kshrivastava.CLAUDE.md
@@ -0,0 +1,117 @@
+# CLAUDE.md — resonate/kshrivastava
+# This is a NEW file to be created at the root of the kshrivastava repo
+
+# kshrivastava — Multi-Agent Orchestration V3
+
+Personal repository for Kapil Shrivastava (POCs and experiments). The primary artifact is the **V3 multi-agent orchestration framework** in `multi-agent-orchestration-v3/`.
+
+---
+
+## V3 Multi-Agent Orchestration Framework
+
+### What It Is
+
+A Temporal-based orchestration system where AI agents collaborate to plan, code, review, and retrospect on software engineering stories. Each story runs as a Temporal workflow; agents implement domain-specific **roles** and consult domain-expert **skills** mid-run.
+
+```
+story ──▶ Temporal Workflow ──▶ Planner ──▶ Coder ──▶ Reviewer ──▶ Retro
+                                    │           │
+                                    └──▶ Specialist (consult-by-default)
+                                          ↑
+                                  append-agent / cleanroom /
+                                  audience-delivery / ...
+```
+
+### Repository Structure
+
+```
+multi-agent-orchestration-v3/
+├── framework/
+│   ├── role/
+│   │   ├── runner.py              # Runs a role; injects available_specialists catalog
+│   │   ├── loader.py              # Loads role YAML + prompt
+│   │   └── schemas.py             # CoderOutput, PlannerOutput, SpecialistOutput, etc.
+│   ├── recipe/
+│   │   ├── interpreter.py         # Runs recipe phases; Contract-4 consult hook
+│   │   └── specialist_consult.py  # run_specialist_consult()
+│   └── runtime/
+│       └── cassettes.py           # Request/response cassette recording for replay
+├── roles/
+│   ├── planner.yaml + prompts/planner.md
+│   ├── coder.yaml  + prompts/coder.md
+│   ├── reviewer.yaml + prompts/reviewer.md   # claude-sonnet-4 (upgraded from Haiku)
+│   ├── retro.yaml   + prompts/retro.md
+│   └── specialist.yaml + prompts/specialist.md  # Generic; self-selects skill by description
+├── skills/
+│   ├── WIRING.md                  # Specialist wiring docs
+│   ├── append-agent/
+│   │   ├── SKILL.md               # Grounding + playbooks for Append domain
+│   │   └── references/            # Source map: Confluence/Jira/GitHub/AWS/Splunk
+│   ├── cleanroom/                 # Placeholder (routable, content pending)
+│   └── audience-delivery/         # Placeholder (routable, content pending)
+├── recipes/
+│   └── default.yaml               # Phase sequence: Plan → Code → Review → Retro
+├── tests/unit/                    # 826+ unit tests
+└── docs/
+    ├── ENGINEER_ONBOARDING.md
+    └── presentations/
+        └── special-agent-explainer.html
+```
+
+### Key Concepts
+
+**Roles** — YAML-defined agents with a `role: <name>` field and matching `prompts/<name>.md`. Models:
+- Planner: `claude-opus-4` (quality-sensitive planning)
+- Coder: `claude-sonnet-4`
+- Reviewer: `claude-sonnet-4` (upgraded from Haiku — review quality is critical)
+- Retro: `claude-haiku-4`
+- Specialist: `claude-sonnet-4`
+
+**Skills** — Domain knowledge bundles in `skills/<domain>/`. Each needs a `SKILL.md` with a sharp `description` (the routing key) and grounding workflow. Auto-discovered by `discover_skills()` — no code wiring needed to add a domain.
+
+**Specialist / Consult System (Contract 4 — LIVE):**
+1. Any role emits `consultations_needed: [{specialist, query}]` in its output
+2. `framework/recipe/interpreter.py` (`_extract_consultations`) calls `run_specialist_consult()`
+3. The generic `specialist` role self-selects the skill by description matching
+4. `SpecialistOutput {answer, citations, confidence}` is threaded into caller's `prior_attempt_summaries`; caller re-runs with answer in context
+5. Per-phase round cap mirrors the escalation cap
+
+**Consult-by-default:**
+- `available_specialists` catalog (`{name, description}`) injected into EVERY role's `task_input` by `runner.py`
+- Planner/coder prompts: scan catalog first; consult before guessing or escalating a domain fact
+- Catalog excluded from cassette fingerprint — recordings replay regardless of deployed specialists
+
+**Ticket-based story lookup:**
+- `get_state` / `get_history` accept a bare Jira ticket (e.g. `CDP-117874`) resolving to workflow
+- Writes still need explicit `workflow_id`
+
+### Key Commands
+
+```bash
+cd multi-agent-orchestration-v3
+
+# Install dependencies
+uv sync
+
+# Run unit tests
+pytest tests/unit/                  # ~826 tests
+pytest tests/unit/test_<name>.py    # specific file
+
+# Start the dev worker (requires AWS SSO + Temporal cluster)
+aws sso login
+python framework/worker.py
+
+# Story CLI
+python cli.py start  --story "CDP-12345" --description "Fix the frobnicator"
+python cli.py get-state   --story CDP-12345
+python cli.py get-history --story CDP-12345
+```
+
+### Rules and Gotchas
+
+- **This is a POC repo.** Production V3 will live elsewhere once the framework is proven.
+- **Dev worker restart required** after merging branches. Reviewer→Sonnet model change takes effect only after worker restart.
+- **Adding a skill**: Drop `skills/<domain>/SKILL.md` with a sharp `description`. No code changes — `discover_skills()` auto-discovers it. The `description` is the routing key.
+- **`workspace_isolation: per_attempt_worktree`**: Specialist uses per-attempt worktrees. Read-only (Q&A) invocations simply don't write.
+- **Consult round cap**: After `MAX_CONSULTATION_ROUNDS_PER_PHASE` consults per phase the agent proceeds with what it has. Audit trail in `ctx.keys["_consultation_log"]`.
+- **Cassette fingerprint**: `available_specialists` excluded — replay works regardless of which domains are deployed at playback time.
diff --git a/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md b/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md
new file mode 100644
index 0000000..0d6f526
--- /dev/null
+++ b/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md
@@ -0,0 +1,51 @@
+# CLAUDE.md update — resonate/step-function-workflow-orchestrator
+# Append the following two sections to the END of the existing CLAUDE.md
+
+---
+
+## Decommissioned Pipelines (Removed from Deployment)
+
+These pipelines have had their Terraform infrastructure removed and **must not be redeployed**.
+Pipeline code and config files are retained for historical reference only.
+
+| Pipeline | Reason | PR |
+|---|---|---|
+| `fusion-behavior-preprocess` | Dormant since 2023; inadvertently re-deployed 2026-05-28. Sovrn/Havas no longer a client. | #734 |
+| `cookiejar-sample-export` | Havas no longer a client; `CookieJarSampler` Scala app marked `@deprecated` in `core-data-pipelines-spark`. | #748 |
+
+Both pipelines still have AWS resources (Step Functions, EventBridge rules, IAM roles) requiring manual `terragrunt destroy` or console cleanup in both dev and prod accounts. See `DEPRECATED.md` inside each pipeline directory.
+
+---
+
+## EMR 7.12.0 / Spark 3 Migration (CDP-118269)
+
+These pipelines have been migrated from `emr-5.34.0` + Spark 2 to `emr-7.12.0` + Spark 3:
+
+| Pipeline | PR | Notes |
+|---|---|---|
+| `sovrn-weekly` | #729 | |
+| `maid-onboarder` | #730 | Prod-volume validated end-to-end |
+| `damlam-preparation` | #732 | |
+| `idsync-overlap` | #733 | YARN vcore fix: `yarn.nodemanager.resource.cpu-vcores = 16` on integration |
+| `behavior-stitch` | #737 | YARN vcore fix applied; use `--master yarn` for Spark 3 |
+| `marketops-overlap` | #731 | |
+| `topic-aggregation` | #736 | Requires `--master yarn` flag in Spark 3 spark-submit args |
+
+**Jar convention:** Spark 3 pipelines use `core-data-pipeline-spark3-latest.jar`; legacy pipelines use `core-data-pipeline-latest.jar` or `assembly-*.jar`. When migrating a new pipeline update both `params.json` (prod + integration) and `emr.json` (prod + integration + on-demand variants).
+
+**YARN memory on EMR 7 (r5.xlarge):** EMR 7 overhead factor ~18.8% (higher than EMR 5). Reduce integration executor memory (e.g. `21g → 16g`). Prod r5d.4xlarge can keep 21G.
+
+**Dev jar path:** Dev environments must reference the `-dev` S3 bucket (e.g. `resonate-core-datasets-dev/...`), not the prod bucket.
+
+---
+
+## Geo-Location Pipeline Recent Changes
+
+### TapAd-Derived HEM→RCID Stitch (CDP-118512)
+`l2-processing` pipeline now ingests TapAd Digital Graph + LiveRamp Tradedesk Agg to supplement LiveIntent-based HEM→RCID derivation (~+21% coverage lift on district-attributed RCIDs). The `dynamic-dates` Lambda resolves TapAd's flat-file dates (no date-named subdirectory) via a new flat-file sentinel mode.
+
+### Zip→District Mappings (CDP-118946)
+Four new CSVs under `pipelines/geo-location/config/zip-district-mappings/` (one per district type: congress, proposed-congress, state-senate, state-house). Schema: `zip, state, <district-type>`. Deployed to S3 via terragrunt `configurations` entries. Passed as `myZipDistrictMappingsBasePath` to both `GeoLocationDaily` and `GeoLocationFullBackfill`.
+
+### Backfill `Should Run Full` Bug Fix (CDP-118512)
+A bug in `geo_location.asl.json` where a pre-existing geo full for today overrode the backfill output was fixed by adding an `IsPresent` check on `$.FullInputPath` before the path-equality choice. If backfill output is not propagating, check the `Should Run Full` state machine choice state.

From 90fc01ed4696c79ec19954239f0b6b47e8bda784 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 5 Jun 2026 13:21:50 +0000
Subject: [PATCH 2/2] fix: address Copilot review comments on
 step-function-workflow-orchestrator proposal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Correct header: "two sections" → "three sections" (Decommissioned Pipelines,
  EMR 7.12.0/Spark 3 Migration, and Geo-Location Pipeline Recent Changes)
- Fix arg name: myZipDistrictMappingsBasePath → zipDistrictMappingsBasePath
  (consistent with dos-data-pipeline.CLAUDE.md and PR description)

https://claude.ai/code/session_014wxhfqnUnnc6kcDYEgWDx3
---
 .../step-function-workflow-orchestrator.CLAUDE.md             | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md b/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md
index 0d6f526..9484883 100644
--- a/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md
+++ b/claude-md-proposals/step-function-workflow-orchestrator.CLAUDE.md
@@ -1,5 +1,5 @@
 # CLAUDE.md update — resonate/step-function-workflow-orchestrator
-# Append the following two sections to the END of the existing CLAUDE.md
+# Append the following three sections to the END of the existing CLAUDE.md
 
 ---
 
@@ -45,7 +45,7 @@ These pipelines have been migrated from `emr-5.34.0` + Spark 2 to `emr-7.12.0` +
 `l2-processing` pipeline now ingests TapAd Digital Graph + LiveRamp Tradedesk Agg to supplement LiveIntent-based HEM→RCID derivation (~+21% coverage lift on district-attributed RCIDs). The `dynamic-dates` Lambda resolves TapAd's flat-file dates (no date-named subdirectory) via a new flat-file sentinel mode.
 
 ### Zip→District Mappings (CDP-118946)
-Four new CSVs under `pipelines/geo-location/config/zip-district-mappings/` (one per district type: congress, proposed-congress, state-senate, state-house). Schema: `zip, state, <district-type>`. Deployed to S3 via terragrunt `configurations` entries. Passed as `myZipDistrictMappingsBasePath` to both `GeoLocationDaily` and `GeoLocationFullBackfill`.
+Four new CSVs under `pipelines/geo-location/config/zip-district-mappings/` (one per district type: congress, proposed-congress, state-senate, state-house). Schema: `zip, state, <district-type>`. Deployed to S3 via terragrunt `configurations` entries. Passed as `zipDistrictMappingsBasePath` to both `GeoLocationDaily` and `GeoLocationFullBackfill`.
 
 ### Backfill `Should Run Full` Bug Fix (CDP-118512)
 A bug in `geo_location.asl.json` where a pre-existing geo full for today overrode the backfill output was fixed by adding an `IsPresent` check on `$.FullInputPath` before the path-equality choice. If backfill output is not propagating, check the `Should Run Full` state machine choice state.