AgentaFlow · dewitt4 · Jun 20, 2026 · Jun 20, 2026
diff --git a/.env.example b/.env.example
@@ -59,6 +59,9 @@ AI_SERVICE_URL=http://localhost:8000
 AI_SERVICE_PORT=8000
 # Used by the ai-service for CORS allow_origins (must match the running core URL)
 CORE_API_URL=http://localhost:3001
+# Shared secret for core→AI service calls. In Azure, generated by seed-keyvault.sh and
+# injected from Key Vault. Leave empty for local dev (check is skipped when unset).
+INTERNAL_SERVICE_TOKEN=
 
 # --- Marketplace ---
 MARKETPLACE_URL=https://marketplace.agentbase.dev/api/v1
@@ -79,6 +82,12 @@ STRIPE_WEBHOOK_SECRET=
 # Must use NEXT_PUBLIC_ prefix so Next.js exposes it to the browser
 NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=
 
+# --- Analytics (consent-gated — injected only after cookie opt-in) ---
+# GA4 measurement ID (format: G-XXXXXXXXXX). Leave empty to disable GA4.
+NEXT_PUBLIC_GA_MEASUREMENT_ID=
+# Microsoft UET tag ID for Microsoft Advertising. Leave empty to disable UET.
+NEXT_PUBLIC_MS_UET_TAG_ID=
+
 # --- Email (optional) ---
 SMTP_HOST=
 SMTP_PORT=587
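
Review note on `INTERNAL_SERVICE_TOKEN`: the comment says the check is skipped when the variable is unset, and the prelaunch checklist expects 401 without the token and 200 with it. A minimal sketch of that decision logic follows, written in shell purely for illustration. The AI service's real check is not part of this diff, and `check_internal_token` is a hypothetical name.

```bash
# Illustrative decision logic only, not the AI service's actual code.
# Prints the HTTP status the service would be expected to return for a
# request carrying the given X-Internal-Token value (first argument).
check_internal_token() {
  local expected="${INTERNAL_SERVICE_TOKEN:-}"
  local presented="${1:-}"
  if [ -z "$expected" ]; then
    echo 200   # variable unset or empty: check skipped (local dev)
  elif [ "$presented" = "$expected" ]; then
    echo 200   # header matches the shared secret
  else
    echo 401   # header missing or wrong
  fi
}
```

A production implementation would normally use a constant-time comparison for the token rather than plain string equality.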

diff --git a/README.md b/README.md
@@ -61,7 +61,7 @@ agentbase/
 | **SQL Database**   | PostgreSQL 16                                 |
 | **Document DB**    | MongoDB 7                                     |
 | **Cache**          | Redis 7                                       |
-| **Infrastructure** | Docker, Nginx, DigitalOcean Kubernetes (DOKS) |
+| **Infrastructure** | Docker · Azure App Service + Bicep IaC        |
 | **License**        | GPL-3.0                                       |
 
 ## Quick Start

diff --git a/azure-pipelines/scripts/seed-keyvault.sh b/azure-pipelines/scripts/seed-keyvault.sh
@@ -73,6 +73,9 @@ ensure_secret jwt-secret "$(or_placeholder "${JWT_SECRET:-$(gen)}")"
 ensure_secret jwt-refresh-secret "$(or_placeholder "${JWT_REFRESH_SECRET:-$(gen)}")"
 ensure_secret encryption-key "$(or_placeholder "${ENCRYPTION_KEY:-$(gen)}")"
 ensure_secret plugin-settings-encryption-key "$(or_placeholder "${PLUGIN_SETTINGS_ENCRYPTION_KEY:-$(gen)}")"
+# Shared secret for core→AI service calls. Generated independently from JWT_SECRET;
+# never derived from it. Rotate independently when needed.
+ensure_secret internal-service-token "$(gen)"
 
 # --- Optional integration secrets (placeholder keeps the KV reference resolvable) ---
 set_secret stripe-secret-key "$(or_placeholder "${STRIPE_SECRET_KEY:-}")"
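
Review note on `ensure_secret`: the new line relies on generate-once semantics (the value is created on the first run and never overwritten afterwards). The sketch below illustrates that behaviour under stated assumptions. It is not the script's implementation: the real helpers are not shown in this hunk and presumably write to Key Vault, whereas this version uses a hypothetical file-backed store (`STORE_DIR`) and an assumed `gen` body so that it runs anywhere.

```bash
# Illustrative sketch only: a file-backed stand-in for Key Vault.
# STORE_DIR and get_secret are hypothetical names introduced for this example.
STORE_DIR="${STORE_DIR:-$(mktemp -d)}"

# Assumed generator: 32 random bytes rendered as 64 hex characters.
gen() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write the secret only if it does not exist yet, so reruns never rotate it.
ensure_secret() {
  local name="$1" value="$2"
  if [ ! -f "$STORE_DIR/$name" ]; then
    printf '%s' "$value" > "$STORE_DIR/$name"
  fi
}

# Read a secret back (stand-in for a Key Vault lookup).
get_secret() {
  cat "$STORE_DIR/$1"
}

ensure_secret internal-service-token "$(gen)"
first="$(get_secret internal-service-token)"
ensure_secret internal-service-token "$(gen)"   # second run: value is kept
second="$(get_secret internal-service-token)"
```

Because the second call passes a freshly generated value and the stored one still wins, redeploys leave the token stable. Rotation then has to be an explicit, separate action.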

diff --git a/docs/azure/architecture.md b/docs/azure/architecture.md
@@ -64,11 +64,10 @@ graph LR
 
   user --> fe --> core
   dev --> core
-  core --> ai --> llm
+  core -->|X-Internal-Token| ai --> llm
   core --> data
   ai --> data
   core -. MARKETPLACE_URL .-> mkt
-  fe --> ai
 ```
 
 The core platform connects to the Marketplace over `MARKETPLACE_URL` (dashed —
@@ -109,11 +108,14 @@ graph TD
 ```
 
 In **prod**, `networking.bicep` adds a VNet (app-integration subnet + private-
-endpoint subnet), private endpoints for PostgreSQL, Cosmos, Redis, Blob and Key
-Vault, and the matching private DNS zones — so the data tier has **no public
-network access** (constitution Principle II). In **staging**, the data services
-keep public access with an "allow Azure services" firewall rule to minimise cost
-and complexity.
+endpoint subnet), private endpoints for PostgreSQL, Cosmos, Redis, Blob, Key
+Vault, **and the AI service App Service site** — so the data tier and the AI
+service have **no public network access** (constitution Principle II). The AI
+service further restricts inbound traffic to `snet-app` only via `ipSecurityRestrictions`.
+In **staging**, the data services keep public access with an "allow Azure
+services" firewall rule; the AI service is protected by the app-layer
+`INTERNAL_SERVICE_TOKEN` only (network restriction deferred until VNet
+integration is promoted to staging).
 
 ---
 
@@ -166,17 +168,18 @@ Principles applied:
 |------------------|-----------------------|-------------|--------|
 | `postgres-password` | `POSTGRES_PASSWORD` | core | variable group |
 | `mongo-uri` | `MONGO_URI` | core, ai | `az cosmosdb keys list` |
-| `redis-password` | `REDIS_PASSWORD` | core¹ | `az redis list-keys` |
+| `redis-password` | `REDIS_PASSWORD` | core | `az redis list-keys` |
 | `jwt-secret`, `jwt-refresh-secret` | `JWT_SECRET`, `JWT_REFRESH_SECRET` | core | generated once |
 | `encryption-key`, `plugin-settings-encryption-key` | same (upper-snake) | core | generated once |
+| `internal-service-token` | `INTERNAL_SERVICE_TOKEN` | core, ai | generated once, **independent** of `jwt-secret` |
 | `stripe-secret-key`, `stripe-webhook-secret` | `STRIPE_*` | core | variable group (optional) |
-| `openai-api-key`, `anthropic-api-key`, `gemini-api-key` | `*_API_KEY` | ai | variable group (optional) |
+| `openai-api-key`, `anthropic-api-key`, `gemini-api-key`, `huggingface-api-key` | `*_API_KEY` | ai | variable group (optional) |
 
-¹ Redis settings are injected and ready; the core's rate limiter is currently
-in-memory (`common/interceptors/rate-limit.interceptor.ts`). Swapping it for a
-Redis-backed limiter needs no infra change — `REDIS_HOST/PORT/TLS/PASSWORD` are
-already present. Secrets are seeded idempotently by
+Secrets are seeded idempotently by
 [`azure-pipelines/scripts/seed-keyvault.sh`](../../azure-pipelines/scripts/seed-keyvault.sh).
+`internal-service-token` uses `ensure_secret` (generated once, never overwritten
+automatically) and rotates independently from JWT keys — use different rotation
+cadences and ownership.
 
 ---
 

diff --git a/docs/azure/pipeline.md b/docs/azure/pipeline.md
@@ -19,13 +19,27 @@ az group create -n rg-agentbase-staging -l eastus
 az group create -n rg-agentbase-prod    -l eastus
 ```
 
-### 1.2 Service connection
+### 1.2 Service connections
 
-Create an Azure Resource Manager **service connection** (Project Settings →
-Service connections) scoped to the subscription, e.g. named
-`agentbase-azure`. Grant its service principal **Contributor** + **User Access
-Administrator** on both resource groups (User Access Administrator is required
-because the Bicep creates **role assignments** in `rbac.bicep`).
+Create **two** Azure Resource Manager service connections (Project Settings →
+Service connections), one per environment. Scope each to its resource group only
+(not the whole subscription) for least-privilege isolation.
+
+| ADO variable | Connection name (example) | Scoped to |
+|---|---|---|
+| `AZURE_SERVICE_CONNECTION_STAGING` | `agentbase-staging-sc` | `RG_STAGING` |
+| `AZURE_SERVICE_CONNECTION_PROD` | `agentbase-prod-sc` | `RG_PROD` |
+
+Grant each service principal **two roles** on its resource group:
+- **Owner**: required because `rbac.bicep` creates role assignments. Contributor
+  alone cannot grant roles; that takes User Access Administrator rights, which Owner includes.
+- **Key Vault Secrets Officer** (at RG scope, not KV resource scope): needed for
+  data-plane secret writes. Scoped to the RG so the role is inherited by the Key Vault
+  once Bicep creates it on the first run (the KV does not exist yet at setup time).
+
+For the prod service connection: choose **"Specific pipelines"** rather than
+"Grant access to all pipelines" in the security settings, and add only the
+`agentbase-deploy.yml` pipeline.
 
 ### 1.3 Variable group
 
@@ -34,7 +48,8 @@ Library) with:
 
 | Variable | Secret? | Example / purpose |
 |----------|:------:|-------------------|
-| `AZURE_SERVICE_CONNECTION` | no | `agentbase-azure` |
+| `AZURE_SERVICE_CONNECTION_STAGING` | no | `agentbase-staging-sc` |
+| `AZURE_SERVICE_CONNECTION_PROD` | no | `agentbase-prod-sc` |
 | `RG_STAGING` | no | `rg-agentbase-staging` |
 | `RG_PROD` | no | `rg-agentbase-prod` |
 | `PG_ADMIN_PASSWORD` | **yes** | PostgreSQL admin password (≥ 12 chars, complex) |
@@ -44,11 +59,13 @@ Library) with:
 | `OPENAI_API_KEY` | yes | *(optional)* AI provider |
 | `ANTHROPIC_API_KEY` | yes | *(optional)* |
 | `GEMINI_API_KEY` | yes | *(optional)* |
+| `HUGGINGFACE_API_KEY` | yes | *(optional)* |
 
 Optional secrets left undefined are stored in Key Vault as `not-configured`
 placeholders so their Key Vault references still resolve. `jwt-secret`,
-`jwt-refresh-secret`, `encryption-key`, and `plugin-settings-encryption-key`
-are **generated once** by the seed script and preserved across deploys.
+`jwt-refresh-secret`, `encryption-key`, `plugin-settings-encryption-key`, and
+`internal-service-token` are **generated once** by the seed script and preserved
+across deploys — do not add these to the variable group.
 
 ### 1.4 Environments + approval gate
 
@@ -143,7 +160,79 @@ az group delete --name rg-agentbase-staging --yes --no-wait
 
 ---
 
-## 6. Local validation (before pushing)
+## 6. Prelaunch checklist
+
+**This checklist must be signed off before the first production push.**
+Items marked **[GATE]** are hard blockers: the checklist cannot be signed
+off while any GATE item is unresolved. Without explicit gates, no-go audit
+findings get downgraded to known issues and slip through under launch pressure.
+
+### Security
+
+- [ ] **[GATE]** `INTERNAL_SERVICE_TOKEN` is in Key Vault (`internal-service-token`
+      secret exists and is not `not-configured`) for both staging and prod.
+- [ ] **[GATE]** AI service `/api/ai/conversations` returns 401 without the token;
+      returns 200 with the correct `X-Internal-Token` header.
+- [ ] **[GATE]** Rate limiting enforced globally: verify with concurrent requests
+      across multiple instances that the Redis-backed counter triggers 429.
+- [ ] **[GATE]** Encryption key present in Key Vault (`encryption-key`) and is a
+      64-character hex string — test BYOK provider key save/load round-trip.
+- [ ] All security audit categories in `docs/azure/prelaunch-security-audit.md`
+      show **GO**.
+
+### Network lockdown (prod)
+
+- [ ] **[GATE]** AI service not reachable from the public internet in prod. Test:
+      `curl https://<aiAppName>.azurewebsites.net/api/ai/conversations` from
+      outside Azure. It must return 403 or fail with a refused TCP connection (private endpoint).
+- [ ] **[GATE]** Core→AI calls succeed through the VNet path in prod.
+- [ ] AI service `ipSecurityRestrictions` applied: Azure portal → AI app →
+      Networking → Access Restrictions — only `snet-app` allow rule present.
+
+### SSE streaming
+
+- [ ] SSE stream completes normally end-to-end through the core proxy:
+      `curl -N https://<coreUrl>/v1/chat -H 'X-API-Key: <key>' -d '{"message":"hello"}'`
+- [ ] **[GATE — disconnect cleanup]** Manual verification: run the above curl, kill
+      it mid-stream with Ctrl-C, then check AI service logs for unclosed generator
+      errors. No unhandled `GeneratorExit` traces should appear.
+- [ ] No response buffering: chunks arrive incrementally (not in one burst after
+      stream ends). If on App Service, confirm `X-Accel-Buffering: no` header
+      is present in the response.
+
+### Analytics / consent
+
+- [ ] Cookie consent banner appears on first visit (no prior localStorage entry).
+- [ ] GA4 and UET scripts are **not** present in page source before consent is
+      given — verify with browser devtools network tab.
+- [ ] After accepting consent, GA4/UET scripts load and fire pageview events.
+- [ ] "Manage cookies" resets consent and banner reappears on reload.
+
+### Pipeline
+
+- [ ] `az bicep build --file infra/main.bicep` passes (no errors, warnings OK).
+- [ ] Validate stage (`what-if`) completes green on a staging run.
+- [ ] Staging deploy green with all three health checks passing.
+- [ ] Manual approval gate active on `agentbase-prod` environment in ADO.
+- [ ] Prod service connection uses "Specific pipelines" authorization.
+- [ ] Rollback procedure tested: re-point an app at a previous tag and verify
+      it comes up healthy.
+
+### Sign-off
+
+| Area | Signed off by | Date |
+| --- | --- | --- |
+| Security | | |
+| Network lockdown (prod) | | |
+| SSE streaming | | |
+| Analytics / consent | | |
+| Pipeline | | |
+
+All GATE items must be resolved and all rows signed off before merging to production.
+
+---
+
+## 7. Local validation (before pushing)
 
 ```bash
 az bicep build --file infra/main.bicep                 # lint