9 changes: 9 additions & 0 deletions .env.example
@@ -59,6 +59,9 @@ AI_SERVICE_URL=http://localhost:8000
AI_SERVICE_PORT=8000
# Used by the ai-service for CORS allow_origins (must match the running core URL)
CORE_API_URL=http://localhost:3001
+# Shared secret for core→AI service calls. In Azure, generated by seed-keyvault.sh and
+# injected from Key Vault. Leave empty for local dev (check is skipped when unset).
+INTERNAL_SERVICE_TOKEN=

# --- Marketplace ---
MARKETPLACE_URL=https://marketplace.agentbase.dev/api/v1
@@ -79,6 +82,12 @@ STRIPE_WEBHOOK_SECRET=
# Must use NEXT_PUBLIC_ prefix so Next.js exposes it to the browser
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=

+# --- Analytics (consent-gated — injected only after cookie opt-in) ---
+# GA4 measurement ID (format: G-XXXXXXXXXX). Leave empty to disable GA4.
+NEXT_PUBLIC_GA_MEASUREMENT_ID=
+# Microsoft UET tag ID for Microsoft Advertising. Leave empty to disable UET.
+NEXT_PUBLIC_MS_UET_TAG_ID=
+
# --- Email (optional) ---
SMTP_HOST=
SMTP_PORT=587
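
The `INTERNAL_SERVICE_TOKEN` comment in this hunk implies a check on the AI service side that is skipped when the variable is unset. That code is not part of the visible diff, so the following is only a minimal sketch of the described behavior. The function name `is_internal_call_allowed` is hypothetical, the header name `X-Internal-Token` is taken from the architecture diagram changed in this PR, and the constant-time comparison is an assumption rather than something the source confirms.

```python
import hmac
from typing import Mapping, Optional

# Lower-cased form of the X-Internal-Token header named in the architecture doc.
HEADER_NAME = "x-internal-token"


def is_internal_call_allowed(
    headers: Mapping[str, str], expected_token: Optional[str]
) -> bool:
    """Return True when a request may proceed (hypothetical helper)."""
    # Local dev: an unset or empty token means the check is skipped.
    if not expected_token:
        return True
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get(HEADER_NAME, "")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

A caller would presumably answer 401 when this returns `False`, which is the behavior the prelaunch checklist in this PR expects for `/api/ai/conversations`.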
2 changes: 1 addition & 1 deletion README.md
@@ -61,7 +61,7 @@ agentbase/
| **SQL Database** | PostgreSQL 16 |
| **Document DB** | MongoDB 7 |
| **Cache** | Redis 7 |
-| **Infrastructure** | Docker, Nginx, DigitalOcean Kubernetes (DOKS) |
+| **Infrastructure** | Docker · Azure App Service + Bicep IaC |
| **License** | GPL-3.0 |

## Quick Start
3 changes: 3 additions & 0 deletions azure-pipelines/scripts/seed-keyvault.sh
@@ -73,6 +73,9 @@ ensure_secret jwt-secret "$(or_placeholder "${JWT_SECRET:-$(gen)}")"
ensure_secret jwt-refresh-secret "$(or_placeholder "${JWT_REFRESH_SECRET:-$(gen)}")"
ensure_secret encryption-key "$(or_placeholder "${ENCRYPTION_KEY:-$(gen)}")"
ensure_secret plugin-settings-encryption-key "$(or_placeholder "${PLUGIN_SETTINGS_ENCRYPTION_KEY:-$(gen)}")"
+# Shared secret for core→AI service calls. Generated independently from JWT_SECRET;
+# never derived from it. Rotate independently when needed.
+ensure_secret internal-service-token "$(gen)"

# --- Optional integration secrets (placeholder keeps the KV reference resolvable) ---
set_secret stripe-secret-key "$(or_placeholder "${STRIPE_SECRET_KEY:-}")"
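
This PR describes `ensure_secret` as "generated once, never overwritten automatically". The function body is not shown in the hunk, so the block below sketches that contract only and is not the script's implementation. A plain dict stands in for Key Vault, the Python `ensure_secret` is a hypothetical analogue of the shell function, and `gen` is assumed to return 32 random bytes as hex (the real `gen` may produce something else).

```python
import secrets
from typing import Dict


def gen() -> str:
    # Assumption: 32 random bytes rendered as 64 hex characters.
    return secrets.token_hex(32)


def ensure_secret(vault: Dict[str, str], name: str, value: str) -> str:
    """Store `value` under `name` only if the secret does not exist yet."""
    if name not in vault:
        vault[name] = value
    # An existing secret is returned untouched, so re-running is idempotent.
    return vault[name]


vault: Dict[str, str] = {}
first_run = ensure_secret(vault, "internal-service-token", gen())
second_run = ensure_secret(vault, "internal-service-token", gen())
```

As in the shell call `ensure_secret internal-service-token "$(gen)"`, a fresh candidate value is generated on every run, but it is discarded once the secret exists. Rotation therefore has to be an explicit, separate action.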
29 changes: 16 additions & 13 deletions docs/azure/architecture.md
@@ -64,11 +64,10 @@ graph LR

user --> fe --> core
dev --> core
-core --> ai --> llm
+core -->|X-Internal-Token| ai --> llm
core --> data
ai --> data
core -. MARKETPLACE_URL .-> mkt
-fe --> ai
```

The core platform connects to the Marketplace over `MARKETPLACE_URL` (dashed —
@@ -109,11 +108,14 @@ graph TD
```

In **prod**, `networking.bicep` adds a VNet (app-integration subnet + private-
-endpoint subnet), private endpoints for PostgreSQL, Cosmos, Redis, Blob and Key
-Vault, and the matching private DNS zones — so the data tier has **no public
-network access** (constitution Principle II). In **staging**, the data services
-keep public access with an "allow Azure services" firewall rule to minimise cost
-and complexity.
+endpoint subnet), private endpoints for PostgreSQL, Cosmos, Redis, Blob, Key
+Vault, **and the AI service App Service site** — so the data tier and the AI
+service have **no public network access** (constitution Principle II). The AI
+service further restricts inbound to `snet-app` only via `ipSecurityRestrictions`.
+In **staging**, the data services keep public access with an "allow Azure
+services" firewall rule; the AI service is protected by the app-layer
+`INTERNAL_SERVICE_TOKEN` only (network restriction deferred until VNet
+integration is promoted to staging).

---

@@ -166,17 +168,18 @@ Principles applied:
|------------------|-----------------------|-------------|--------|
| `postgres-password` | `POSTGRES_PASSWORD` | core | variable group |
| `mongo-uri` | `MONGO_URI` | core, ai | `az cosmosdb keys list` |
-| `redis-password` | `REDIS_PASSWORD` | core¹ | `az redis list-keys` |
+| `redis-password` | `REDIS_PASSWORD` | core | `az redis list-keys` |
| `jwt-secret`, `jwt-refresh-secret` | `JWT_SECRET`, `JWT_REFRESH_SECRET` | core | generated once |
| `encryption-key`, `plugin-settings-encryption-key` | same (upper-snake) | core | generated once |
+| `internal-service-token` | `INTERNAL_SERVICE_TOKEN` | core + ai | generated once, **independent** from jwt-secret |
| `stripe-secret-key`, `stripe-webhook-secret` | `STRIPE_*` | core | variable group (optional) |
-| `openai-api-key`, `anthropic-api-key`, `gemini-api-key` | `*_API_KEY` | ai | variable group (optional) |
+| `openai-api-key`, `anthropic-api-key`, `gemini-api-key`, `huggingface-api-key` | `*_API_KEY` | ai | variable group (optional) |

-¹ Redis settings are injected and ready; the core's rate limiter is currently
-in-memory (`common/interceptors/rate-limit.interceptor.ts`). Swapping it for a
-Redis-backed limiter needs no infra change — `REDIS_HOST/PORT/TLS/PASSWORD` are
-already present. Secrets are seeded idempotently by
+Secrets are seeded idempotently by
[`azure-pipelines/scripts/seed-keyvault.sh`](../../azure-pipelines/scripts/seed-keyvault.sh).
+`internal-service-token` uses `ensure_secret` (generated once, never overwritten
+automatically) and rotates independently from JWT keys — use different rotation
+cadences and ownership.

---

109 changes: 99 additions & 10 deletions docs/azure/pipeline.md
@@ -19,13 +19,27 @@ az group create -n rg-agentbase-staging -l eastus
az group create -n rg-agentbase-prod -l eastus
```

-### 1.2 Service connection
+### 1.2 Service connections

-Create an Azure Resource Manager **service connection** (Project Settings →
-Service connections) scoped to the subscription, e.g. named
-`agentbase-azure`. Grant its service principal **Contributor** + **User Access
-Administrator** on both resource groups (User Access Administrator is required
-because the Bicep creates **role assignments** in `rbac.bicep`).
+Create **two** Azure Resource Manager service connections (Project Settings →
+Service connections), one per environment. Scope each to its resource group only
+(not the whole subscription) for least-privilege isolation.
+
+| ADO variable | Connection name (example) | Scoped to |
+|---|---|---|
+| `AZURE_SERVICE_CONNECTION_STAGING` | `agentbase-staging-sc` | `RG_STAGING` |
+| `AZURE_SERVICE_CONNECTION_PROD` | `agentbase-prod-sc` | `RG_PROD` |
+
+Grant each service principal **two roles** on its resource group:
+- **Owner** — required because `rbac.bicep` creates role assignments (Contributor
+alone can't grant roles; you need User Access Administrator, which Owner includes).
+- **Key Vault Secrets Officer** (at RG scope, not KV resource scope) — data-plane
+secret writes. Scoped to the RG so the role inherits to the Key Vault once
+Bicep creates it on the first run (the KV doesn't exist yet when this is set up).
+
+For the prod service connection: choose **"Specific pipelines"** rather than
+"Grant access to all pipelines" in the security settings, and add only the
+`agentbase-deploy.yml` pipeline.

### 1.3 Variable group

@@ -34,7 +48,8 @@ Library) with:

| Variable | Secret? | Example / purpose |
|----------|:------:|-------------------|
-| `AZURE_SERVICE_CONNECTION` | no | `agentbase-azure` |
+| `AZURE_SERVICE_CONNECTION_STAGING` | no | `agentbase-staging-sc` |
+| `AZURE_SERVICE_CONNECTION_PROD` | no | `agentbase-prod-sc` |
| `RG_STAGING` | no | `rg-agentbase-staging` |
| `RG_PROD` | no | `rg-agentbase-prod` |
| `PG_ADMIN_PASSWORD` | **yes** | PostgreSQL admin password (≥ 12 chars, complex) |
@@ -44,11 +59,13 @@ Library) with:
| `OPENAI_API_KEY` | yes | *(optional)* AI provider |
| `ANTHROPIC_API_KEY` | yes | *(optional)* |
| `GEMINI_API_KEY` | yes | *(optional)* |
+| `HUGGINGFACE_API_KEY` | yes | *(optional)* |

Optional secrets left undefined are stored in Key Vault as `not-configured`
placeholders so their Key Vault references still resolve. `jwt-secret`,
-`jwt-refresh-secret`, `encryption-key`, and `plugin-settings-encryption-key`
-are **generated once** by the seed script and preserved across deploys.
+`jwt-refresh-secret`, `encryption-key`, `plugin-settings-encryption-key`, and
+`internal-service-token` are **generated once** by the seed script and preserved
+across deploys — do not add these to the variable group.
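
The placeholder rule for optional secrets can be illustrated with a small sketch. This is not the seed script itself: `or_placeholder` below is a hypothetical Python analogue of the shell helper of the same name, and the variable-to-secret naming rule (lower-case, underscores to hyphens) is inferred from the pairs listed in the architecture table, not confirmed by the script source.

```python
from typing import Dict, Iterable, Mapping, Optional

PLACEHOLDER = "not-configured"


def or_placeholder(value: Optional[str]) -> str:
    """Return the value, or the placeholder when it is unset or empty."""
    return value if value else PLACEHOLDER


def to_secret_name(variable_name: str) -> str:
    # Assumed naming rule: OPENAI_API_KEY -> openai-api-key.
    return variable_name.lower().replace("_", "-")


def optional_secrets(
    variables: Mapping[str, str], names: Iterable[str]
) -> Dict[str, str]:
    """Build the Key Vault entries for optional variable-group secrets."""
    return {
        to_secret_name(name): or_placeholder(variables.get(name))
        for name in names
    }


seeded = optional_secrets(
    {"OPENAI_API_KEY": "sk-test", "GEMINI_API_KEY": ""},
    ["OPENAI_API_KEY", "GEMINI_API_KEY", "HUGGINGFACE_API_KEY"],
)
```

Every optional name ends up with some value in the vault, which is what keeps the App Service Key Vault references resolvable even when a provider is not configured.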

### 1.4 Environments + approval gate

@@ -143,7 +160,79 @@ az group delete --name rg-agentbase-staging --yes --no-wait

---

-## 6. Local validation (before pushing)
+## 6. Prelaunch checklist
+
+**This checklist must be signed off before the first production push.**
+Items marked **[GATE]** are hard blockers: the checklist cannot be signed
+off while any GATE item is unresolved. Without explicit gates here, no-go audit
+findings tend to become known issues that slip under launch pressure.
+
+### Security
+
+- [ ] **[GATE]** `INTERNAL_SERVICE_TOKEN` is in Key Vault (`internal-service-token`
+secret exists and is not `not-configured`) for both staging and prod.
+- [ ] **[GATE]** AI service `/api/ai/conversations` returns 401 without the token;
+returns 200 with the correct `X-Internal-Token` header.
+- [ ] **[GATE]** Rate limiting enforced globally: verify with concurrent requests
+across multiple instances that the Redis-backed counter triggers 429.
+- [ ] **[GATE]** Encryption key present in Key Vault (`encryption-key`) and is a
+64-character hex string — test BYOK provider key save/load round-trip.
+- [ ] All security audit categories in `docs/azure/prelaunch-security-audit.md`
+show **GO**.
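
Two of the Security gates are simple value checks that could be scripted once the secret values have been read from Key Vault. The sketch below shows only the validation logic; both function names are hypothetical, and fetching the secrets and the BYOK save/load round-trip are out of scope here.

```python
import re
from typing import Optional

PLACEHOLDER = "not-configured"
# Exactly 64 hexadecimal characters, i.e. a 32-byte key in hex.
HEX_64 = re.compile(r"[0-9a-fA-F]{64}")


def token_gate_passes(value: Optional[str]) -> bool:
    """internal-service-token must exist and must not be the placeholder."""
    return bool(value) and value != PLACEHOLDER


def encryption_key_gate_passes(value: Optional[str]) -> bool:
    """encryption-key must be a 64-character hex string."""
    return bool(value) and HEX_64.fullmatch(value) is not None
```

Passing these checks is necessary but not sufficient: the checklist still calls for the 401/200 probe against the AI service and the BYOK round-trip test.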
+
+### Network lockdown (prod)
+
+- [ ] **[GATE]** AI service not reachable from the public internet in prod. Test:
+`curl https://<aiAppName>.azurewebsites.net/api/ai/conversations` from
+outside Azure — must return 403 or TCP connection refused (private endpoint).
+- [ ] **[GATE]** Core→AI calls succeed through the VNet path in prod.
+- [ ] AI service `ipSecurityRestrictions` applied: Azure portal → AI app →
+Networking → Access Restrictions — only `snet-app` allow rule present.
+
+### SSE streaming
+
+- [ ] SSE stream completes normally end-to-end through the core proxy:
+`curl -N https://<coreUrl>/v1/chat -H 'X-API-Key: <key>' -d '{"message":"hello"}'`
+- [ ] **[GATE — disconnect cleanup]** Manual verification: run the above curl, kill
+it mid-stream with Ctrl-C, then check AI service logs for unclosed generator
+errors. No `GeneratorExit` unhandled traces should appear.
+- [ ] No response buffering: chunks arrive incrementally (not in one burst after
+stream ends). If on App Service, confirm `X-Accel-Buffering: no` header
+is present in the response.
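
The disconnect-cleanup gate concerns what happens inside a streaming generator when the client goes away. The AI service's streaming code is not part of this diff, so the following is a generic illustration of the Python mechanism under stated assumptions: `sse_events` and its `release` callback are hypothetical names, and a synchronous generator is used for brevity.

```python
from typing import Callable, Iterable, Iterator, List


def sse_events(chunks: Iterable[str], release: Callable[[], None]) -> Iterator[str]:
    """Yield SSE-framed events and always release resources at the end."""
    try:
        for chunk in chunks:
            # One Server-Sent Event: a "data:" line ended by a blank line.
            yield f"data: {chunk}\n\n"
    finally:
        # Runs on normal completion and when the consumer closes the
        # generator early, which raises GeneratorExit at the paused yield.
        release()


released: List[bool] = []
stream = sse_events(["a", "b", "c"], lambda: released.append(True))
first_event = next(stream)  # the client receives the first chunk
stream.close()              # simulate the client disconnecting mid-stream
```

Putting cleanup in `finally` without swallowing `GeneratorExit` is what keeps an early close from surfacing as an error. If the real service uses async generators, the analogous call is `aclose()`.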
+
+### Analytics / consent
+
+- [ ] Cookie consent banner appears on first visit (no prior localStorage entry).
+- [ ] GA4 and UET scripts are **not** present in page source before consent is
+given — verify with browser devtools network tab.
+- [ ] After accepting consent, GA4/UET scripts load and fire pageview events.
+- [ ] "Manage cookies" resets consent and banner reappears on reload.
+
+### Pipeline
+
+- [ ] `az bicep build --file infra/main.bicep` passes (no errors, warnings OK).
+- [ ] Validate stage (`what-if`) completes green on a staging run.
+- [ ] Staging deploy green with all three health checks passing.
+- [ ] Manual approval gate active on `agentbase-prod` environment in ADO.
+- [ ] Prod service connection uses "Specific pipelines" authorization.
+- [ ] Rollback procedure tested: re-point an app at a previous tag and verify
+it comes up healthy.
+
+### Sign-off
+
+| Area | Signed off by | Date |
+| --- | --- | --- |
+| Security | | |
+| Network lockdown (prod) | | |
+| SSE streaming | | |
+| Analytics / consent | | |
+| Pipeline | | |
+
+All GATE items must be resolved and all rows signed off before merging to production.
+
+---
+
+## 7. Local validation (before pushing)

```bash
az bicep build --file infra/main.bicep # lint