diff --git a/CLAUDE.md b/CLAUDE.md
index 5f8d5fb4..0adeea4d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -40,6 +40,12 @@ User Creates Session → Backend Creates CR → Operator Spawns Job →
Pod Runs Claude CLI → Results Stored in CR → UI Displays Progress
```
+📐 **Architecture Diagrams:** See [docs/architecture/](docs/architecture/) for comprehensive visual guides including:
+- [Core System Architecture](docs/architecture/core-system-architecture.md) - 4-component system with data flows
+- [Agentic Session Lifecycle](docs/architecture/agentic-session-lifecycle.md) - State machine and reconciliation
+- [Multi-Tenancy Architecture](docs/architecture/multi-tenancy-architecture.md) - Project isolation and RBAC
+- [Kubernetes Resources](docs/architecture/kubernetes-resources.md) - CRD structures and relationships
+
## Memory System - Loadable Context
This repository uses a structured **memory system** to provide targeted, loadable context instead of relying solely on this comprehensive CLAUDE.md file.
diff --git a/docs/architecture/agentic-session-lifecycle.md b/docs/architecture/agentic-session-lifecycle.md
new file mode 100644
index 00000000..97498217
--- /dev/null
+++ b/docs/architecture/agentic-session-lifecycle.md
@@ -0,0 +1,611 @@
+# Agentic Session Lifecycle
+
+## Overview
+
+An **AgenticSession** represents a single AI-powered automation task. This document describes the complete lifecycle from creation to completion, including state transitions, operator reconciliation, and error handling.
+
+## State Machine
+
+```mermaid
+stateDiagram-v2
+ [*] --> Pending: User creates session<br/>(Backend creates CR)
+
+ Pending --> Running: Operator creates Job<br/>Pod starts execution
+
+ Running --> Completed: Job succeeds<br/>Results captured
+ Running --> Failed: Job fails<br/>Error captured
+ Running --> Timeout: Timeout exceeded<br/>Job terminated
+
+ Completed --> [*]
+ Failed --> [*]
+ Timeout --> [*]
+
+ note right of Pending
+ Initial state
+ - CR exists
+ - No Job created yet
+ - Operator will reconcile
+ end note
+
+ note right of Running
+ Active execution
+ - Job created
+ - Pod running
+ - Results streaming
+ - Status updates frequent
+ end note
+
+ note right of Completed
+ Success terminal state
+ - Results in CR status
+ - Job succeeded
+ - Resources cleaned up
+ end note
+
+ note right of Failed
+ Error terminal state
+ - Error message in CR
+ - Job failed
+ - Resources cleaned up
+ end note
+
+ note right of Timeout
+ Timeout terminal state
+ - Job terminated
+ - Partial results captured
+ - Resources cleaned up
+ end note
+```
+
+## Phase Descriptions
+
+### Pending
+
+**Entry Condition:** Backend API creates AgenticSession CR
+
+**State Characteristics:**
+- CR exists with `spec` populated
+- `status` is unset, or `status.phase = "Pending"`
+- No Job created yet
+- No Pod running
+
+**Next Transition:** Operator detects CR and creates Job → `Running`
+
+**Typical Duration:** 1-5 seconds
+
+---
+
+### Running
+
+**Entry Condition:** Operator creates Job successfully
+
+**State Characteristics:**
+- Job exists with OwnerReference to AgenticSession
+- Pod scheduled and executing
+- `status.phase = "Running"`
+- `status.startTime` set
+- `status.results` may contain partial results
+
+**Status Updates:**
+- Operator monitors Job status every 5 seconds
+- Runner updates CR with progress logs
+- WebSocket broadcasts updates to frontend
+
+**Next Transitions:**
+- Job succeeds → `Completed`
+- Job fails → `Failed`
+- Timeout exceeded → `Timeout`
+
+**Typical Duration:** 30 seconds to 2 hours (configurable)
+
+---
+
+### Completed
+
+**Entry Condition:** Job completes successfully (exit code 0)
+
+**State Characteristics:**
+- `status.phase = "Completed"`
+- `status.completionTime` set
+- `status.results` contains final output
+- Per-repo `pushed` or `abandoned` status
+- Job deleted by the operator; its Pod is removed through the OwnerReference cascade
+
+**Terminal State:** No further transitions
+
+**Typical Retention:** CR persists for audit/history (manual deletion or TTL)
+
+---
+
+### Failed
+
+**Entry Condition:** Job fails (non-zero exit code)
+
+**State Characteristics:**
+- `status.phase = "Failed"`
+- `status.completionTime` set
+- `status.message` contains error details
+- `status.results` may contain partial output
+- Job and Pod cleaned up
+
+**Common Failure Reasons:**
+- Invalid Anthropic API key
+- Git authentication failure
+- Runner execution error
+- Resource limits exceeded
+
+**Terminal State:** No further transitions
+
+**Typical Retention:** CR persists for debugging (manual deletion)
+
+---
+
+### Timeout
+
+**Entry Condition:** Execution exceeds configured timeout
+
+**State Characteristics:**
+- `status.phase = "Timeout"`
+- `status.completionTime` set
+- `status.message` indicates timeout
+- `status.results` contains partial output
+- Job terminated by operator
+- Pod cleaned up
+
+**Timeout Configuration:**
+- Default: 1 hour
+- Configurable via `spec.timeout` (seconds)
+- ProjectSettings can set default per project
+
+**Terminal State:** No further transitions
+
+---
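The phase rules described above can be condensed into a small transition table. The sketch below is illustrative only: the phase names come from this document, while `ALLOWED_TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical names rather than identifiers from the operator code. The `Pending` to `Failed` edge is taken from the reconciliation flowchart, which sets `phase=Failed` when Job creation fails.

```python
# Illustrative sketch only: phase names come from this document; the table
# and helper names are assumptions, not identifiers from the operator code.
TERMINAL_PHASES = frozenset({"Completed", "Failed", "Timeout"})

ALLOWED_TRANSITIONS = {
    # Pending -> Failed covers the case where Job creation itself fails.
    "Pending": {"Running", "Failed"},
    "Running": {"Completed", "Failed", "Timeout"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle permits moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def is_terminal(phase: str) -> bool:
    """Terminal phases have no outgoing transitions."""
    return phase in TERMINAL_PHASES
```

A guard like `can_transition` could be called before every status update so that a late or duplicate event never moves a session out of a terminal phase.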
+
+## Operator Reconciliation Flow
+
+```mermaid
+flowchart TD
+ Start([Watch Event:<br/>AgenticSession Added/Modified])
+
+ Start --> GetCR[Get current CR from API]
+ GetCR --> Exists{CR exists?}
+
+ Exists -->|No - IsNotFound| LogDelete[Log: Resource deleted]
+ LogDelete --> End([Return - Not an error])
+
+ Exists -->|Yes| GetPhase[Extract status.phase]
+ GetPhase --> CheckPhase{phase?}
+
+ CheckPhase -->|Pending| CheckJob{Job exists?}
+ CheckPhase -->|Running| MonitorJob[Continue monitoring<br/>goroutine exists]
+ CheckPhase -->|Completed/Failed/Timeout| End
+
+ CheckJob -->|Yes| LogExists[Log: Job already exists]
+ LogExists --> End
+
+ CheckJob -->|No| CreateJob[Create Job with:<br/>- OwnerReference<br/>- Runner image<br/>- Env vars from ProjectSettings<br/>- PVC mount]
+
+ CreateJob --> JobCreated{Job created?}
+
+ JobCreated -->|No| UpdateError[Update CR status:<br/>phase=Failed<br/>message=error]
+ UpdateError --> End
+
+ JobCreated -->|Yes| UpdateRunning[Update CR status:<br/>phase=Running<br/>startTime=now]
+ UpdateRunning --> StartMonitor[Start goroutine:<br/>monitorJob]
+ StartMonitor --> End
+
+ MonitorJob --> End
+
+ style Start fill:#e1f5ff
+ style End fill:#e1ffe1
+ style CheckPhase fill:#fff4e1
+ style CreateJob fill:#f0e1ff
+ style UpdateError fill:#ffe1e1
+ style UpdateRunning fill:#e1ffe1
+```
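The branching in the flowchart above reduces to a pure decision on two inputs: the current phase and whether a Job already exists. The following Python sketch restates that decision for illustration; `next_action` and the returned action labels are hypothetical, and the real operator is written in Go.

```python
from typing import Optional


def next_action(phase: Optional[str], job_exists: bool) -> str:
    """Return the flowchart branch for a watch event on an existing CR."""
    # A CR with no status yet is treated like Pending.
    if phase in (None, "", "Pending"):
        return "log_job_exists" if job_exists else "create_job"
    if phase == "Running":
        return "continue_monitoring"
    # Completed, Failed, Timeout: terminal, nothing to do.
    return "return"
```

Because a repeated event for a `Pending` CR whose Job already exists maps to `log_job_exists`, handling the same event twice does not create a second Job.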
+
+## Job Monitoring Loop
+
+```mermaid
+sequenceDiagram
+ participant Op as Operator<br/>(goroutine)
+ participant K8s as Kubernetes API
+ participant CR as AgenticSession CR
+ participant Job as Job
+ participant Pod as Pod
+
+ Note over Op: Started by<br/>reconciliation loop
+
+ loop Every 5 seconds
+ Op->>CR: Check if CR still exists
+
+ alt CR deleted
+ CR-->>Op: IsNotFound error
+ Note over Op: Exit goroutine<br/>(session deleted by user)
+ end
+
+ Op->>Job: Get Job status
+
+ alt Job deleted
+ Job-->>Op: IsNotFound error
+ Op->>CR: Update status:<br/>phase=Failed<br/>message="Job was deleted"
+ Note over Op: Exit goroutine
+ end
+
+ Job-->>Op: Job status
+
+ alt Job succeeded
+ Op->>CR: Update status:<br/>phase=Completed<br/>completionTime=now
+ Op->>Job: Delete Job<br/>(cleanup)
+ Note over Op: Exit goroutine<br/>(success)
+
+ else Job failed
+ Op->>Pod: Get Pod logs<br/>(last 100 lines)
+ Pod-->>Op: Error logs
+ Op->>CR: Update status:<br/>phase=Failed<br/>message=error<br/>results=logs
+ Op->>Job: Delete Job<br/>(cleanup)
+ Note over Op: Exit goroutine<br/>(failure)
+
+ else Job still running
+ Op->>CR: Update status:<br/>progress info
+ Note over Op: Continue monitoring
+ end
+
+ Note over Op: Check timeout
+ alt Timeout exceeded
+ Op->>Job: Delete Job<br/>(terminate)
+ Op->>CR: Update status:<br/>phase=Timeout<br/>message="Exceeded timeout"
+ Note over Op: Exit goroutine<br/>(timeout)
+ end
+ end
+```
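One iteration of the monitoring loop above can be sketched as a function that maps what the operator observes to a status update and a decision on whether to keep monitoring. This is a Python illustration under stated assumptions: the `Tick` dataclass, its field names, and `evaluate_tick` are hypothetical, and the checks follow the order shown in the sequence diagram.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tick:
    """What the monitor observes on one pass (hypothetical structure)."""
    cr_exists: bool
    job_state: Optional[str]  # "succeeded", "failed", "running"; None if the Job is gone
    elapsed_seconds: float
    timeout_seconds: float


def evaluate_tick(t: Tick) -> Tuple[Optional[dict], bool]:
    """Return (status_update, keep_monitoring) for one pass of the loop."""
    if not t.cr_exists:
        return None, False  # session deleted by user
    if t.job_state is None:
        return {"phase": "Failed", "message": "Job was deleted"}, False
    if t.job_state == "succeeded":
        return {"phase": "Completed"}, False
    if t.job_state == "failed":
        return {"phase": "Failed"}, False
    if t.elapsed_seconds > t.timeout_seconds:
        return {"phase": "Timeout", "message": "Exceeded timeout"}, False
    return None, True  # still running, check again on the next pass
```

Keeping the decision free of API calls makes each branch of the diagram easy to unit test without a cluster.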
+
+## Status Update Patterns
+
+### Operator Status Updates
+
+**Use Case:** Operator updates phase transitions
+
+**Pattern:** Update via `/status` subresource
+
+```go
+// components/operator/internal/handlers/sessions.go
+func updateAgenticSessionStatus(namespace, name string, updates map[string]interface{}) error {
+	ctx := context.TODO()
+	gvr := types.GetAgenticSessionResource()
+
+	// Get current CR
+	obj, err := config.DynamicClient.Resource(gvr).
+		Namespace(namespace).
+		Get(ctx, name, v1.GetOptions{})
+	if errors.IsNotFound(err) {
+		log.Printf("CR deleted, skipping status update")
+		return nil // Not an error
+	}
+	if err != nil {
+		return fmt.Errorf("failed to get AgenticSession %s: %w", name, err)
+	}
+
+	// Initialize status if it is missing or not a map
+	status, ok := obj.Object["status"].(map[string]interface{})
+	if !ok {
+		status = make(map[string]interface{})
+		obj.Object["status"] = status
+	}
+	for k, v := range updates {
+		status[k] = v
+	}
+
+	// Update via /status subresource
+	_, err = config.DynamicClient.Resource(gvr).
+		Namespace(namespace).
+		UpdateStatus(ctx, obj, v1.UpdateOptions{})
+	if errors.IsNotFound(err) {
+		return nil // CR deleted during update
+	}
+
+	return err
+}
+```
+
+### Runner Status Updates
+
+**Use Case:** Runner pod updates results incrementally
+
+**Pattern:** Runner has minted token with limited permissions
+
+```python
+# components/runners/claude-code-runner/runner.py
+def update_session_status(results: Dict[str, Any]):
+    """Update CR status from runner pod."""
+    try:
+        # Use minted token from Secret
+        token = os.environ.get("RUNNER_TOKEN")
+
+        # Update via Kubernetes API; PATCH needs an explicit patch content type
+        response = requests.patch(
+            f"{k8s_api}/apis/vteam.ambient-code/v1alpha1/namespaces/{namespace}/agenticsessions/{name}/status",
+            headers={
+                "Authorization": f"Bearer {token}",
+                "Content-Type": "application/merge-patch+json",
+            },
+            json={"status": {"results": results}}
+        )
+
+        response.raise_for_status()
+    except Exception as e:
+        log.error(f"Failed to update status: {e}")
+        # Non-fatal: operator will update eventually
+
+## Resource Lifecycle and Cleanup
+
+```mermaid
+graph TD
+ subgraph "Resource Creation"
+ CR[AgenticSession CR<br/>Created by Backend]
+ Job[Job<br/>Created by Operator]
+ Pod[Pod<br/>Created by Job Controller]
+ Secret[Secret<br/>Minted token]
+ PVC[PVC<br/>Workspace storage]
+ end
+
+ subgraph "OwnerReferences"
+ CR -->|controller=true| Job
+ Job -->|controller=true| Pod
+ CR -->|controller=true| Secret
+ end
+
+ subgraph "Cleanup Scenarios"
+ Delete1[User deletes CR]
+ Delete2[Job completes<br/>Operator deletes Job]
+ TTL[TTL expired<br/>K8s deletes CR]
+ end
+
+ Delete1 --> CascadeDelete1[Kubernetes cascades:<br/>CR → Job → Pod<br/>CR → Secret]
+ Delete2 --> NormalCleanup[Operator deletes Job<br/>Pod cleaned by Job controller]
+ TTL --> CascadeDelete2[Same as user delete]
+
+ style CR fill:#ffe1e1
+ style Job fill:#fff4e1
+ style Pod fill:#e1ffe1
+ style Secret fill:#f0e1ff
+ style Delete1 fill:#ffe1e1
+ style Delete2 fill:#e1ffe1
+```
+
+**Key Cleanup Principles:**
+
+1. **OwnerReferences** ensure automatic cleanup when parent is deleted
+2. **Controller=true** on primary owner (only one per resource)
+3. **No BlockOwnerDeletion** (causes permission issues in multi-tenant environments)
+4. Operator explicitly deletes Jobs on completion (don't wait for cascade)
+5. PVCs persist for debugging (manual cleanup or TTL)
+
+**Reference:** [Backend/Operator Development Standards](../../CLAUDE.md#resource-management)
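The first three principles can be made concrete by looking at the shape of the `ownerReferences` entry a Job would carry. The Python sketch below builds that entry as a plain dictionary; `job_owner_reference` is a hypothetical helper, and the operator itself constructs this in Go. The field names are the standard Kubernetes OwnerReference fields, and the API group and version are the ones used elsewhere in this document.

```python
def job_owner_reference(session_name: str, session_uid: str) -> dict:
    """Build the ownerReferences entry that ties a Job to its AgenticSession."""
    return {
        "apiVersion": "vteam.ambient-code/v1alpha1",
        "kind": "AgenticSession",
        "name": session_name,
        "uid": session_uid,
        # Exactly one owner may be marked as the managing controller.
        "controller": True,
        # blockOwnerDeletion is deliberately left unset (principle 3).
    }
```

The returned entry would go under `metadata.ownerReferences` in the Job manifest, so that deleting the AgenticSession lets Kubernetes garbage-collect the Job and its Pod.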
+
+---
+
+## Error Handling Patterns
+
+### Non-Fatal Errors (Operator)
+
+**Scenario:** Resource deleted during processing
+
+```go
+if errors.IsNotFound(err) {
+	log.Printf("AgenticSession %s no longer exists, skipping", name)
+	return nil // Not treated as error - user deleted it
+}
+```
+
+### Retriable Errors (Operator)
+
+**Scenario:** Transient K8s API failure
+
+```go
+if err != nil {
+	log.Printf("Failed to create Job: %v", err)
+	updateAgenticSessionStatus(ns, name, map[string]interface{}{
+		"phase":   "Failed",
+		"message": fmt.Sprintf("Failed to create Job: %v", err),
+	})
+	// Operator watch loop will retry on next event
+	return fmt.Errorf("failed to create Job: %w", err)
+}
+```
+
+### Terminal Errors (Runner)
+
+**Scenario:** Invalid API key
+
+```python
+try:
+    client = anthropic.Anthropic(api_key=api_key)
+    response = client.messages.create(...)
+except anthropic.AuthenticationError as e:
+    # Update CR with terminal error (UTC timestamp, matching the operator's RFC 3339 times)
+    update_session_status({
+        "phase": "Failed",
+        "message": f"Invalid Anthropic API key: {e}",
+        "completionTime": datetime.now(timezone.utc).isoformat()
+    })
+    sys.exit(1)  # Exit pod with failure
+```
+
+---
+
+## Interactive vs Batch Execution
+
+### Batch Mode (Default)
+
+**Characteristics:**
+- Single prompt execution
+- Timeout enforced (default 1 hour)
+- Results written to CR on completion
+- Pod exits after execution
+
+**Use Cases:**
+- One-off automation tasks
+- Scripted workflows
+- RFE generation
+
+**Flow:**
+```
+User → Prompt → Runner executes → Results → Pod exits
+```
+
+---
+
+### Interactive Mode
+
+**Characteristics:**
+- Long-running session (no timeout)
+- User sends messages via inbox file
+- Runner responds via outbox file
+- Pod continues running until explicitly stopped
+
+**Use Cases:**
+- Iterative development
+- Multi-turn conversations
+- Complex debugging sessions
+
+**Flow:**
+```
+User → Initial prompt → Runner starts
+ ↓
+User writes to inbox → Runner reads → Executes → Writes to outbox
+ ↓
+User reads outbox → Continues conversation...
+ ↓
+User signals completion → Pod exits
+```
+
+**Configuration:**
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: AgenticSession
+metadata:
+  name: interactive-session
+spec:
+  interactive: true  # Enable interactive mode
+  prompt: "Initial prompt"
+  repos:
+    - input:
+        url: https://github.com/org/repo
+        branch: main
+
+**File Locations:**
+- Inbox: `/workspace/inbox.txt` (user writes)
+- Outbox: `/workspace/outbox.txt` (runner writes)
+- Workspace: `/workspace/repos/` (cloned repositories)
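The inbox/outbox exchange can be sketched as two small file helpers. This is a minimal illustration, not the runner's implementation: the document does not specify a message format, so the sketch assumes one message per newline-terminated line, and `read_new_messages` and `append_reply` are hypothetical names. A byte offset is tracked so each message is consumed once, and a line that is still being written is left for the next poll.

```python
from pathlib import Path
from typing import List, Tuple


def read_new_messages(inbox: Path, offset: int) -> Tuple[List[str], int]:
    """Return (messages, new_offset) for complete lines appended since offset."""
    if not inbox.exists():
        return [], offset
    chunk = inbox.read_bytes()[offset:]
    # Consume only up to the last newline so a half-written line is not read.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return [line for line in lines if line.strip()], offset + end


def append_reply(outbox: Path, reply: str) -> None:
    """Append one newline-terminated reply to the outbox file."""
    with outbox.open("a", encoding="utf-8") as f:
        f.write(reply.rstrip("\n") + "\n")
```

A runner loop would call `read_new_messages` with the paths listed above, pass each message to Claude Code, and write the answer with `append_reply`, carrying the returned offset into the next poll.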
+
+---
+
+## Multi-Repo Execution
+
+```mermaid
+flowchart LR
+ subgraph "AgenticSession Spec"
+ MainIdx[mainRepoIndex: 1]
+ Repos[repos array:<br/>0: repo-A<br/>1: repo-B<br/>2: repo-C]
+ end
+
+ subgraph "Runner Workspace"
+ WS["/workspace/repos/"]
+ RepoA["repo-A/<br/>cloned from repos[0]"]
+ RepoB["repo-B/<br/>cloned from repos[1]<br/>WORKING DIRECTORY"]
+ RepoC["repo-C/<br/>cloned from repos[2]"]
+ end
+
+ subgraph "Status Tracking"
+ StatusA["repos[0].status:<br/>pushed=true"]
+ StatusB["repos[1].status:<br/>pushed=true"]
+ StatusC["repos[2].status:<br/>abandoned=true"]
+ end
+ end
+
+ MainIdx -->|Specifies| RepoB
+ Repos --> WS
+ WS --> RepoA
+ WS --> RepoB
+ WS --> RepoC
+
+ RepoA -.-> StatusA
+ RepoB -.-> StatusB
+ RepoC -.-> StatusC
+
+ style RepoB fill:#e1ffe1
+ style MainIdx fill:#fff4e1
+```
+
+**Key Concepts:**
+
+1. **mainRepoIndex** (default: 0): Sets Claude Code working directory
+2. **Cloning Order**: Repos cloned in array order
+3. **Per-Repo Status**: Each repo tracked individually (pushed/abandoned)
+4. **Cross-Repo References**: Claude can access all repos in workspace
+
+**Reference:** [ADR-0003: Multi-Repository Support](../adr/0003-multi-repo-support.md)
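How `mainRepoIndex` selects the working directory can be illustrated with a short Python sketch. It assumes each repository is cloned into a directory named after the last segment of its URL under `/workspace/repos/`; that naming rule and the helper names `repo_dir_name` and `working_directory` are assumptions made for this example, not confirmed runner behavior.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse


def repo_dir_name(url: str) -> str:
    """Derive a directory name from a Git URL (assumed naming rule)."""
    name = PurePosixPath(urlparse(url).path).name
    return name[:-4] if name.endswith(".git") else name


def working_directory(repos: List[dict], main_repo_index: int = 0,
                      root: str = "/workspace/repos") -> str:
    """Return the directory that mainRepoIndex points at."""
    if not 0 <= main_repo_index < len(repos):
        raise IndexError(f"mainRepoIndex {main_repo_index} is out of range")
    url = repos[main_repo_index]["input"]["url"]
    return f"{root}/{repo_dir_name(url)}"
```

Validating the index up front turns a misconfigured `mainRepoIndex` into an early, clear error rather than a failure midway through cloning.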
+
+---
+
+## Timeout Handling
+
+### Timeout Configuration
+
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: AgenticSession
+spec:
+  timeout: 3600  # seconds (1 hour)
+```
+
+**Timeout Sources (priority order):**
+1. `spec.timeout` on AgenticSession CR
+2. `defaultTimeout` in ProjectSettings CR
+3. Global default (1 hour)
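The priority order above amounts to a fallback chain. The Python sketch below shows one way to express it; `resolve_timeout` is a hypothetical helper, and placing `defaultTimeout` at the top level of the ProjectSettings spec is an assumption, since this document does not show the ProjectSettings schema.

```python
from typing import Optional

DEFAULT_TIMEOUT_SECONDS = 3600  # global default: 1 hour


def resolve_timeout(session_spec: dict,
                    project_settings_spec: Optional[dict] = None) -> int:
    """Pick the timeout in seconds: session, then project, then global default."""
    timeout = session_spec.get("timeout")
    if timeout is not None:
        return int(timeout)
    if project_settings_spec is not None:
        project_default = project_settings_spec.get("defaultTimeout")
        if project_default is not None:
            return int(project_default)
    return DEFAULT_TIMEOUT_SECONDS
```

Checking `is not None` rather than truthiness keeps an explicit value from being silently replaced by a lower-priority default.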
+
+### Timeout Enforcement
+
+**Operator monitors elapsed time:**
+
+```go
+func monitorJob(jobName, sessionName, namespace string) {
+	startTime := time.Now()
+	timeout := getTimeoutForSession(namespace, sessionName)
+
+	for {
+		time.Sleep(5 * time.Second)
+
+		elapsed := time.Since(startTime)
+		if elapsed > timeout {
+			log.Printf("Session %s exceeded timeout (%v)", sessionName, timeout)
+
+			// Terminate Job
+			deleteJob(namespace, jobName)
+
+			// Update CR status
+			updateAgenticSessionStatus(namespace, sessionName, map[string]interface{}{
+				"phase":          "Timeout",
+				"message":        fmt.Sprintf("Exceeded timeout of %v", timeout),
+				"completionTime": time.Now().Format(time.RFC3339),
+			})
+
+			return // Exit monitoring
+		}
+
+		// ... check Job status ...
+	}
+}
+```
+
+**Graceful Shutdown:**
+- Runner receives SIGTERM from Kubernetes
+- Runner captures partial results
+- Runner updates CR status before exit
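On the runner side, the graceful shutdown steps above could look roughly like the following Python sketch. It is a hedged illustration rather than the runner's code: `handle_sigterm`, `install_handlers`, and `run_steps` are hypothetical names, and the work is modeled as a list of callables. The handler only records the request, and the work loop checks the flag between steps and flushes whatever results exist.

```python
import signal
import threading

# Set by the signal handler, polled by the work loop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: record the request and return immediately."""
    shutdown_requested.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_steps(steps, flush_partial_results):
    """Run callables in order, stopping early once shutdown is requested."""
    results = []
    for step in steps:
        if shutdown_requested.is_set():
            break
        results.append(step())
    # Called on both normal completion and early exit.
    flush_partial_results(results)
    return results
```

Kubernetes follows SIGTERM with SIGKILL once the Pod's termination grace period expires, so the flush callback needs to finish well within that window.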
+
+---
+
+## Related Documentation
+
+- [Core System Architecture](./core-system-architecture.md) - Component overview
+- [Kubernetes Resources](./kubernetes-resources.md) - CR schemas
+- [Multi-Tenancy Architecture](./multi-tenancy-architecture.md) - Project isolation
+- [Operator Development Standards](../../CLAUDE.md#operator-patterns)
+- [ADR-0001: Kubernetes-Native Architecture](../adr/0001-kubernetes-native-architecture.md)
diff --git a/docs/architecture/core-system-architecture.md b/docs/architecture/core-system-architecture.md
new file mode 100644
index 00000000..5ae70e79
--- /dev/null
+++ b/docs/architecture/core-system-architecture.md
@@ -0,0 +1,402 @@
+# Core System Architecture
+
+## Overview
+
+The Ambient Code Platform follows a Kubernetes-native architecture with four primary components that work together to orchestrate AI-powered automation tasks.
+
+## High-Level Architecture
+
+```mermaid
+graph TB
+ subgraph "User Interface"
+ UI[Frontend<br/>NextJS + Shadcn UI<br/>React Query]
+ end
+
+ subgraph "API Layer"
+ API[Backend API<br/>Go + Gin<br/>REST + WebSocket]
+ end
+
+ subgraph "Kubernetes Cluster"
+ subgraph "Control Plane"
+ OP[Agentic Operator<br/>Go Controller<br/>Watches CRs]
+ end
+
+ subgraph "Custom Resources"
+ AS[AgenticSession<br/>CR]
+ PS[ProjectSettings<br/>CR]
+ RFE[RFEWorkflow<br/>CR]
+ end
+
+ subgraph "Execution"
+ JOB[Kubernetes Job]
+ POD[Runner Pod<br/>Python + Claude SDK]
+ PVC[Persistent Volume<br/>Workspace Storage]
+ end
+ end
+
+ UI -->|HTTP/HTTPS<br/>REST API + WS| API
+ API -->|K8s Dynamic Client<br/>User Token| AS
+ API -->|K8s Dynamic Client<br/>User Token| PS
+ API -->|K8s Dynamic Client<br/>User Token| RFE
+
+ OP -->|Watches| AS
+ OP -->|Watches| PS
+ OP -->|Watches| RFE
+
+ OP -->|Creates & Monitors| JOB
+ JOB -->|Spawns| POD
+ POD -->|Mounts| PVC
+
+ POD -->|Updates Status| AS
+ OP -->|Updates Status| AS
+
+ AS -.->|OwnerReference| JOB
+ JOB -.->|OwnerReference| POD
+
+ style UI fill:#e1f5ff
+ style API fill:#fff4e1
+ style OP fill:#f0e1ff
+ style POD fill:#e1ffe1
+ style AS fill:#ffe1e1
+ style PS fill:#ffe1e1
+ style RFE fill:#ffe1e1
+```
+
+## Component Breakdown
+
+### 1. Frontend (NextJS + Shadcn UI)
+
+**Technology Stack:**
+- NextJS 14+ with App Router
+- Shadcn UI component library
+- React Query for data fetching
+- TypeScript for type safety
+
+**Responsibilities:**
+- User interface for session management
+- Real-time status updates via WebSocket
+- Project and settings management
+- RFE workflow visualization
+
+**Key Patterns:**
+- Server-side rendering for performance
+- Optimistic updates with React Query
+- Type-safe API client integration
+
+**Reference:** [Frontend Development Standards](../../CLAUDE.md#frontend-development-standards)
+
+---
+
+### 2. Backend API (Go + Gin)
+
+**Technology Stack:**
+- Go 1.21+
+- Gin web framework
+- Kubernetes Dynamic Client
+- OpenShift OAuth integration
+
+**Responsibilities:**
+- REST API for CRUD operations on Custom Resources
+- WebSocket server for real-time updates
+- Multi-tenant project isolation (namespace mapping)
+- User authentication and authorization (RBAC)
+- Git operations (clone, fork, PR creation)
+
+**Key Endpoints:**
+- `/api/projects/:project/agentic-sessions` - Session management
+- `/api/projects/:project/project-settings` - Configuration
+- `/api/projects/:project/rfe-workflows` - RFE orchestration
+- `/ws` - WebSocket for real-time updates
+
+**Key Patterns:**
+- User token authentication for all operations
+- Project-scoped endpoints with RBAC validation
+- Middleware chain: Recovery → Logging → CORS → Auth → Validation
+- Error handling with structured responses
+
+**Reference:** [Backend Development Standards](../../CLAUDE.md#backend-and-operator-development-standards)
+
+---
+
+### 3. Agentic Operator (Go Controller)
+
+**Technology Stack:**
+- Go 1.21+
+- Kubernetes controller-runtime patterns
+- Watch/reconciliation loop
+- Custom Resource Definitions (CRDs)
+
+**Responsibilities:**
+- Watch AgenticSession, ProjectSettings, RFEWorkflow CRs
+- Reconcile desired state with actual state
+- Create and manage Kubernetes Jobs for session execution
+- Monitor Job completion and update CR status
+- Handle timeouts and cleanup
+
+**Reconciliation Flow:**
+1. Watch for CR events (Added, Modified, Deleted)
+2. Check resource phase (Pending, Running, Completed, Failed, Timeout)
+3. Create Job if phase is Pending
+4. Monitor Job status and update CR
+5. Handle errors and retries with exponential backoff
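Step 5 mentions exponential backoff without giving parameters. As a rough illustration of the idea, the Python sketch below computes a capped delay schedule; the base, factor, and cap values are assumptions chosen for the example, and `backoff_delays` is a hypothetical helper, not part of the operator.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Return the delay in seconds before each retry, growing geometrically up to cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]
```

For example, `backoff_delays(5, base=1, factor=2, cap=10)` yields delays of 1, 2, 4, 8, and 10 seconds. The cap keeps a long outage from stretching the retry interval indefinitely.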
+
+**Key Patterns:**
+- Reconnection logic for watch failures
+- Idempotent resource creation
+- OwnerReferences for automatic cleanup
+- Status updates via `/status` subresource
+- Goroutine monitoring for long-running jobs
+
+**Reference:** [Operator Development Standards](../../CLAUDE.md#operator-patterns)
+
+---
+
+### 4. Claude Code Runner (Python)
+
+**Technology Stack:**
+- Python 3.11+
+- Claude Code SDK (≥0.0.23)
+- Anthropic API (≥0.68.0)
+- Git integration
+
+**Responsibilities:**
+- Execute Claude Code CLI in containerized environment
+- Manage workspace synchronization via PVC
+- Handle interactive vs. batch execution modes
+- Capture results and update CR status
+- Multi-agent collaboration coordination
+
+**Execution Modes:**
+- **Batch Mode:** Single prompt execution with timeout
+- **Interactive Mode:** Long-running chat using inbox/outbox files
+
+**Key Patterns:**
+- Workspace isolation per session
+- Multi-repo support with mainRepoIndex
+- Result capture and structured output
+- Error propagation to operator
+
+**Reference:** [Runner Documentation](../../components/runners/claude-code-runner/README.md)
+
+---
+
+## Data Flow: Agentic Session Execution
+
+```mermaid
+sequenceDiagram
+ actor User
+ participant UI as Frontend
+ participant API as Backend API
+ participant K8s as Kubernetes API
+ participant Op as Operator
+ participant Job as Job/Pod
+ participant CR as AgenticSession CR
+
+ User->>UI: Create Session
+ UI->>API: POST /api/projects/{project}/agentic-sessions
+
+ Note over API: Extract user token<br/>Validate RBAC permissions
+
+ API->>K8s: Create AgenticSession CR<br/>(using user token)
+ K8s-->>API: CR Created (UID)
+ API-->>UI: 201 Created {name, uid}
+
+ Note over Op: Watch loop detects<br/>new CR event
+
+ Op->>K8s: Get AgenticSession CR
+ K8s-->>Op: CR with phase=Pending
+
+ Op->>K8s: Create Job with OwnerReference
+ Note over Op: Set controller=true<br/>for automatic cleanup
+
+ K8s-->>Op: Job Created
+ Op->>K8s: Update CR status<br/>phase=Running
+
+ K8s->>Job: Schedule Pod
+
+ Note over Job: Runner executes<br/>Claude Code CLI
+
+ loop Monitoring
+ Op->>K8s: Check Job status
+ K8s-->>Op: Job status (running/succeeded/failed)
+
+ Op->>K8s: Update CR status<br/>(progress, logs, errors)
+ end
+
+ Job->>K8s: Update CR status<br/>(results, completionTime)
+
+ Op->>K8s: Update CR status<br/>phase=Completed
+
+ K8s-->>API: Status change event
+ API-->>UI: WebSocket update
+ UI-->>User: Display results
+```
+
+## Multi-Tenancy Model
+
+```mermaid
+graph LR
+ subgraph "Project A"
+ PA[Project 'team-alpha']
+ NSA[Namespace: team-alpha]
+ ASA1[AgenticSession-1]
+ ASA2[AgenticSession-2]
+ PSA[ProjectSettings]
+ end
+
+ subgraph "Project B"
+ PB[Project 'team-beta']
+ NSB[Namespace: team-beta]
+ ASB1[AgenticSession-1]
+ PSB[ProjectSettings]
+ end
+
+ PA -->|Maps to| NSA
+ PB -->|Maps to| NSB
+
+ NSA -->|Contains| ASA1
+ NSA -->|Contains| ASA2
+ NSA -->|Contains| PSA
+
+ NSB -->|Contains| ASB1
+ NSB -->|Contains| PSB
+
+ style PA fill:#e1f5ff
+ style PB fill:#ffe1e1
+ style NSA fill:#e1f5ff
+ style NSB fill:#ffe1e1
+```
+
+**Isolation Guarantees:**
+- Each project maps to a dedicated Kubernetes namespace (1:1 mapping)
+- User tokens enforce RBAC at namespace boundaries
+- Resources cannot cross namespace boundaries
+- Backend validates project access before CR operations
+
+**Reference:** [Multi-Tenancy Architecture](./multi-tenancy-architecture.md)
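The 1:1 project-to-namespace mapping means a project name ends up directly in Kubernetes API paths, so it is worth validating before use. The Python sketch below is illustrative only: `namespace_for_project` and `sessions_path` are hypothetical helpers, and the real backend is written in Go. The API group, version, and resource name are the ones that appear in the runner example in the session lifecycle document, and the validation follows the Kubernetes rule that namespace names are RFC 1123 DNS labels.

```python
import re

API_GROUP = "vteam.ambient-code"
API_VERSION = "v1alpha1"

# Kubernetes namespace names are RFC 1123 DNS labels (at most 63 characters).
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for_project(project: str) -> str:
    """Map a project to its namespace (1:1), rejecting invalid names."""
    if len(project) > 63 or not _DNS_LABEL.fullmatch(project):
        raise ValueError(f"invalid project name: {project!r}")
    return project


def sessions_path(project: str) -> str:
    """Build the namespaced API path for a project's AgenticSessions."""
    namespace = namespace_for_project(project)
    return f"/apis/{API_GROUP}/{API_VERSION}/namespaces/{namespace}/agenticsessions"
```

Rejecting names that are not valid DNS labels prevents path manipulation such as `../team-beta`, although the actual access decision still rests with Kubernetes RBAC on the user's token.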
+
+---
+
+## Key Architectural Decisions
+
+### 1. Kubernetes-Native Design
+
+**Why:** Leverage Kubernetes for orchestration, scheduling, resource management, and RBAC.
+
+**Benefits:**
+- Declarative resource model via Custom Resources
+- Built-in RBAC and multi-tenancy
+- Horizontal scalability
+- Self-healing and automatic cleanup via OwnerReferences
+
+**Reference:** [ADR-0001: Kubernetes-Native Architecture](../adr/0001-kubernetes-native-architecture.md)
+
+---
+
+### 2. User Token Authentication
+
+**Why:** Enforce per-user RBAC for all API operations instead of using elevated service account permissions.
+
+**Pattern:**
+- Frontend extracts user token from OAuth flow
+- Backend validates token and uses it for K8s API calls
+- Service account only for CR writes and token minting
+
+**Security Benefits:**
+- Audit trail per user
+- Least-privilege access
+- No privilege escalation risks
+
+**Reference:** [ADR-0002: User Token Authentication](../adr/0002-user-token-authentication.md)
+
+---
+
+### 3. Asynchronous Execution Model
+
+**Why:** Long-running AI tasks cannot block HTTP requests.
+
+**Pattern:**
+- **Synchronous:** User request → Backend creates CR → Return immediately
+- **Asynchronous:** Operator watches → Creates Job → Monitors → Updates status
+- **Feedback:** WebSocket or polling for status updates
+
+**Benefits:**
+- Responsive UI (no hanging requests)
+- Resilient to operator/pod restarts
+- Kubernetes handles scheduling and retries
+
+---
+
+### 4. Go Backend + Python Runner
+
+**Why:** Use the best tool for each layer.
+
+**Rationale:**
+- **Go for Backend/Operator:** Performance, K8s client libraries, concurrency
+- **Python for Runner:** Claude SDK, rich AI/ML ecosystem, rapid development
+
+**Reference:** [ADR-0004: Go Backend + Python Runner](../adr/0004-go-backend-python-runner.md)
+
+---
+
+## Component Communication Matrix
+
+| Source | Target | Protocol | Auth | Purpose |
+|--------|--------|----------|------|---------|
+| Frontend | Backend API | HTTPS (REST) | OAuth Token | CRUD operations |
+| Frontend | Backend API | WebSocket | OAuth Token | Real-time updates |
+| Backend API | Kubernetes API | K8s Dynamic Client | User Token | CR operations |
+| Operator | Kubernetes API | K8s Dynamic Client | Service Account | Watch CRs, manage Jobs |
+| Runner Pod | Kubernetes API | HTTPS (K8s REST API) | Pod SA + Minted Token | Update CR status |
+| Operator | Runner Job | - | OwnerReference | Lifecycle management |
+
+---
+
+## Scalability Considerations
+
+### Horizontal Scaling
+
+**Frontend:**
+- Stateless NextJS instances
+- Scale with Kubernetes Deployment replicas
+- Load balancing via Ingress/Route
+
+**Backend API:**
+- Stateless Go instances
+- Scale with Kubernetes Deployment replicas
+- WebSocket sessions require session affinity (sticky sessions)
+
+**Operator:**
+- Single active controller (additional replicas require leader election for HA)
+- Watch multiple namespaces concurrently
+- Goroutine per Job for monitoring
+
+**Runner Pods:**
+- One Pod per AgenticSession (isolation)
+- Kubernetes handles scheduling across nodes
+- Resource limits prevent resource exhaustion
+
+### Resource Limits
+
+```yaml
+# Example resource configuration
+resources:
+  requests:
+    memory: "512Mi"
+    cpu: "250m"
+  limits:
+    memory: "2Gi"
+    cpu: "1000m"
+```
+
+**Reference:** [Production Considerations](../../CLAUDE.md#production-considerations)
+
+---
+
+## Related Documentation
+
+- [Agentic Session Lifecycle](./agentic-session-lifecycle.md) - State machine and reconciliation flow
+- [Multi-Tenancy Architecture](./multi-tenancy-architecture.md) - Project isolation and RBAC
+- [Kubernetes Resources](./kubernetes-resources.md) - CRD structures and schemas
+- [Backend Development Standards](../../CLAUDE.md#backend-and-operator-development-standards)
+- [Frontend Development Standards](../../components/frontend/DESIGN_GUIDELINES.md)
diff --git a/docs/architecture/index.md b/docs/architecture/index.md
new file mode 100644
index 00000000..2a1da12b
--- /dev/null
+++ b/docs/architecture/index.md
@@ -0,0 +1,344 @@
+# Architecture Overview
+
+Welcome to the **Ambient Code Platform Architecture Documentation**. This section provides comprehensive visual diagrams and detailed explanations of the platform's design, components, and patterns.
+
+## Purpose
+
+This architecture documentation helps you:
+
+- **Understand** the platform's component interactions and data flows
+- **Navigate** complex distributed systems with clear visual aids
+- **Make informed decisions** when extending or modifying the platform
+- **Onboard quickly** with structured visual learning
+
+## Navigation Guide
+
+### Core Architecture
+
+Start here to understand the foundational platform architecture:
+
+| Document | Description | Key Diagrams |
+|----------|-------------|--------------|
+| **[Core System Architecture](./core-system-architecture.md)** | 4-component system overview, data flows, and component responsibilities | System architecture, sequence diagrams, multi-tenancy model |
+| **[Agentic Session Lifecycle](./agentic-session-lifecycle.md)** | Session state machine, operator reconciliation, and execution patterns | State diagram, reconciliation flowchart, monitoring loop |
+| **[Multi-Tenancy Architecture](./multi-tenancy-architecture.md)** | Project isolation, RBAC enforcement, and security boundaries | Namespace mapping, authentication flow, permission matrix |
+| **[Kubernetes Resources](./kubernetes-resources.md)** | Custom Resource Definitions (CRDs), schemas, and resource relationships | CR hierarchy, class diagrams, cleanup strategies |
+
+---
+
+## Quick Start by Role
+
+### For Developers
+
+**Start here if you're:**
+- Adding new features to the backend or frontend
+- Debugging session execution issues
+- Understanding component interactions
+
+**Recommended Reading Order:**
+1. [Core System Architecture](./core-system-architecture.md) - Get the big picture
+2. [Agentic Session Lifecycle](./agentic-session-lifecycle.md) - Understand execution flow
+3. [Kubernetes Resources](./kubernetes-resources.md) - Learn CR structures
+
+---
+
+### For Platform Engineers
+
+**Start here if you're:**
+- Deploying the platform to production
+- Setting up multi-tenancy and RBAC
+- Troubleshooting operator issues
+
+**Recommended Reading Order:**
+1. [Core System Architecture](./core-system-architecture.md) - Component overview
+2. [Multi-Tenancy Architecture](./multi-tenancy-architecture.md) - Isolation and security
+3. [Agentic Session Lifecycle](./agentic-session-lifecycle.md) - Operator patterns
+
+---
+
+### For Architects
+
+**Start here if you're:**
+- Evaluating the platform for adoption
+- Planning integrations or extensions
+- Understanding architectural decisions
+
+**Recommended Reading Order:**
+1. [Core System Architecture](./core-system-architecture.md) - Full system design
+2. Review [Architecture Decision Records](../adr/) - Understand "why" behind decisions
+3. [Multi-Tenancy Architecture](./multi-tenancy-architecture.md) - Security model
+4. [Kubernetes Resources](./kubernetes-resources.md) - Resource model
+
+---
+
+## Architectural Principles
+
+The Ambient Code Platform is built on these core principles:
+
+### 1. Kubernetes-Native Design
+
+**Why:** Leverage Kubernetes for orchestration, scheduling, and resource management.
+
+**How:**
+- Custom Resource Definitions (CRDs) for declarative state
+- Operator pattern for reconciliation
+- Built-in RBAC for multi-tenancy
+- OwnerReferences for automatic cleanup
+
+**Reference:** [ADR-0001: Kubernetes-Native Architecture](../adr/0001-kubernetes-native-architecture.md)
+
+---
+
+### 2. User Token Authentication
+
+**Why:** Enforce per-user RBAC instead of using elevated service account permissions.
+
+**How:**
+- Frontend extracts OAuth token
+- Backend validates and uses token for K8s API calls
+- Service account only for specific elevated operations (CR writes, token minting)
+
+**Reference:** [ADR-0002: User Token Authentication](../adr/0002-user-token-authentication.md)
+
+---
+
+### 3. Asynchronous Execution
+
+**Why:** Long-running AI tasks cannot block HTTP requests.
+
+**How:**
+- Synchronous: User request → Backend creates CR → Return immediately
+- Asynchronous: Operator watches → Creates Job → Monitors → Updates status
+- Feedback: WebSocket or polling for status updates
+
+**Benefits:**
+- Responsive UI
+- Resilient to restarts
+- Kubernetes handles scheduling
+
+---
+
+### 4. Multi-Repository Support
+
+**Why:** Real-world automation often requires changes across multiple codebases.
+
+**How:**
+- Sessions can reference multiple Git repositories
+- `mainRepoIndex` specifies working directory
+- Per-repo status tracking (pushed, abandoned, PR URL)
+
+**Reference:** [ADR-0003: Multi-Repository Support](../adr/0003-multi-repo-support.md)
+
+---
+
+### 5. Polyglot Architecture
+
+**Why:** Use the best language for each layer.
+
+**How:**
+- **Go** for backend/operator: Performance, K8s libraries, concurrency
+- **Python** for runner: Claude SDK, AI/ML ecosystem, rapid development
+- **TypeScript/NextJS** for frontend: Modern web development, type safety
+
+**Reference:** [ADR-0004: Go Backend + Python Runner](../adr/0004-go-backend-python-runner.md)
+
+---
+
+## System Components
+
+### Frontend (NextJS + Shadcn UI)
+
+**Purpose:** Web UI for session management and monitoring
+
+**Technology:**
+- NextJS 14+ with App Router
+- Shadcn UI component library
+- React Query for data fetching
+- TypeScript for type safety
+
+**Reference:** [Frontend Development Standards](../../components/frontend/DESIGN_GUIDELINES.md)
+
+---
+
+### Backend API (Go + Gin)
+
+**Purpose:** REST API for CRUD operations on Custom Resources
+
+**Technology:**
+- Go 1.21+
+- Gin web framework
+- Kubernetes Dynamic Client
+- OpenShift OAuth integration
+
+**Key Endpoints:**
+- `/api/projects/:project/agentic-sessions` - Session management
+- `/api/projects/:project/project-settings` - Configuration
+- `/api/projects/:project/rfe-workflows` - RFE orchestration
+- `/ws` - WebSocket for real-time updates
+
+**Reference:** [Backend Development Standards](../../CLAUDE.md#backend-and-operator-development-standards)
+
+---
+
+### Agentic Operator (Go Controller)
+
+**Purpose:** Watch Custom Resources and reconcile state
+
+**Technology:**
+- Go 1.21+
+- Kubernetes controller-runtime patterns
+- Watch/reconciliation loop
+
+**Responsibilities:**
+- Watch AgenticSession, ProjectSettings, RFEWorkflow CRs
+- Create and manage Kubernetes Jobs
+- Monitor Job completion and update CR status
+- Handle timeouts and cleanup
+
+**Reference:** [Operator Development Standards](../../CLAUDE.md#operator-patterns)
+
+---
+
+### Claude Code Runner (Python)
+
+**Purpose:** Execute Claude Code CLI in containerized environment
+
+**Technology:**
+- Python 3.11+
+- Claude Code SDK (≥0.0.23)
+- Anthropic API (≥0.68.0)
+- Git integration
+
+**Responsibilities:**
+- Execute AI-powered automation tasks
+- Manage workspace synchronization
+- Capture results and update CR status
+- Handle interactive and batch modes
+
+**Reference:** [Runner Documentation](../../components/runners/claude-code-runner/README.md)
+
+---
+
+## Data Flow Summary
+
+```mermaid
+graph LR
+ User[User] -->|HTTPS| FE[Frontend]
+ FE -->|REST API| BE[Backend API]
+ BE -->|K8s Dynamic Client| K8s[Kubernetes API]
+
+ K8s -->|CR Created| OP[Operator]
+ OP -->|Creates Job| JOB[Job]
+ JOB -->|Spawns Pod| POD[Runner Pod]
+
+ POD -->|Updates Status| K8s
+ K8s -->|Status Change| BE
+ BE -->|WebSocket| FE
+ FE -->|Display| User
+
+ style User fill:#e1f5ff
+ style FE fill:#fff4e1
+ style BE fill:#ffe1e1
+ style K8s fill:#f0e1ff
+ style OP fill:#e1ffe1
+ style POD fill:#ffe1e1
+```
+
+**High-Level Flow:**
+
+1. **User** interacts with **Frontend** UI
+2. **Frontend** sends API request to **Backend**
+3. **Backend** creates Custom Resource via **Kubernetes API** (using user token)
+4. **Operator** detects CR and creates **Job**
+5. **Job** spawns **Runner Pod** to execute task
+6. **Runner** updates CR status with results
+7. **Backend** sends WebSocket update to **Frontend**
+8. **Frontend** displays results to **User**
+
+**Reference:** [Core System Architecture - Data Flow](./core-system-architecture.md#data-flow-agentic-session-execution)
+
+---
+
+## Architecture Decision Records (ADRs)
+
+ADRs document **why** architectural decisions were made, not just **what** was implemented.
+
+| ADR | Title | Date | Status |
+|-----|-------|------|--------|
+| [0001](../adr/0001-kubernetes-native-architecture.md) | Kubernetes-Native Architecture | 2024-11 | Accepted |
+| [0002](../adr/0002-user-token-authentication.md) | User Token Authentication for API Operations | 2024-11 | Accepted |
+| [0003](../adr/0003-multi-repo-support.md) | Multi-Repository Support in AgenticSessions | 2024-11 | Accepted |
+| [0004](../adr/0004-go-backend-python-runner.md) | Go Backend + Python Runner Technology Stack | 2024-11 | Accepted |
+| [0005](../adr/0005-nextjs-shadcn-react-query.md) | NextJS + Shadcn + React Query Frontend Stack | 2024-11 | Accepted |
+
+**See also:** [Decision Log](../decisions.md) for chronological record of all major decisions.
+
+---
+
+## Design Documents
+
+Detailed design documents for specific features:
+
+| Document | Description |
+|----------|-------------|
+| [Declarative Session Reconciliation](../design/declarative-session-reconciliation.md) | Operator reconciliation patterns |
+| [Session Initialization Flows](../design/session-initialization-flows.md) | Session creation and startup |
+| [Session Status Redesign](../design/session-status-redesign.md) | Status tracking and reporting |
+| [Runner-Operator Contracts](../design/runner-operator-contracts.md) | Communication between runner and operator |
+
+---
+
+## Related Context Files
+
+Loadable context files for specific development tasks:
+
+| Context File | Use When |
+|--------------|----------|
+| [Backend Development](../../.claude/context/backend-development.md) | Working on Go backend or operator |
+| [Frontend Development](../../.claude/context/frontend-development.md) | Working on NextJS frontend |
+| [Security Standards](../../.claude/context/security-standards.md) | Reviewing security practices |
+
+**Reference:** [Repomix Usage Guide](../../.claude/repomix-guide.md) for using architecture views.
+
+---
+
+## Code Pattern Catalog
+
+Common patterns used throughout the codebase:
+
+| Pattern File | Description |
+|--------------|-------------|
+| [Error Handling](../../.claude/patterns/error-handling.md) | Consistent error patterns (backend, operator, runner) |
+| [K8s Client Usage](../../.claude/patterns/k8s-client-usage.md) | When to use user token vs. service account |
+| [React Query Usage](../../.claude/patterns/react-query-usage.md) | Data fetching patterns (queries, mutations, caching) |
+
+---
+
+## Contributing to Architecture Docs
+
+When adding or updating architecture documentation:
+
+1. **Use Mermaid diagrams** for visualizations (compatible with MkDocs and GitHub)
+2. **Follow established patterns** (see existing architecture docs for examples)
+3. **Link to related documentation** (ADRs, design docs, code patterns)
+4. **Update this index** when adding new architecture pages
+5. **Test diagrams** at [mermaid.live](https://mermaid.live) before committing
+
+**Diagram Format Examples:**
+- System architecture → `graph TB` or `graph LR`
+- State transitions → `stateDiagram-v2`
+- Workflows → `sequenceDiagram`
+- Class structures → `classDiagram`
+- Flows → `flowchart`
+
+---
+
+## Questions or Feedback?
+
+For questions about the architecture:
+
+- **Technical questions:** See [Developer Guide](../developer/index.md)
+- **Architecture proposals:** Create an issue with the `architecture` label
+- **Corrections:** Submit a PR with proposed changes
+
+**Repository:** [https://github.com/ambient-code/platform](https://github.com/ambient-code/platform)
diff --git a/docs/architecture/kubernetes-resources.md b/docs/architecture/kubernetes-resources.md
new file mode 100644
index 00000000..c9f1f924
--- /dev/null
+++ b/docs/architecture/kubernetes-resources.md
@@ -0,0 +1,1042 @@
+# Kubernetes Custom Resources
+
+## Overview
+
+The Ambient Code Platform uses Kubernetes Custom Resource Definitions (CRDs) to represent AI automation tasks and configuration. This document details the structure, lifecycle, and relationships of the three primary CRDs.
+
+## Custom Resource Hierarchy
+
+```mermaid
+graph TB
+ subgraph "Namespace: team-alpha"
+        PS[ProjectSettings<br/>settings<br/>API keys, defaults]
+
+        AS1[AgenticSession<br/>session-1<br/>Batch mode]
+        AS2[AgenticSession<br/>session-2<br/>Interactive mode]
+
+        RFE1[RFEWorkflow<br/>rfe-auth-feature<br/>7-step council]
+
+        Job1[Job<br/>session-1-runner]
+        Job2[Job<br/>session-2-runner]
+
+        Pod1[Pod<br/>session-1-runner-xyz]
+        Pod2[Pod<br/>session-2-runner-abc]
+
+        Secret1[Secret<br/>runner-token-session-1]
+        Secret2[Secret<br/>runner-token-session-2]
+
+        PVC1[PVC<br/>workspace-session-1]
+        PVC2[PVC<br/>workspace-session-2]
+ end
+
+ PS -.->|Referenced by| AS1
+ PS -.->|Referenced by| AS2
+ PS -.->|Referenced by| RFE1
+
+    AS1 -->|OwnerReference<br/>controller=true| Job1
+    AS1 -->|OwnerReference<br/>controller=true| Secret1
+
+    AS2 -->|OwnerReference<br/>controller=true| Job2
+    AS2 -->|OwnerReference<br/>controller=true| Secret2
+
+    Job1 -->|OwnerReference<br/>controller=true| Pod1
+    Job2 -->|OwnerReference<br/>controller=true| Pod2
+
+ Pod1 -.->|Mounts| PVC1
+ Pod2 -.->|Mounts| PVC2
+
+ style PS fill:#ffe1e1
+ style AS1 fill:#e1f5ff
+ style AS2 fill:#e1f5ff
+ style RFE1 fill:#fff4e1
+ style Job1 fill:#f0e1ff
+ style Job2 fill:#f0e1ff
+```
+
+**Legend:**
+- Solid arrows (→): OwnerReference (parent → child)
+- Dashed arrows (-.->): Reference or mount (not ownership)
+
+---
+
+## AgenticSession Custom Resource
+
+### Purpose
+
+Represents a single AI-powered automation task executed via Claude Code.
+
+### API Definition
+
+- **Group:** `vteam.ambient-code`
+- **Version:** `v1alpha1`
+- **Kind:** `AgenticSession`
+- **Plural:** `agenticsessions`
+- **Shortname:** `as`
+
+### Resource Structure
+
+```mermaid
+classDiagram
+ class AgenticSession {
+ +metadata ObjectMeta
+ +spec AgenticSessionSpec
+ +status AgenticSessionStatus
+ }
+
+ class AgenticSessionSpec {
+ +prompt string
+ +repos []RepoConfig
+ +mainRepoIndex int
+ +interactive bool
+ +timeout int
+ +model string
+ +anthropicApiKeySecret string
+ }
+
+ class RepoConfig {
+ +input RepoInput
+ +output RepoOutput
+ }
+
+ class RepoInput {
+ +url string
+ +branch string
+ +authSecret string
+ }
+
+ class RepoOutput {
+ +forkRepo string
+ +targetBranch string
+ +createPR bool
+ }
+
+ class AgenticSessionStatus {
+ +phase string
+ +startTime string
+ +completionTime string
+ +results string
+ +message string
+ +repos []RepoStatus
+ }
+
+ class RepoStatus {
+ +index int
+ +pushed bool
+ +prUrl string
+ +error string
+ }
+
+ AgenticSession --> AgenticSessionSpec
+ AgenticSession --> AgenticSessionStatus
+ AgenticSessionSpec --> RepoConfig
+ RepoConfig --> RepoInput
+ RepoConfig --> RepoOutput
+ AgenticSessionStatus --> RepoStatus
+```
+
+### Spec Fields
+
+#### `spec.prompt` (required)
+
+**Type:** `string`
+
+**Description:** The instruction or task for Claude Code to execute.
+
+**Examples:**
+```yaml
+prompt: "Add unit tests for the authentication module"
+```
+
+```yaml
+prompt: "Refactor the database connection logic to use connection pooling"
+```
+
+---
+
+#### `spec.repos` (required)
+
+**Type:** `[]RepoConfig`
+
+**Description:** Array of Git repositories to operate on. At least one repository is required.
+
+**Structure:**
+
+```yaml
+repos:
+  - input:
+      url: "https://github.com/org/backend"
+      branch: "main"
+      authSecret: "git-credentials"  # optional
+    output:
+      forkRepo: "https://github.com/user/backend"  # optional
+      targetBranch: "feature/auth-refactor"  # optional
+      createPR: true  # optional
+```
+
+**Fields:**
+
+- **`input.url`** (required): Git repository URL (HTTPS or SSH)
+- **`input.branch`** (required): Branch to clone and work on
+- **`input.authSecret`** (optional): Secret name containing Git credentials
+- **`output.forkRepo`** (optional): Fork repository URL for pushing changes
+- **`output.targetBranch`** (optional): Target branch for PR creation
+- **`output.createPR`** (optional): Whether to create PR after pushing
+
+**Reference:** [ADR-0003: Multi-Repository Support](../adr/0003-multi-repo-support.md)
+
+---
+
+#### `spec.mainRepoIndex` (optional)
+
+**Type:** `int`
+
+**Description:** Index of the repository to use as Claude Code's working directory.
+
+**Default:** `0` (first repository)
+
+**Example:**
+
+```yaml
+repos:
+  - input:
+      url: "https://github.com/org/shared-lib"
+      branch: "main"
+  - input:
+      url: "https://github.com/org/api-service"
+      branch: "develop"
+mainRepoIndex: 1 # Work in api-service repo
+```
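
A minimal Go sketch of how the working-directory repository could be resolved from `mainRepoIndex`. The type and function names are illustrative, and the out-of-range check is an assumption of this sketch; the source only documents the default of `0`.

```go
package main

import "fmt"

// RepoInput and RepoConfig mirror only the fields this sketch needs.
type RepoInput struct {
	URL    string
	Branch string
}

type RepoConfig struct {
	Input RepoInput
}

// MainRepo returns the repository selected by mainRepoIndex.
// A nil index models an omitted field and falls back to the documented
// default of 0 (the first repository).
// Rejecting out-of-range values is an assumption, not documented behavior.
func MainRepo(repos []RepoConfig, mainRepoIndex *int) (RepoConfig, error) {
	idx := 0
	if mainRepoIndex != nil {
		idx = *mainRepoIndex
	}
	if idx < 0 || idx >= len(repos) {
		return RepoConfig{}, fmt.Errorf("mainRepoIndex %d out of range for %d repos", idx, len(repos))
	}
	return repos[idx], nil
}
```

With the two repositories from the example above and an index of `1`, `MainRepo` returns the `api-service` entry; with the field omitted it returns `shared-lib`.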
+
+---
+
+#### `spec.interactive` (optional)
+
+**Type:** `bool`
+
+**Description:** Enable interactive mode for multi-turn conversations.
+
+**Default:** `false` (batch mode)
+
+**Interactive Mode:**
+- Pod continues running after initial execution
+- User sends messages via inbox file (`/workspace/inbox.txt`)
+- Runner responds via outbox file (`/workspace/outbox.txt`)
+- No timeout enforced
+
+**Example:**
+
+```yaml
+interactive: true
+prompt: "Help me debug the authentication flow"
+```
+
+---
+
+#### `spec.timeout` (optional)
+
+**Type:** `int`
+
+**Description:** Timeout in seconds for batch mode execution.
+
+**Default:** The ProjectSettings `defaultTimeout` if set, otherwise `3600` (1 hour).
+
+**Note:** Ignored in interactive mode.
+
+**Example:**
+
+```yaml
+timeout: 7200 # 2 hours
+```
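
The precedence described above (session value, then project default, then one hour, with interactive mode exempt) can be sketched as follows. `EffectiveTimeout` is a hypothetical helper, and treating `0` as "not set" is an assumption of this sketch.

```go
package main

import "time"

// fallbackTimeoutSeconds is the documented last-resort default (1 hour).
const fallbackTimeoutSeconds = 3600

// EffectiveTimeout resolves the timeout for a session.
// It returns the duration and whether a timeout is enforced at all.
// Values <= 0 are treated as "not set" (an assumption of this sketch).
func EffectiveTimeout(interactive bool, sessionTimeout, projectDefault int) (time.Duration, bool) {
	if interactive {
		// Interactive sessions ignore spec.timeout entirely.
		return 0, false
	}
	seconds := fallbackTimeoutSeconds
	switch {
	case sessionTimeout > 0:
		seconds = sessionTimeout // explicit spec.timeout wins
	case projectDefault > 0:
		seconds = projectDefault // then the ProjectSettings default
	}
	return time.Duration(seconds) * time.Second, true
}
```

The same precedence applies to other fields that fall back to ProjectSettings, such as `spec.model`.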
+
+---
+
+#### `spec.model` (optional)
+
+**Type:** `string`
+
+**Description:** Claude model to use for execution.
+
+**Default:** The ProjectSettings `defaultModel` if set, otherwise `claude-sonnet-4-5`.
+
+**Valid Values:**
+- `claude-opus-4-5`
+- `claude-sonnet-4-5`
+- `claude-haiku-4`
+
+**Example:**
+
+```yaml
+model: "claude-opus-4-5" # Use most capable model
+```
+
+---
+
+#### `spec.anthropicApiKeySecret` (optional)
+
+**Type:** `string`
+
+**Description:** Secret name containing Anthropic API key.
+
+**Default:** The `anthropicApiKeySecret` configured in ProjectSettings.
+
+**Secret Format:**
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: anthropic-api-key
+type: Opaque
+stringData:
+  ANTHROPIC_API_KEY: sk-ant-...
+```
+
+---
+
+### Status Fields
+
+#### `status.phase` (set by operator)
+
+**Type:** `string`
+
+**Description:** Current phase of session execution.
+
+**Valid Values:**
+- `Pending` - CR created, waiting for operator to create Job
+- `Running` - Job created, pod executing
+- `Completed` - Execution succeeded
+- `Failed` - Execution failed
+- `Timeout` - Execution exceeded timeout
+
+**Reference:** [Agentic Session Lifecycle](./agentic-session-lifecycle.md)
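
The phase values can be modeled as a small typed state machine. The sketch below assumes the transitions shown in the lifecycle document (Pending to Running, Running to one of the three terminal phases); the `Phase` type and helper names are illustrative and not taken from the operator's code.

```go
package main

// Phase is the value stored in status.phase.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseCompleted Phase = "Completed"
	PhaseFailed    Phase = "Failed"
	PhaseTimeout   Phase = "Timeout"
)

// IsTerminal reports whether no further transitions are expected.
func (p Phase) IsTerminal() bool {
	switch p {
	case PhaseCompleted, PhaseFailed, PhaseTimeout:
		return true
	}
	return false
}

// transitions lists the allowed next phases for each non-terminal phase.
var transitions = map[Phase][]Phase{
	PhasePending: {PhaseRunning},
	PhaseRunning: {PhaseCompleted, PhaseFailed, PhaseTimeout},
}

// CanTransition reports whether moving from one phase to another is allowed.
func CanTransition(from, to Phase) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```

A guard like `CanTransition` is one way to keep a status update from moving a session out of a terminal phase.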
+
+---
+
+#### `status.startTime` (set by operator)
+
+**Type:** `string` (RFC3339 timestamp)
+
+**Description:** When execution started (Job created).
+
+**Example:** `"2025-12-08T14:30:00Z"`
+
+---
+
+#### `status.completionTime` (set by operator/runner)
+
+**Type:** `string` (RFC3339 timestamp)
+
+**Description:** When execution completed (success, failure, or timeout).
+
+**Example:** `"2025-12-08T15:45:00Z"`
+
+---
+
+#### `status.results` (set by runner)
+
+**Type:** `string`
+
+**Description:** Execution results, logs, or output from Claude Code.
+
+**May contain:**
+- Generated code snippets
+- File paths modified
+- Test results
+- Error messages
+- Partial results (if timeout/failure)
+
+---
+
+#### `status.message` (set by operator/runner)
+
+**Type:** `string`
+
+**Description:** Human-readable status message (especially for errors).
+
+**Examples:**
+- `"Execution completed successfully"`
+- `"Failed to authenticate with Anthropic API"`
+- `"Exceeded timeout of 3600 seconds"`
+- `"Git repository not found"`
+
+---
+
+#### `status.repos` (set by runner)
+
+**Type:** `[]RepoStatus`
+
+**Description:** Per-repository status tracking.
+
+**Structure:**
+
+```yaml
+status:
+  repos:
+    - index: 0
+      pushed: true
+      prUrl: "https://github.com/org/backend/pull/123"
+    - index: 1
+      pushed: false
+      error: "No changes to push"
+```
+
+**Fields:**
+
+- **`index`**: Corresponds to `spec.repos[index]`
+- **`pushed`**: Whether changes were pushed to remote
+- **`prUrl`**: Pull request URL (if created)
+- **`error`**: Error message (if push/PR creation failed)
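
Because each status entry points back into `spec.repos` by position, a consumer has to join the two lists. A minimal Go sketch follows; `RepoStatus` here is a local mirror of the fields above, `StatusByURL` is a hypothetical helper, and skipping entries whose index is out of range is an assumption.

```go
package main

// RepoStatus mirrors one entry of status.repos.
type RepoStatus struct {
	Index  int
	Pushed bool
	PRURL  string
	Error  string
}

// StatusByURL joins status entries to repository URLs from spec.repos.
// specURLs[i] is the input URL of spec.repos[i].
// Entries with an index outside specURLs are skipped (assumption).
func StatusByURL(specURLs []string, statuses []RepoStatus) map[string]RepoStatus {
	out := make(map[string]RepoStatus, len(statuses))
	for _, st := range statuses {
		if st.Index < 0 || st.Index >= len(specURLs) {
			continue
		}
		out[specURLs[st.Index]] = st
	}
	return out
}
```

Repositories with no status entry are simply absent from the result, which a caller can treat as "not yet reported".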
+
+---
+
+### Complete Example
+
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: AgenticSession
+metadata:
+  name: add-auth-tests
+  namespace: team-alpha
+  labels:
+    project: backend-api
+    type: testing
+spec:
+  prompt: |
+    Add comprehensive unit tests for the authentication module.
+    Ensure coverage of:
+    - Login/logout flows
+    - Token validation
+    - Password reset
+    - Edge cases (expired tokens, invalid credentials)
+
+  repos:
+    - input:
+        url: "https://github.com/org/backend-api"
+        branch: "develop"
+        authSecret: "github-pat"
+      output:
+        forkRepo: "https://github.com/user/backend-api"
+        targetBranch: "feature/auth-tests"
+        createPR: true
+
+  mainRepoIndex: 0
+  interactive: false
+  timeout: 3600
+  model: "claude-sonnet-4-5"
+  anthropicApiKeySecret: "anthropic-api-key"
+
+status:
+  phase: "Completed"
+  startTime: "2025-12-08T14:30:00Z"
+  completionTime: "2025-12-08T14:52:30Z"
+  results: |
+    Successfully added unit tests:
+    - tests/auth/test_login.py (12 tests)
+    - tests/auth/test_token_validation.py (8 tests)
+    - tests/auth/test_password_reset.py (6 tests)
+
+    Coverage increased from 68% to 89% for auth module.
+
+  message: "Execution completed successfully"
+
+  repos:
+    - index: 0
+      pushed: true
+      prUrl: "https://github.com/org/backend-api/pull/456"
+```
+
+---
+
+## ProjectSettings Custom Resource
+
+### Purpose
+
+Stores project-wide configuration such as default models, API keys, and timeout settings.
+
+### API Definition
+
+- **Group:** `vteam.ambient-code`
+- **Version:** `v1alpha1`
+- **Kind:** `ProjectSettings`
+- **Plural:** `projectsettings`
+- **Shortname:** `ps`
+
+### Resource Structure
+
+```mermaid
+classDiagram
+ class ProjectSettings {
+ +metadata ObjectMeta
+ +spec ProjectSettingsSpec
+ }
+
+ class ProjectSettingsSpec {
+ +defaultModel string
+ +defaultTimeout int
+ +anthropicApiKeySecret string
+ +gitCredentialsSecret string
+ +enableAutoCleanup bool
+ +retentionDays int
+ }
+
+ ProjectSettings --> ProjectSettingsSpec
+```
+
+### Spec Fields
+
+#### `spec.defaultModel` (optional)
+
+**Type:** `string`
+
+**Description:** Default Claude model for sessions without explicit `model` field.
+
+**Default:** `claude-sonnet-4-5`
+
+**Example:**
+
+```yaml
+defaultModel: "claude-opus-4-5" # Use most capable model by default
+```
+
+---
+
+#### `spec.defaultTimeout` (optional)
+
+**Type:** `int`
+
+**Description:** Default timeout (seconds) for batch mode sessions.
+
+**Default:** `3600` (1 hour)
+
+**Example:**
+
+```yaml
+defaultTimeout: 7200 # 2 hours for complex tasks
+```
+
+---
+
+#### `spec.anthropicApiKeySecret` (optional)
+
+**Type:** `string`
+
+**Description:** Default Secret name for Anthropic API key.
+
+**Sessions without explicit `anthropicApiKeySecret` use this default.**
+
+**Example:**
+
+```yaml
+anthropicApiKeySecret: "anthropic-api-key"
+```
+
+---
+
+#### `spec.gitCredentialsSecret` (optional)
+
+**Type:** `string`
+
+**Description:** Default Secret name for Git authentication.
+
+**Sessions without explicit `authSecret` in repo config use this default.**
+
+**Example:**
+
+```yaml
+gitCredentialsSecret: "github-pat"
+```
+
+---
+
+#### `spec.enableAutoCleanup` (optional)
+
+**Type:** `bool`
+
+**Description:** Enable automatic cleanup of completed sessions.
+
+**Default:** `false`
+
+**Example:**
+
+```yaml
+enableAutoCleanup: true
+retentionDays: 7 # Delete completed sessions after 7 days
+```
+
+---
+
+#### `spec.retentionDays` (optional)
+
+**Type:** `int`
+
+**Description:** Days to retain completed sessions before auto-cleanup.
+
+**Default:** `7`
+
+**Only applies if `enableAutoCleanup: true`**
+
+---
+
+### Complete Example
+
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: ProjectSettings
+metadata:
+  name: settings
+  namespace: team-alpha
+spec:
+  defaultModel: "claude-sonnet-4-5"
+  defaultTimeout: 5400  # 90 minutes
+  anthropicApiKeySecret: "anthropic-api-key"
+  gitCredentialsSecret: "github-pat"
+  enableAutoCleanup: true
+  retentionDays: 14
+```
+
+---
+
+## RFEWorkflow Custom Resource
+
+### Purpose
+
+Orchestrates a 7-step agent council process for Request for Enhancement (RFE) refinement.
+
+### API Definition
+
+- **Group:** `vteam.ambient-code`
+- **Version:** `v1alpha1`
+- **Kind:** `RFEWorkflow`
+- **Plural:** `rfeworkflows`
+- **Shortname:** `rfe`
+
+### Resource Structure
+
+```mermaid
+classDiagram
+ class RFEWorkflow {
+ +metadata ObjectMeta
+ +spec RFEWorkflowSpec
+ +status RFEWorkflowStatus
+ }
+
+ class RFEWorkflowSpec {
+ +request string
+ +context string
+ +repos []RepoConfig
+ +stepTimeout int
+ }
+
+ class RFEWorkflowStatus {
+ +phase string
+ +currentStep int
+ +steps []StepStatus
+ +finalRFE string
+ +startTime string
+ +completionTime string
+ }
+
+ class StepStatus {
+ +stepNumber int
+ +agent string
+ +status string
+ +output string
+ +startTime string
+ +completionTime string
+ }
+
+ RFEWorkflow --> RFEWorkflowSpec
+ RFEWorkflow --> RFEWorkflowStatus
+ RFEWorkflowStatus --> StepStatus
+```
+
+### 7-Step Agent Council
+
+```mermaid
+flowchart LR
+ Request[User Request] --> Step1
+
+    Step1[Step 1:<br/>Product Manager<br/>Requirements clarification] --> Step2
+    Step2[Step 2:<br/>Solution Architect<br/>Technical design] --> Step3
+    Step3[Step 3:<br/>Staff Engineer<br/>Implementation plan] --> Step4
+    Step4[Step 4:<br/>Product Owner<br/>Acceptance criteria] --> Step5
+    Step5[Step 5:<br/>Team Lead<br/>Task breakdown] --> Step6
+    Step6[Step 6:<br/>Team Member<br/>Effort estimation] --> Step7
+    Step7[Step 7:<br/>Delivery Owner<br/>Risk assessment] --> Final
+
+ Final[Final RFE Document]
+
+ style Request fill:#e1f5ff
+ style Final fill:#e1ffe1
+ style Step1 fill:#ffe1e1
+ style Step2 fill:#fff4e1
+ style Step3 fill:#f0e1ff
+ style Step4 fill:#ffe1e1
+ style Step5 fill:#fff4e1
+ style Step6 fill:#f0e1ff
+ style Step7 fill:#ffe1e1
+```
+
+**Agent Roles:**
+
+1. **Product Manager:** Clarifies requirements, defines user stories
+2. **Solution Architect:** Designs technical architecture, identifies dependencies
+3. **Staff Engineer:** Creates implementation plan, reviews code patterns
+4. **Product Owner:** Defines acceptance criteria and success metrics
+5. **Team Lead:** Breaks down into tasks, assigns priorities
+6. **Team Member:** Estimates effort, identifies blockers
+7. **Delivery Owner:** Assesses risks, creates rollback plan
+
+---
+
+### Spec Fields
+
+#### `spec.request` (required)
+
+**Type:** `string`
+
+**Description:** Initial RFE request or feature description.
+
+**Example:**
+
+```yaml
+request: |
+  Add support for OAuth2 authentication in the API.
+  Users should be able to authenticate using Google, GitHub, and Microsoft accounts.
+```
+
+---
+
+#### `spec.context` (optional)
+
+**Type:** `string`
+
+**Description:** Additional context for the council (codebase state, constraints, preferences).
+
+**Example:**
+
+```yaml
+context: |
+  - Existing authentication uses JWT tokens
+  - Frontend is React-based
+  - Backend is Go + Gin framework
+  - Prefer minimal dependencies
+```
+
+---
+
+#### `spec.repos` (required)
+
+**Type:** `[]RepoConfig`
+
+**Description:** Repositories for council to analyze (same structure as AgenticSession).
+
+---
+
+#### `spec.stepTimeout` (optional)
+
+**Type:** `int`
+
+**Description:** Timeout (seconds) per step.
+
+**Default:** `1800` (30 minutes)
+
+---
+
+### Status Fields
+
+#### `status.phase` (set by operator)
+
+**Type:** `string`
+
+**Valid Values:**
+- `Pending` - Workflow created, not started
+- `Running` - Executing steps
+- `Completed` - All steps completed
+- `Failed` - One or more steps failed
+
+---
+
+#### `status.currentStep` (set by operator)
+
+**Type:** `int`
+
+**Description:** Currently executing step (1-7).
+
+---
+
+#### `status.steps` (set by operator/runner)
+
+**Type:** `[]StepStatus`
+
+**Description:** Status for each of the 7 steps.
+
+**Fields:**
+
+- **`stepNumber`**: 1-7
+- **`agent`**: Agent role (e.g., "Product Manager")
+- **`status`**: `Pending`, `Running`, `Completed`, `Failed`
+- **`output`**: Agent's output for this step
+- **`startTime`**: RFC3339 timestamp
+- **`completionTime`**: RFC3339 timestamp
+
+---
+
+#### `status.finalRFE` (set by runner)
+
+**Type:** `string`
+
+**Description:** Final synthesized RFE document combining all agent outputs.
+
+---
+
+### Complete Example
+
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: RFEWorkflow
+metadata:
+  name: oauth-authentication
+  namespace: team-alpha
+spec:
+  request: |
+    Add OAuth2 authentication to the API supporting Google, GitHub, and Microsoft.
+
+  context: |
+    - Current auth uses JWT tokens
+    - Backend: Go + Gin
+    - Frontend: React + NextJS
+
+  repos:
+    - input:
+        url: "https://github.com/org/backend-api"
+        branch: "develop"
+    - input:
+        url: "https://github.com/org/frontend"
+        branch: "develop"
+
+  stepTimeout: 1800
+
+status:
+  phase: "Completed"
+  currentStep: 7
+
+  steps:
+    - stepNumber: 1
+      agent: "Product Manager"
+      status: "Completed"
+      output: |
+        Requirements clarified:
+        - Support 3 OAuth providers
+        - Fallback to JWT for API clients
+        - User profile sync on first login
+      startTime: "2025-12-08T10:00:00Z"
+      completionTime: "2025-12-08T10:15:00Z"
+
+    - stepNumber: 2
+      agent: "Solution Architect"
+      status: "Completed"
+      output: |
+        Technical design:
+        - Use golang.org/x/oauth2 library
+        - Add OAuthProvider table (Postgres)
+        - Extend User model with provider_id field
+        - Create /auth/oauth/{provider} endpoints
+      startTime: "2025-12-08T10:15:00Z"
+      completionTime: "2025-12-08T10:35:00Z"
+
+    # ... (steps 3-7)
+
+  finalRFE: |
+    # RFE: OAuth2 Authentication
+
+    ## Overview
+    Add OAuth2 authentication supporting Google, GitHub, and Microsoft.
+
+    ## Requirements
+    - Support 3 OAuth providers
+    - Fallback to JWT for API clients
+    - User profile sync on first login
+
+    ## Technical Design
+    - Use golang.org/x/oauth2 library
+    - Add OAuthProvider table
+    - Extend User model
+    - Create /auth/oauth/{provider} endpoints
+
+    ## Implementation Plan
+    (Detailed steps from Staff Engineer)
+
+    ## Acceptance Criteria
+    (Criteria from Product Owner)
+
+    ## Task Breakdown
+    (Tasks from Team Lead)
+
+    ## Effort Estimation
+    (Estimates from Team Member)
+
+    ## Risk Assessment
+    (Risks and mitigation from Delivery Owner)
+
+  startTime: "2025-12-08T10:00:00Z"
+  completionTime: "2025-12-08T13:45:00Z"
+```
+
+---
+
+## OwnerReferences and Cleanup
+
+### OwnerReference Pattern
+
+**Purpose:** Automatic resource cleanup when parent is deleted.
+
+**Structure:**
+
+```yaml
+apiVersion: vteam.ambient-code/v1alpha1
+kind: AgenticSession
+metadata:
+  name: session-1
+  namespace: team-alpha
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: session-1-runner
+  namespace: team-alpha
+  ownerReferences:
+    - apiVersion: vteam.ambient-code/v1alpha1
+      kind: AgenticSession
+      name: session-1
+      uid: a1b2c3d4-e5f6-7890-abcd-ef1234567890
+      controller: true
+      # blockOwnerDeletion: false (default, do not set to true)
+```
+
+**Key Fields:**
+
+- **`controller: true`**: Only ONE owner can be controller (primary parent)
+- **`blockOwnerDeletion`**: **Omit this field.** Setting it to `true` can cause permission errors in multi-tenant clusters.
+
+**Cleanup Behavior:**
+
+1. User deletes AgenticSession CR
+2. Kubernetes cascades delete to owned resources:
+ - Job (which cascades to Pod)
+ - Secret (runner token)
+ - PVC (workspace, if configured)
+
+**Reference:** [Backend/Operator Standards - OwnerReferences](../../CLAUDE.md#ownerreferences-pattern)
+
+---
+
+### Cleanup Strategies
+
+#### Automatic (OwnerReferences)
+
+**When:** Parent CR deleted
+
+**How:** Kubernetes garbage collector cascades delete
+
+**Pros:**
+- No manual cleanup required
+- Consistent behavior
+- Works even if operator is down
+
+**Cons:**
+- Deletion order not controllable
+- All child resources deleted (no selective retention)
+
+---
+
+#### Manual (Operator Cleanup)
+
+**When:** Session completes successfully
+
+**How:** Operator explicitly deletes Job (Pod cleaned by Job controller)
+
+**Pattern:**
+
+```go
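+import (
+	"context"
+	"log"
+
+	"k8s.io/apimachinery/pkg/api/errors"
+	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+)
+
+// cleanupCompletedSession deletes the Job of a finished session.
+// K8sClient is the operator's package-level clientset, defined elsewhere.
+func cleanupCompletedSession(namespace, jobName string) {
+	// Background propagation: the Job is removed first, then its Pod.
+	policy := v1.DeletePropagationBackground
+
+	err := K8sClient.BatchV1().Jobs(namespace).Delete(
+		context.Background(), jobName, v1.DeleteOptions{
+			PropagationPolicy: &policy,
+		})
+
+	// A missing Job means cleanup already happened; log only real failures.
+	if err != nil && !errors.IsNotFound(err) {
+		log.Printf("Failed to delete job %s/%s: %v", namespace, jobName, err)
+	}
+}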
+```
+
+**Pros:**
+- Immediate cleanup on completion
+- Selective retention (e.g., keep PVC, delete Job)
+
+**Cons:**
+- Requires operator to be running
+- More complex logic
+
+---
+
+#### Time-Based (TTL)
+
+**When:** `enableAutoCleanup` is `true` in ProjectSettings
+
+**How:** Operator periodically deletes old completed CRs
+
+**Pattern:**
+
+```go
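+import (
+	"context"
+	"log"
+	"time"
+
+	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
+)
+
+// cleanupOldSessions deletes sessions that reached a terminal phase more than
+// retentionDays ago. DynamicClient and gvr are package-level values defined elsewhere.
+func cleanupOldSessions(namespace string, retentionDays int) {
+	cutoff := time.Now().AddDate(0, 0, -retentionDays)
+
+	list, err := DynamicClient.Resource(gvr).Namespace(namespace).List(
+		context.Background(), v1.ListOptions{})
+	if err != nil {
+		log.Printf("Failed to list sessions in %s: %v", namespace, err)
+		return
+	}
+
+	for _, item := range list.Items {
+		phase, _, _ := unstructured.NestedString(item.Object, "status", "phase")
+		if phase != "Completed" && phase != "Failed" && phase != "Timeout" {
+			continue // Only clean up terminal states
+		}
+
+		completionTime, _, _ := unstructured.NestedString(item.Object, "status", "completionTime")
+		if completionTime == "" {
+			continue
+		}
+
+		t, err := time.Parse(time.RFC3339, completionTime)
+		if err != nil || t.After(cutoff) {
+			continue // Too recent or invalid timestamp
+		}
+
+		// Delete the old session; owned resources follow via OwnerReferences.
+		err = DynamicClient.Resource(gvr).Namespace(namespace).Delete(
+			context.Background(), item.GetName(), v1.DeleteOptions{})
+		if err != nil {
+			log.Printf("Failed to delete session %s: %v", item.GetName(), err)
+			continue
+		}
+
+		log.Printf("Deleted old session %s (completed %s)", item.GetName(), completionTime)
+	}
+}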
+```
+
+**Pros:**
+- Automatic space management
+- Configurable retention period
+
+**Cons:**
+- Loses audit trail (consider archiving first)
+- Requires periodic operator execution
+
+---
+
+## Related Documentation
+
+- [Core System Architecture](./core-system-architecture.md) - Component overview
+- [Agentic Session Lifecycle](./agentic-session-lifecycle.md) - Session state machine
+- [Multi-Tenancy Architecture](./multi-tenancy-architecture.md) - Namespace isolation
+- [ADR-0001: Kubernetes-Native Architecture](../adr/0001-kubernetes-native-architecture.md)
+- [ADR-0003: Multi-Repository Support](../adr/0003-multi-repo-support.md)
diff --git a/docs/architecture/multi-tenancy-architecture.md b/docs/architecture/multi-tenancy-architecture.md
new file mode 100644
index 00000000..ebdf8641
--- /dev/null
+++ b/docs/architecture/multi-tenancy-architecture.md
@@ -0,0 +1,756 @@
+# Multi-Tenancy Architecture
+
+## Overview
+
+The Ambient Code Platform implements **namespace-based multi-tenancy** where each project maps to a dedicated Kubernetes namespace. This isolates each tenant's resources at the namespace level while leveraging Kubernetes RBAC for fine-grained access control.
+
+## Project-to-Namespace Mapping
+
+```mermaid
+graph TB
+ subgraph "Frontend Layer"
+        UI[User Interface<br/>Project Selection]
+    end
+
+    subgraph "Backend API Layer"
+        API[Backend API<br/>Project Context Validation]
+        MW[Middleware:<br/>ValidateProjectContext]
+ end
+
+ subgraph "Kubernetes Cluster"
+ subgraph "Project: team-alpha"
+ NSA[Namespace: team-alpha]
+ RBA[RoleBinding: team-alpha-users]
+ ASA1[AgenticSession: session-1]
+ ASA2[AgenticSession: session-2]
+ PSA[ProjectSettings: settings]
+ PVC_A[PVC: workspace-session-1]
+ end
+
+ subgraph "Project: team-beta"
+ NSB[Namespace: team-beta]
+ RBB[RoleBinding: team-beta-users]
+ ASB1[AgenticSession: session-1]
+ PSB[ProjectSettings: settings]
+ PVC_B[PVC: workspace-session-1]
+ end
+
+ subgraph "Project: team-gamma"
+ NSC[Namespace: team-gamma]
+ RBC[RoleBinding: team-gamma-users]
+ ASC1[AgenticSession: session-1]
+ PSC[ProjectSettings: settings]
+ end
+ end
+
+ UI -->|GET /api/projects| API
+    API -->|List namespaces<br/>user has access to| NSA
+    API -->|List namespaces<br/>user has access to| NSB
+    API -->|List namespaces<br/>user has access to| NSC
+
+ UI -->|POST /api/projects/team-alpha/agentic-sessions| MW
+ MW -->|Validate RBAC| RBA
+ MW -->|Create CR| ASA1
+
+ style NSA fill:#e1f5ff
+ style NSB fill:#ffe1e1
+ style NSC fill:#e1ffe1
+ style MW fill:#fff4e1
+ style RBA fill:#f0e1ff
+ style RBB fill:#f0e1ff
+ style RBC fill:#f0e1ff
+```
+
+**Key Principles:**
+
+1. **1:1 Mapping:** Each project corresponds to exactly one Kubernetes namespace
+2. **Namespace = Isolation Boundary:** Resources cannot cross namespace boundaries
+3. **Project Name = Namespace Name:** Simplifies mapping and debugging
+4. **RBAC Enforced:** User must have permissions on namespace to access project
+
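Because the project name is used verbatim as the namespace name, it has to satisfy Kubernetes namespace naming rules (a DNS-1123 label of at most 63 characters). A minimal sketch of such a check follows; `isValidProjectName` is a hypothetical helper for illustration, not a function taken from the backend.

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Label matches lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character, which is the shape of a namespace name.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidProjectName is a hypothetical helper (an assumption, not backend code).
// It reports whether name could be used directly as a namespace name.
func isValidProjectName(name string) bool {
	return len(name) <= 63 && dns1123Label.MatchString(name)
}

func main() {
	for _, name := range []string{"team-alpha", "Team_Alpha", "-leading-dash", ""} {
		fmt.Printf("%q valid=%v\n", name, isValidProjectName(name))
	}
}
```

Rejecting invalid names before calling the Kubernetes API would let the backend return a clear 400 response instead of surfacing an API server validation error.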
+---
+
+## User Authentication Flow
+
+```mermaid
+sequenceDiagram
+ actor User
+ participant Browser
+ participant OAuth as OAuth Proxy<br/>(OpenShift OAuth)
+ participant FE as Frontend
+ participant BE as Backend API
+ participant K8s as Kubernetes API
+
+ User->>Browser: Access platform URL
+ Browser->>OAuth: Request (no token)
+
+ Note over OAuth: User not authenticated
+
+ OAuth->>User: Redirect to OpenShift login
+ User->>OAuth: Provide credentials
+ OAuth->>OAuth: Validate credentials<br/>Generate OAuth token
+
+ OAuth->>Browser: Set token in cookie/header
+ Browser->>FE: Load frontend app<br/>(with token)
+
+ Note over FE: Token stored in memory/cookie
+
+ FE->>BE: API request<br/>Authorization: Bearer {token}
+
+ Note over BE: Extract token from header<br/>X-Forwarded-User from OAuth proxy
+
+ BE->>BE: Validate token format
+
+ BE->>K8s: Create K8s client<br/>with user token
+
+ K8s-->>BE: Client configured
+
+ BE->>K8s: Perform operation<br/>(e.g., List AgenticSessions)
+
+ Note over K8s: Kubernetes validates token<br/>Checks RBAC permissions
+
+ alt User has permissions
+ K8s-->>BE: Resources returned
+ BE-->>FE: 200 OK + data
+ else User lacks permissions
+ K8s-->>BE: 403 Forbidden
+ BE-->>FE: 403 Forbidden
+ end
+
+ FE-->>Browser: Display result
+ Browser-->>User: Show UI
+```
+
+**Authentication Components:**
+
+1. **OAuth Proxy:** Intercepts requests, enforces authentication, injects X-Forwarded-User header
+2. **Frontend:** Receives token, includes in all API requests
+3. **Backend:** Extracts token, creates K8s client with user credentials
+4. **Kubernetes API:** Validates token against ServiceAccount/User, enforces RBAC
+
+**Reference:** [ADR-0002: User Token Authentication](../adr/0002-user-token-authentication.md)
+
+---
+
+## RBAC Model
+
+### Role Hierarchy
+
+```mermaid
+graph TB
+ subgraph "Cluster Roles (Platform Admin)"
+ CA["ClusterRole:<br/>cluster-admin"]
+ CVR["ClusterRole:<br/>vteam-view-all"]
+ end
+
+ subgraph "Namespace Roles (Project Team)"
+ NA["Role:<br/>vteam-admin<br/>(CRUD all resources)"]
+ NE["Role:<br/>vteam-editor<br/>(CRUD sessions)"]
+ NV["Role:<br/>vteam-viewer<br/>(Read-only)"]
+ end
+
+ subgraph "Service Accounts"
+ SAB["ServiceAccount:<br/>backend<br/>(CR writes, token minting)"]
+ SAO["ServiceAccount:<br/>operator<br/>(Watch CRs, manage Jobs)"]
+ SAR["ServiceAccount:<br/>runner<br/>(Update CR status)"]
+ end
+
+ CA -->|Has all permissions| NA
+ CA -->|Has all permissions| NE
+ CA -->|Has all permissions| NV
+
+ CVR -->|Can read| NV
+
+ NA -->|Includes| NE
+ NE -->|Includes| NV
+
+ SAB -->|Bound to| NA
+ SAO -->|Bound to| NA
+ SAR -->|Bound to| NV
+
+ style CA fill:#ffe1e1
+ style CVR fill:#ffe1e1
+ style NA fill:#e1f5ff
+ style NE fill:#fff4e1
+ style NV fill:#e1ffe1
+ style SAB fill:#f0e1ff
+ style SAO fill:#f0e1ff
+ style SAR fill:#f0e1ff
+```
+
+### Permission Matrix
+
+| Resource | vteam-viewer | vteam-editor | vteam-admin | backend SA | operator SA |
+|----------|--------------|--------------|-------------|------------|-------------|
+| **AgenticSession** |
+| list | ✓ | ✓ | ✓ | ✓ | ✓ |
+| get | ✓ | ✓ | ✓ | ✓ | ✓ |
+| watch | - | - | - | - | ✓ |
+| create | - | ✓ | ✓ | ✓ | - |
+| update | - | ✓ | ✓ | ✓ | - |
+| update/status | - | - | - | ✓ | ✓ |
+| delete | - | ✓ | ✓ | ✓ | - |
+| **ProjectSettings** |
+| list | ✓ | ✓ | ✓ | ✓ | ✓ |
+| get | ✓ | ✓ | ✓ | ✓ | ✓ |
+| create | - | - | ✓ | ✓ | - |
+| update | - | - | ✓ | ✓ | - |
+| delete | - | - | ✓ | ✓ | - |
+| **RFEWorkflow** |
+| list | ✓ | ✓ | ✓ | ✓ | ✓ |
+| get | ✓ | ✓ | ✓ | ✓ | ✓ |
+| create | - | ✓ | ✓ | ✓ | - |
+| update | - | ✓ | ✓ | ✓ | - |
+| delete | - | ✓ | ✓ | ✓ | - |
+| **Jobs** |
+| list | ✓ | ✓ | ✓ | - | ✓ |
+| get | ✓ | ✓ | ✓ | - | ✓ |
+| create | - | - | - | - | ✓ |
+| delete | - | - | ✓ | - | ✓ |
+| **Secrets** |
+| list | - | - | ✓ | ✓ | ✓ |
+| get | - | - | ✓ | ✓ | ✓ |
+| create | - | - | - | ✓ | ✓ |
+| delete | - | - | ✓ | ✓ | ✓ |
+
+**Legend:**
+- ✓ = Permission granted
+- \- = Permission not granted (RBAC is additive, so absence of a grant means no access)
+
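The matrix can be read as a lookup from (subject, resource) to a set of verbs. The sketch below encodes a few rows copied from the table above to show that reading; the `matrix` variable and `allowed` function are illustrative names, and real enforcement is done by Kubernetes RBAC, not by application code like this.

```go
package main

import "fmt"

// matrix holds a subset of the permission matrix above:
// subject -> resource -> verbs. Anything absent is not granted.
var matrix = map[string]map[string][]string{
	"vteam-viewer": {
		"agenticsessions": {"list", "get"},
		"jobs":            {"list", "get"},
	},
	"vteam-editor": {
		"agenticsessions": {"list", "get", "create", "update", "delete"},
		"jobs":            {"list", "get"},
	},
	"operator": {
		"agenticsessions": {"list", "get", "watch", "update/status"},
		"jobs":            {"list", "get", "create", "delete"},
	},
}

// allowed reports whether the matrix grants verb on resource to subject.
// Unknown subjects or resources yield an empty verb list, so the result is false.
func allowed(subject, resource, verb string) bool {
	for _, v := range matrix[subject][resource] {
		if v == verb {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowed("vteam-viewer", "agenticsessions", "list"))
	fmt.Println(allowed("vteam-viewer", "agenticsessions", "create"))
	fmt.Println(allowed("operator", "jobs", "create"))
}
```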
+---
+
+## Backend API Authorization Pattern
+
+### Middleware Chain
+
+```mermaid
+flowchart LR
+ Req[HTTP Request] --> Recovery[gin.Recovery]
+ Recovery --> Logger[gin.Logger<br/>Token redaction]
+ Logger --> CORS[CORS<br/>middleware]
+ CORS --> Identity[forwardedIdentityMiddleware<br/>Extract X-Forwarded-User]
+ Identity --> Validate[ValidateProjectContext<br/>RBAC check]
+ Validate --> Handler[Route Handler<br/>Business logic]
+
+ style Req fill:#e1f5ff
+ style Validate fill:#fff4e1
+ style Handler fill:#e1ffe1
+```
+
+### User Token Extraction
+
+**Backend Pattern** (`components/backend/handlers/helpers.go`):
+
+```go
+// GetK8sClientsForRequest creates K8s clients using user token from request
+func GetK8sClientsForRequest(c *gin.Context) (*kubernetes.Clientset, dynamic.Interface) {
+ // 1. Extract Authorization header
+ rawAuth := c.GetHeader("Authorization")
+ if rawAuth == "" {
+ log.Printf("Missing Authorization header")
+ return nil, nil
+ }
+
+ // 2. Parse Bearer token
+ parts := strings.SplitN(rawAuth, " ", 2)
+ if len(parts) != 2 || !strings.EqualFold(parts[0], "Bearer") {
+ log.Printf("Invalid Authorization header format")
+ return nil, nil
+ }
+
+ token := strings.TrimSpace(parts[1])
+ if token == "" {
+ log.Printf("Empty token")
+ return nil, nil
+ }
+
+ log.Printf("Creating K8s client with user token (len=%d)", len(token))
+
+ // 3. Create K8s client with user token
+ config := &rest.Config{
+ Host: os.Getenv("KUBERNETES_SERVICE_HOST"),
+ BearerToken: token,
+ TLSClientConfig: rest.TLSClientConfig{
+ Insecure: false,
+ CAFile: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
+ },
+ }
+
+ k8sClient, err := kubernetes.NewForConfig(config)
+ if err != nil {
+ log.Printf("Failed to create K8s client: %v", err)
+ return nil, nil
+ }
+
+ dynClient, err := dynamic.NewForConfig(config)
+ if err != nil {
+ log.Printf("Failed to create dynamic client: %v", err)
+ return nil, nil
+ }
+
+ return k8sClient, dynClient
+}
+```
+
+### RBAC Validation Middleware
+
+**Pattern** (`components/backend/handlers/middleware.go`):
+
+```go
+func ValidateProjectContext() gin.HandlerFunc {
+ return func(c *gin.Context) {
+ projectName := c.Param("projectName")
+ if projectName == "" {
+ c.JSON(http.StatusBadRequest, gin.H{"error": "Missing project name"})
+ c.Abort()
+ return
+ }
+
+ // Get user-scoped K8s client
+ reqK8s, _ := GetK8sClientsForRequest(c)
+ if reqK8s == nil {
+ c.JSON(http.StatusUnauthorized, gin.H{"error": "Invalid or missing token"})
+ c.Abort()
+ return
+ }
+
+ // Check if user has access to namespace
+ ssar := &authv1.SelfSubjectAccessReview{
+ Spec: authv1.SelfSubjectAccessReviewSpec{
+ ResourceAttributes: &authv1.ResourceAttributes{
+ Group: "vteam.ambient-code",
+ Resource: "agenticsessions",
+ Verb: "list",
+ Namespace: projectName,
+ },
+ },
+ }
+
+ res, err := reqK8s.AuthorizationV1().SelfSubjectAccessReviews().Create(
+ context.Background(), ssar, v1.CreateOptions{})
+
+ if err != nil || !res.Status.Allowed {
+ c.JSON(http.StatusForbidden, gin.H{
+ "error": fmt.Sprintf("No access to project %s", projectName),
+ })
+ c.Abort()
+ return
+ }
+
+ // Store project in context for handler
+ c.Set("project", projectName)
+ c.Next()
+ }
+}
+```
+
+### Handler Usage
+
+**Example** (`components/backend/handlers/sessions.go`):
+
+```go
+func ListSessions(c *gin.Context) {
+ project := c.GetString("project") // From middleware
+
+ // Get user-scoped K8s clients
+ _, reqDyn := GetK8sClientsForRequest(c)
+ if reqDyn == nil {
+ c.JSON(http.StatusUnauthorized, gin.H{"error": "Invalid token"})
+ return
+ }
+
+ gvr := schema.GroupVersionResource{
+ Group: "vteam.ambient-code",
+ Version: "v1alpha1",
+ Resource: "agenticsessions",
+ }
+
+ // List sessions using user token (RBAC enforced by K8s)
+ list, err := reqDyn.Resource(gvr).Namespace(project).List(
+ context.Background(), v1.ListOptions{})
+
+ if err != nil {
+ log.Printf("Failed to list sessions in project %s: %v", project, err)
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to list sessions"})
+ return
+ }
+
+ c.JSON(http.StatusOK, gin.H{"items": list.Items})
+}
+```
+
+**Key Security Patterns:**
+
+1. **Always use user token** for user-initiated operations
+2. **Never fall back** to service account if user token is invalid
+3. **Validate RBAC** before resource access
+4. **Log securely** - never log token values (use `len(token)`)
+5. **Return 401** for authentication failures (missing or malformed token), **403** for authorization failures (valid token, insufficient RBAC)
+
+**Reference:** [Backend Development Standards](../../CLAUDE.md#user-scoped-clients-for-api-operations)
+
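The 401/403 split and the no-fallback rule can be condensed into one decision function. This is a minimal sketch using only the standard library: `statusFor` is a hypothetical helper, and its `canAccess` callback stands in for the RBAC check that the middleware above performs with a SelfSubjectAccessReview.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// bearerToken extracts the token from an Authorization header value,
// using the same parsing rules as GetK8sClientsForRequest above.
func bearerToken(header string) (string, bool) {
	parts := strings.SplitN(header, " ", 2)
	if len(parts) != 2 || !strings.EqualFold(parts[0], "Bearer") {
		return "", false
	}
	token := strings.TrimSpace(parts[1])
	return token, token != ""
}

// statusFor is a hypothetical helper showing which status each failure maps to.
// canAccess is an assumption standing in for the real RBAC check.
func statusFor(header string, canAccess func(token string) bool) int {
	token, ok := bearerToken(header)
	if !ok {
		// Authentication failure: no fallback to a service account.
		return http.StatusUnauthorized
	}
	if !canAccess(token) {
		// Authenticated, but RBAC does not allow the operation.
		return http.StatusForbidden
	}
	return http.StatusOK
}

func main() {
	allowAll := func(string) bool { return true }
	denyAll := func(string) bool { return false }
	fmt.Println(statusFor("", allowAll))
	fmt.Println(statusFor("Bearer abc", denyAll))
	fmt.Println(statusFor("Bearer abc", allowAll))
}
```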
+---
+
+## Service Account Usage
+
+### Backend Service Account
+
+**Purpose:** Limited elevated operations
+
+**Permissions:**
+- Create/update Custom Resources (after user validation)
+- Create Secrets for runner token minting
+- Read ProjectSettings for configuration
+
+**Usage Pattern:**
+
+```go
+// ONLY use backend service account for:
+// 1. Writing CRs after user token validation
+// 2. Minting runner tokens
+
+func CreateSession(c *gin.Context) {
+ project := c.GetString("project")
+
+ // Step 1: Validate user has permission using USER TOKEN
+ reqK8s, _ := GetK8sClientsForRequest(c) // dynamic client is not needed for the permission check
+ if reqK8s == nil {
+ c.JSON(http.StatusUnauthorized, gin.H{"error": "Invalid token"})
+ return
+ }
+
+ // Validate user can create sessions
+ if !userCanCreateSessions(reqK8s, project) {
+ c.JSON(http.StatusForbidden, gin.H{"error": "No permission to create sessions"})
+ return
+ }
+
+ // Step 2: Create CR using BACKEND SERVICE ACCOUNT
+ // (user token may not have write permissions on status subresource)
+ obj := buildSessionObject(...)
+
+ created, err := DynamicClient.Resource(gvr).Namespace(project).Create(
+ context.Background(), obj, v1.CreateOptions{})
+
+ if err != nil {
+ log.Printf("Failed to create session: %v", err)
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create session"})
+ return
+ }
+
+ // Step 3: Mint token for runner using BACKEND SERVICE ACCOUNT
+ // (storing the token for the runner, e.g. in a Secret, is omitted here)
+ if _, err := mintRunnerToken(project, created.GetName()); err != nil {
+ log.Printf("Failed to mint runner token: %v", err)
+ // Continue - operator can handle missing token
+ }
+
+ c.JSON(http.StatusCreated, gin.H{
+ "name": created.GetName(),
+ "uid": created.GetUID(),
+ })
+}
+```
+
+**Never Use Backend Service Account For:**
+- ❌ List/Get operations on behalf of users
+- ❌ Delete operations initiated by users
+- ❌ Skipping RBAC validation
+- ❌ Accessing resources user doesn't have permission for
+
+---
+
+### Operator Service Account
+
+**Purpose:** Watch and reconcile Custom Resources
+
+**Permissions:**
+- Watch all Custom Resources (cluster-wide or namespace-scoped)
+- Create/delete Jobs
+- Create/delete Secrets
+- Update CR status subresource
+
+**Usage Pattern:**
+
+```go
+// Operator uses its service account for ALL operations
+func WatchAgenticSessions() {
+ gvr := types.GetAgenticSessionResource()
+
+ // Watch using operator's service account
+ watcher, err := config.DynamicClient.Resource(gvr).Watch(
+ context.Background(), v1.ListOptions{})
+
+ if err != nil {
+ log.Printf("Failed to create watcher: %v", err)
+ return
+ }
+
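+ for event := range watcher.ResultChan() {
+ // Error events carry a Status object, so check the type before use
+ obj, ok := event.Object.(*unstructured.Unstructured)
+ if !ok {
+ log.Printf("Unexpected object type in watch event: %T", event.Object)
+ continue
+ }
+ handleAgenticSession(obj)
+ }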
+}
+```
+
+**Note:** The operator has **cluster-wide permissions** to watch and reconcile resources across all namespaces. This is acceptable because:
+1. The operator is a trusted infrastructure component
+2. The operator only automates declarative state (it takes no direct user input)
+3. The operator does not expose a user-facing API
+
+---
+
+### Runner Service Account
+
+**Purpose:** Update CR status from pod
+
+**Permissions:**
+- Update `/status` subresource for parent AgenticSession
+- Read ConfigMaps/Secrets in namespace
+- Limited read access to other CRs (for RFE workflows)
+
+**Token Minting:**
+
+Backend mints a time-limited token for runner:
+
+```go
+func mintRunnerToken(namespace, sessionName string) (string, error) {
+ // Create ServiceAccount for runner
+ sa := &corev1.ServiceAccount{
+ ObjectMeta: v1.ObjectMeta{
+ Name: fmt.Sprintf("runner-%s", sessionName),
+ Namespace: namespace,
+ },
+ }
+
+ _, err := K8sClient.CoreV1().ServiceAccounts(namespace).Create(
+ context.Background(), sa, v1.CreateOptions{})
+
+ if err != nil && !errors.IsAlreadyExists(err) {
+ return "", err
+ }
+
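+ // Create token for ServiceAccount
+ // (TokenRequest lives in k8s.io/api/authentication/v1, aliased here as authnv1,
+ // not in the authorization/v1 package used for SelfSubjectAccessReview)
+ treq := &authnv1.TokenRequest{
+ Spec: authnv1.TokenRequestSpec{
+ ExpirationSeconds: int64Ptr(3600), // 1 hour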
+ },
+ }
+
+ token, err := K8sClient.CoreV1().ServiceAccounts(namespace).CreateToken(
+ context.Background(), sa.Name, treq, v1.CreateOptions{})
+
+ if err != nil {
+ return "", err
+ }
+
+ return token.Status.Token, nil
+}
+```
+
+**Usage in Runner:**
+
+```python
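+import os
+import requests
+
+# Runner reads minted token from environment
+token = os.environ.get("RUNNER_TOKEN")
+
+# Use token to update CR status.
+# Kubernetes rejects PATCH bodies sent as plain application/json,
+# so declare the body as a JSON merge patch.
+requests.patch(
+    f"{k8s_api}/apis/vteam.ambient-code/v1alpha1/namespaces/{namespace}/agenticsessions/{name}/status",
+    headers={
+        "Authorization": f"Bearer {token}",
+        "Content-Type": "application/merge-patch+json",
+    },
+    json={"status": {"results": results}},
+)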
+```
+
+---
+
+## Isolation Guarantees
+
+### Namespace Isolation
+
+**What's Isolated:**
+- ✓ Custom Resources (AgenticSession, ProjectSettings, RFEWorkflow)
+- ✓ Jobs and Pods
+- ✓ Secrets and ConfigMaps
+- ✓ PersistentVolumeClaims
+- ✓ NetworkPolicies (if configured)
+
+**What's Shared:**
+- Kubernetes cluster infrastructure (nodes, storage classes)
+- CRDs (cluster-scoped)
+- ClusterRoles and ClusterRoleBindings
+- Platform services (backend, operator)
+
+### RBAC Isolation
+
+**User A (team-alpha):**
+- ✓ Can list/create/delete sessions in `team-alpha` namespace
+- ❌ Cannot list sessions in `team-beta` namespace
+- ❌ Cannot modify ProjectSettings in `team-gamma` namespace
+
+**User B (team-beta):**
+- ✓ Can list sessions in `team-beta` namespace
+- ❌ Cannot access `team-alpha` resources
+- ❌ Cannot create sessions in `team-gamma` namespace
+
+**Enforcement:**
+- Backend validates user token + RBAC before operations
+- Kubernetes API enforces RBAC on every request
+- Operator uses namespace-scoped clients where possible
+
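The User A / User B examples follow from RoleBindings being namespaced: a binding grants a role in exactly one namespace and nothing elsewhere. The sketch below models that with an in-memory list; the `binding` type, the user names, and `canList` are illustrative assumptions, not platform code.

```go
package main

import "fmt"

// binding models a namespaced RoleBinding: one user, one role, one namespace.
type binding struct{ namespace, user, role string }

// demoBindings is hypothetical sample data matching the examples above.
var demoBindings = []binding{
	{"team-alpha", "user-a", "vteam-editor"},
	{"team-beta", "user-b", "vteam-viewer"},
}

// canList reports whether user holds any role in namespace. Every role in the
// permission matrix can list sessions, so any binding is sufficient here.
func canList(bindings []binding, user, namespace string) bool {
	for _, b := range bindings {
		if b.namespace == namespace && b.user == user {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canList(demoBindings, "user-a", "team-alpha"))
	fmt.Println(canList(demoBindings, "user-a", "team-beta"))
	fmt.Println(canList(demoBindings, "user-b", "team-gamma"))
}
```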
+### Resource Quotas (Optional)
+
+**Per-Namespace Limits:**
+
+```yaml
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: project-quota
+  namespace: team-alpha
+spec:
+  hard:
+    requests.cpu: "10"
+    requests.memory: "20Gi"
+    limits.cpu: "20"
+    limits.memory: "40Gi"
+    pods: "50"
+    persistentvolumeclaims: "10"
+```
+
+**Prevents:**
+- Resource exhaustion by single tenant
+- Noisy neighbor problems
+- Runaway session costs
+
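As a rough illustration of how a quota caps a tenant, the sketch below checks whether one more pod would fit under the `pods` and `requests.cpu` limits from the example above. It is a simplified model with hypothetical names (`resources`, `admits`); the real check is performed by the Kubernetes API server at admission time and covers every resource in the quota.

```go
package main

import "fmt"

// resources tracks a subset of the quota dimensions. CPU is in millicores,
// so the "10" CPU limit in the example quota becomes 10000.
type resources struct {
	pods        int64
	requestsCPU int64
}

// admits reports whether adding one pod that requests cpuMilli millicores
// keeps usage within the hard limits. This is a simplified model only.
func admits(hard, used resources, cpuMilli int64) bool {
	return used.pods+1 <= hard.pods && used.requestsCPU+cpuMilli <= hard.requestsCPU
}

func main() {
	hard := resources{pods: 50, requestsCPU: 10000}
	used := resources{pods: 49, requestsCPU: 9000}
	fmt.Println(admits(hard, used, 500))  // fits under both limits
	fmt.Println(admits(hard, used, 1500)) // would exceed requests.cpu
}
```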
+---
+
+## Security Boundaries
+
+```mermaid
+graph TB
+ subgraph "External"
+ User[User Browser]
+ Git[Git Repositories]
+ end
+
+ subgraph "Platform Boundary"
+ OAuth[OAuth Proxy<br/>Authentication]
+ end
+
+ subgraph "API Boundary"
+ BE[Backend API<br/>RBAC Validation]
+ end
+
+ subgraph "Kubernetes RBAC Boundary"
+ K8s[Kubernetes API<br/>Token + RBAC enforcement]
+ end
+
+ subgraph "Namespace: team-alpha"
+ NSA[Resources for team-alpha]
+ PodA[Runner Pod A]
+ end
+
+ subgraph "Namespace: team-beta"
+ NSB[Resources for team-beta]
+ PodB[Runner Pod B]
+ end
+
+ User -->|HTTPS| OAuth
+ OAuth -->|Token| BE
+ BE -->|User Token| K8s
+
+ K8s -->|RBAC allows| NSA
+ K8s -.->|RBAC denies| NSB
+
+ NSA -->|Contains| PodA
+ NSB -->|Contains| PodB
+
+ PodA -.->|Cannot access| NSB
+ PodB -.->|Cannot access| NSA
+
+ PodA -->|Can clone| Git
+ PodB -->|Can clone| Git
+
+ style OAuth fill:#ffe1e1
+ style BE fill:#fff4e1
+ style K8s fill:#f0e1ff
+ style NSA fill:#e1f5ff
+ style NSB fill:#ffe1e1
+```
+
+**Security Layers:**
+
+1. **OAuth Proxy:** Ensures user is authenticated
+2. **Backend API:** Validates user token + RBAC permissions
+3. **Kubernetes API:** Enforces RBAC on every resource access
+4. **Namespace Isolation:** Resources cannot cross boundaries
+5. **NetworkPolicies (optional):** Restrict pod-to-pod communication
+
+---
+
+## Project Lifecycle
+
+### Project Creation
+
+```mermaid
+sequenceDiagram
+ actor Admin
+ participant UI as Frontend
+ participant API as Backend API
+ participant K8s as Kubernetes
+
+ Admin->>UI: Create new project "team-delta"
+ UI->>API: POST /api/projects<br/>{"name": "team-delta"}
+
+ API->>K8s: Create Namespace<br/>name: team-delta
+
+ K8s-->>API: Namespace created
+
+ API->>K8s: Create RoleBinding<br/>vteam-admin → admin user
+
+ API->>K8s: Create ProjectSettings CR<br/>(default configuration)
+
+ K8s-->>API: Resources created
+
+ API-->>UI: 201 Created
+ UI-->>Admin: Project ready
+```
+
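The creation sequence above can be sketched as three ordered calls that stop at the first failure. The `cluster` interface and `fakeCluster` type are hypothetical stand-ins so the example runs without a Kubernetes cluster; the actual backend would issue these calls through its Kubernetes clients.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical abstraction over the three calls in the diagram.
type cluster interface {
	CreateNamespace(name string) error
	CreateRoleBinding(namespace, role, user string) error
	CreateProjectSettings(namespace string) error
}

// createProject runs the steps in diagram order and stops at the first error.
func createProject(c cluster, name, adminUser string) error {
	if err := c.CreateNamespace(name); err != nil {
		return fmt.Errorf("create namespace %s: %w", name, err)
	}
	if err := c.CreateRoleBinding(name, "vteam-admin", adminUser); err != nil {
		return fmt.Errorf("bind vteam-admin in %s: %w", name, err)
	}
	if err := c.CreateProjectSettings(name); err != nil {
		return fmt.Errorf("create ProjectSettings in %s: %w", name, err)
	}
	return nil
}

// fakeCluster records calls so the ordering can be inspected in isolation.
type fakeCluster struct {
	calls  []string
	failOn string // if set, the call with this name returns an error
}

func (f *fakeCluster) record(call string) error {
	f.calls = append(f.calls, call)
	if call == f.failOn {
		return errors.New("simulated failure")
	}
	return nil
}

func (f *fakeCluster) CreateNamespace(string) error           { return f.record("namespace") }
func (f *fakeCluster) CreateRoleBinding(_, _, _ string) error { return f.record("rolebinding") }
func (f *fakeCluster) CreateProjectSettings(string) error     { return f.record("projectsettings") }

func main() {
	f := &fakeCluster{}
	err := createProject(f, "team-delta", "admin-user")
	fmt.Println(f.calls, err)
}
```

This sketch does not roll back a partially created project. One option, given the deletion flow described in this document, would be to delete the namespace on failure and let Kubernetes clean up whatever was created inside it.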
+### Project Deletion
+
+```mermaid
+sequenceDiagram
+ actor Admin
+ participant UI as Frontend
+ participant API as Backend API
+ participant K8s as Kubernetes
+
+ Admin->>UI: Delete project "team-delta"
+ UI->>API: DELETE /api/projects/team-delta
+
+ API->>K8s: Delete Namespace<br/>team-delta
+
+ Note over K8s: Cascade delete ALL resources:<br/>- AgenticSessions<br/>- Jobs/Pods<br/>- Secrets<br/>- PVCs<br/>- ProjectSettings
+
+ K8s-->>API: Namespace deleted
+
+ API-->>UI: 204 No Content
+ UI-->>Admin: Project deleted
+```
+
+**Cleanup:**
+- Kubernetes automatically deletes all resources in namespace
+- No manual cleanup required
+- PVCs deleted (data loss - consider backups)
+
+---
+
+## Related Documentation
+
+- [Core System Architecture](./core-system-architecture.md) - Component overview
+- [Agentic Session Lifecycle](./agentic-session-lifecycle.md) - Session execution flow
+- [Backend Development Standards](../../CLAUDE.md#backend-and-operator-development-standards)
+- [ADR-0001: Kubernetes-Native Architecture](../adr/0001-kubernetes-native-architecture.md)
+- [ADR-0002: User Token Authentication](../adr/0002-user-token-authentication.md)
+- [Security Standards Context](../../.claude/context/security-standards.md)
diff --git a/docs/index.md b/docs/index.md
index 6a6a8db5..de9813c9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -17,6 +17,8 @@ The platform follows a cloud-native microservices architecture:
- Custom Resource Definitions (AgenticSession, ProjectSettings, RFEWorkflow)
- Operator-based reconciliation for declarative session management
+📐 **[Architecture Diagrams](architecture/index.md)** - Visual guides to system design, component interactions, and data flows
+
## Quick Start
### Local Development
@@ -64,6 +66,13 @@ For production OpenShift clusters:
## Documentation Structure
+### [📐 Architecture](architecture/index.md)
+Visual guides and detailed explanations of the platform's design:
+- [Core System Architecture](architecture/core-system-architecture.md) - 4-component system overview
+- [Agentic Session Lifecycle](architecture/agentic-session-lifecycle.md) - State machine and reconciliation
+- [Multi-Tenancy Architecture](architecture/multi-tenancy-architecture.md) - Project isolation and RBAC
+- [Kubernetes Resources](architecture/kubernetes-resources.md) - CRD structures and schemas
+
### [📘 User Guide](user-guide/index.md)
Learn how to use the Ambient Code Platform for AI-powered automation:
- [Getting Started](user-guide/getting-started.md) - Installation and first session
diff --git a/mkdocs.yml b/mkdocs.yml
index 0d80bbea..c64a3116 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -41,6 +41,12 @@ theme:
nav:
- Home: index.md
+ - Architecture:
+ - Overview: architecture/index.md
+ - Core System Architecture: architecture/core-system-architecture.md
+ - Agentic Session Lifecycle: architecture/agentic-session-lifecycle.md
+ - Multi-Tenancy Architecture: architecture/multi-tenancy-architecture.md
+ - Kubernetes Resources: architecture/kubernetes-resources.md
- User Guide:
- Overview: user-guide/index.md
- Getting Started: user-guide/getting-started.md