A production-grade, multi-tenant benchmarking platform for evaluating AI agents across multiple providers (OpenAI, Anthropic, OpenRouter, NVIDIA, MCP, and OpenAI-compatible APIs).
# Development: Build and start all services
docker-compose up --build
# With frontend hot-reload (Vite dev server)
docker-compose --profile dev up --build
# Production: Database + Go API only (frontend typically deployed separately)
docker-compose -f docker-compose.prod.yml up -d

Development (`docker-compose.yml`) starts:
- PostgreSQL (internal, no exposed port)
- Go API on port `8080`
- Frontend on port `3010` (production build) or `frontend-dev` with hot-reload when using `--profile dev`
Production (docker-compose.prod.yml) starts:
- PostgreSQL (internal)
- Go API (behind reverse proxy)
# Check Go API health
curl http://localhost:8080/health

The platform includes an automated migration runner. Place SQL migration files in `server_go/migrations/` (naming convention: `XXX_description.sql`). They are applied automatically on server startup.
- Initial Schema: `server_go/migrations/001_initial_schema.sql` contains the baseline database structure.
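The naming convention can be illustrated with a short sketch that validates migration filenames and orders them by prefix. It assumes `XXX` is a zero-padded numeric prefix (as in `001_initial_schema.sql`) and that files are applied in ascending order. The actual runner in `server_go` may behave differently, and `002_add_indexes.sql` is a hypothetical filename.

```python
import re

# Assumption: "XXX" is a numeric prefix followed by an underscore and a
# lowercase description, as in 001_initial_schema.sql.
MIGRATION_NAME = re.compile(r"^(\d+)_([a-z0-9_]+)\.sql$")


def order_migrations(filenames):
    """Return valid migration filenames sorted by numeric prefix.

    Raises ValueError on a name that does not match the convention or on a
    duplicate prefix, since either would make the apply order ambiguous.
    """
    seen = {}
    for name in filenames:
        match = MIGRATION_NAME.match(name)
        if match is None:
            raise ValueError(f"not a migration filename: {name}")
        version = int(match.group(1))
        if version in seen:
            raise ValueError(f"duplicate prefix {version}: {seen[version]}, {name}")
        seen[version] = name
    return [seen[version] for version in sorted(seen)]


if __name__ == "__main__":
    # 002_add_indexes.sql is a hypothetical file name used for illustration.
    print(order_migrations(["002_add_indexes.sql", "001_initial_schema.sql"]))
```

Running a check like this before committing a new migration catches misnamed files before the server starts.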
The project supports two main environments:
- Development (`docker-compose.yml`):
  - Hot-reloading for Frontend (Vite)
  - Debug ports exposed
  - Local volume mounts
- Production (`docker-compose.prod.yml`):
  - Optimized production builds (Nginx serving static files)
  - Secure proxy configuration
  - Minimized container images
Use the included reset.sh script for environment management:
# Default: Resets Database only (Fast)
./reset.sh
# Soft Reset: Rebuilds containers, preserves DB data
./reset.sh --soft-reset
# Hard Reset: Wipes DB volume, rebuilds everything (Fresh Start)
./reset.sh --hard-reset
# Deploy to Production
./reset.sh --prod

To protect dev/prod proxy access behind an extra password gate:
# 1) Generate/update credentials + protected hosts (local only, not committed)
./scripts/set-basic-auth.sh <username> <password> <domain[,domain2,...]>
# 2) Deploy production
./reset.sh --prod

Notes:
- Credentials are stored in `ops/nginx/.htpasswd` (gitignored).
- Protected hosts are stored in `ops/nginx/.basic-auth-hosts.map` (gitignored).
- Both proxies (`ops/nginx/nginx.conf` and `ops/nginx/nginx.prod.conf`) enforce HTTP Basic Auth only for hosts listed in that local map.
- Examples without real secrets/domains: `ops/nginx/.htpasswd.example` and `ops/nginx/.basic-auth-hosts.map.example`.
- Rollback: remove the auth directives from `ops/nginx/nginx.prod.conf` and redeploy with `./reset.sh --prod`.
This platform uses a WebSocket-first architecture. All real-time operations (agents, question sets, runs, evaluations, stats) are handled via WebSocket messages.
Only the health check and essential auth endpoints use REST:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/auth/register` | Legacy registration (Dev only) |
| POST | `/auth/login` | Legacy login (Dev only) |
| POST | `/auth/bootstrap-admin` | Create initial admin |
| GET | `/auth/check-admin` | Check if admin exists |
| GET | `/auth/me` | Get current user (protected) |
| POST | `/auth/refresh` | Refresh JWT token (protected) |
| POST | `/auth/logout` | Logout (protected) |
| POST | `/auth/join-organization` | Join org via invite (protected) |
| POST | `/auth/select-organization` | Switch organization (protected) |
| Endpoint | Description |
|---|---|
| `GET /ws?token=<jwt>&workspace_id=<uuid>` | Main WebSocket connection |
All messages use a standard envelope: { "type": "REQ_*", "correlation_id": "...", "payload": {...} }. For a complete reference of every message type (REQ_, CMD_, DATA_, EVT_), payloads, and responses, see docs/websocket-messages.md.
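As a rough client-side illustration, the sketch below builds the connection URL and serializes a message in the envelope shape described above. It assumes the API is reachable directly at `ws://localhost:8080` and that a client-generated UUID is an acceptable `correlation_id`. `REQ_EXAMPLE` is a placeholder and not a real message type; the real types are listed in docs/websocket-messages.md.

```python
import json
import uuid
from urllib.parse import urlencode


def websocket_url(base, token, workspace_id):
    """Build the /ws URL with the token and workspace_id query parameters."""
    query = urlencode({"token": token, "workspace_id": workspace_id})
    return f"{base}/ws?{query}"


def build_envelope(message_type, payload, correlation_id=None):
    """Serialize a message in the standard envelope shape."""
    return json.dumps({
        "type": message_type,
        # Assumption: a client-generated UUID is an acceptable correlation_id.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,
    })


if __name__ == "__main__":
    # Placeholder values; REQ_EXAMPLE is not a real message type.
    print(websocket_url("ws://localhost:8080", "<jwt>", "<uuid>"))
    print(build_envelope("REQ_EXAMPLE", {}))
```

Keeping the `correlation_id` on the client side makes it possible to match a response to the request that triggered it.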
| Provider | Required Config Keys | Notes |
|---|---|---|
| `mcp` | `endpoint`, `token` | Model Context Protocol (HTTP) |
| `openai` | `api_key` | Managed (`prompt_id`) or standard (`model`) |
| `openai_compatible` | `api_key`, `base_url` | Any OpenAI-compatible API |
| `openrouter` | `api_key` | Optional: `model`, `base_url`, `system_prompt` |
| `nvidia` | `api_key` | NVIDIA NIM; optional `model`, `base_url` |
| `anthropic` | `api_key` | Claude; optional `model`, `base_url` |
| `evaluator` | Resolves to one of the above | Auto-extracts scores from responses |
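The required keys in the provider table can be checked before an agent is saved. The following is a minimal sketch, not the platform's own validation: the function name is hypothetical, and `evaluator` is left out because it resolves to one of the other providers.

```python
# Required config keys per provider, as listed in the provider table.
REQUIRED_KEYS = {
    "mcp": ("endpoint", "token"),
    "openai": ("api_key",),
    "openai_compatible": ("api_key", "base_url"),
    "openrouter": ("api_key",),
    "nvidia": ("api_key",),
    "anthropic": ("api_key",),
}


def missing_config_keys(provider, config):
    """Return the required keys that are absent or empty for a provider."""
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"unknown provider: {provider}")
    return [key for key in REQUIRED_KEYS[provider] if not config.get(key)]


if __name__ == "__main__":
    # Placeholder config with a missing base_url.
    print(missing_config_keys("openai_compatible", {"api_key": "placeholder"}))
```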
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | (none) | PostgreSQL connection string |
| `JWT_SECRET` | (none) | JWT signing secret (min 32 chars) |
| `ENCRYPTION_KEY` | (none) | AES key for encrypted agent configs. Preferred: raw 32 chars. Compatibility: raw 16/24/32 chars or hex 32/48/64 chars |
| `ENCRYPTION_KEY_PREVIOUS` | (none) | Previous AES key kept temporarily during rotation so existing encrypted configs can still be read and re-encrypted |
| `ENCRYPTION_KEY_ROTATE_ON_START` | `false` | When `true`, the backend re-encrypts supported encrypted columns from `ENCRYPTION_KEY_PREVIOUS` to `ENCRYPTION_KEY` during startup |
| `PORT` | `8080` | API port |
| `APP_ENV` | `development` | `development` or `production` (disables dev features) |
| `FIREBASE_SERVICE_ACCOUNT` | (none) | Path to Firebase Service Account JSON |
| `ALLOWED_ORIGINS` | (none) | Comma-separated CORS origins (production) |
| `VITE_AFK_TIMEOUT_MS` | `600000` | Frontend idle timeout (ms) before WebSocket disconnect (min: 60000; tripled during active runs) |
| `VITE_HMR_HOST`, `VITE_HMR_CLIENT_PORT`, `VITE_HMR_PROTOCOL` | (none) | Optional HMR config for dev behind proxy |
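A pre-flight check of the backend variables can catch misconfiguration before a deploy. This sketch only encodes what the table states (the 32-character minimum for `JWT_SECRET`, the two `APP_ENV` values, the `PORT` default) and assumes that `DATABASE_URL` is mandatory because no default is listed. The function name is hypothetical and the backend's own startup checks may differ.

```python
import os


def check_backend_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    # Assumption: DATABASE_URL is required because the table lists no default.
    if not env.get("DATABASE_URL"):
        problems.append("DATABASE_URL is not set")
    if len(env.get("JWT_SECRET", "")) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")
    if env.get("APP_ENV", "development") not in ("development", "production"):
        problems.append("APP_ENV must be development or production")
    if not env.get("PORT", "8080").isdigit():
        problems.append("PORT must be a number")
    return problems


if __name__ == "__main__":
    for problem in check_backend_env(os.environ):
        print(problem)
```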
The application currently encrypts only these database fields:
- `agents.config`
- `question_set_agents.config`
Other user-facing records such as user names, emails, login logs, run answers, evaluations, and question set data are not protected by ENCRYPTION_KEY.
- The app accepts `ENCRYPTION_KEY` as raw AES key material (`16`, `24`, or `32` chars) or as hex (`32`, `48`, or `64` chars).
- When `ENCRYPTION_KEY_PREVIOUS` is configured, decrypt reads try the active key first and then the previous key.
- New writes always use `ENCRYPTION_KEY`.
- When `ENCRYPTION_KEY_ROTATE_ON_START=true`, startup attempts an in-place re-encryption of:
  - `agents.config`
  - `question_set_agents.config`
- The startup rotator uses a PostgreSQL advisory lock so only one instance performs the rewrite during a rollout.
- On startup, the backend stores a non-reversible fingerprint of the active key plus a sentinel ciphertext in `encryption_key_states`.
- The Admin Debug view shows:
  - current key status and detected format
  - current fingerprint prefix
  - stored fingerprint prefix
  - whether the current key matches the persisted state
  - whether sentinel verification succeeded
This allows the system to detect future key changes or read/decrypt incompatibilities.
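The accepted key formats and the fingerprint comparison can be sketched as follows. This is an illustration under assumptions, not the backend's code: SHA-256 stands in for the unspecified fingerprint function, the 8-character prefix length is arbitrary, and both function names are hypothetical. Note that a 32-character string made only of hex digits fits two documented formats, so the sketch reports both rather than guessing which one the app picks.

```python
import hashlib

RAW_LENGTHS = (16, 24, 32)  # raw key material, in characters
HEX_LENGTHS = (32, 48, 64)  # hex encoding of 16, 24, or 32 bytes
HEX_DIGITS = "0123456789abcdefABCDEF"


def candidate_formats(key):
    """List the documented formats a key string could be read as."""
    formats = []
    if len(key) in RAW_LENGTHS:
        formats.append(f"raw-{len(key)}")
    is_hex = bool(key) and all(char in HEX_DIGITS for char in key)
    if is_hex and len(key) in HEX_LENGTHS:
        formats.append(f"hex-{len(key)}")
    return formats


def fingerprint_prefix(key_bytes, length=8):
    """Assumption: SHA-256 stands in for the unspecified fingerprint function."""
    return hashlib.sha256(key_bytes).hexdigest()[:length]


def key_matches_state(key_bytes, stored_prefix):
    """Compare the active key's fingerprint prefix with the stored one."""
    return fingerprint_prefix(key_bytes, len(stored_prefix)) == stored_prefix
```

An empty result from `candidate_formats` means the key matches none of the documented lengths and should be rejected before deployment.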
Use this procedure when encrypted configs must be preserved and you want the deploy itself to perform the migration.
- Confirm the current deployment is healthy in Admin Debug:
  - key status is `loaded`
  - key state status is `match`
  - no unexpected decrypt failures in `agents.config` or `question_set_agents.config`
- Deploy the new revision with:
  - `ENCRYPTION_KEY` = new key
  - `ENCRYPTION_KEY_PREVIOUS` = old key
  - `ENCRYPTION_KEY_ROTATE_ON_START=true`
- Let startup perform the migration:
  - reads `agents.config` and `question_set_agents.config`
  - decrypts each value with the old key when needed
  - re-encrypts each value with the new key
  - updates the persisted key fingerprint/sentinel state to the new active key
- Verify again in Admin Debug:
  - key state status is `match`
  - sentinel verification succeeds
  - encrypted config decrypt failures remain at zero (or expected baseline)
- Once the rollout is confirmed healthy, remove `ENCRYPTION_KEY_PREVIOUS` and set `ENCRYPTION_KEY_ROTATE_ON_START=false` in the next deploy.
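The migration step of this procedure can be sketched in a few lines. The cipher here is a toy stand-in built from the standard library (a hash-derived keystream plus an HMAC tag) so that the example runs without third-party packages. It is not AES, it is not secure, and it is not how the backend encrypts data. The sketch also leaves out the advisory lock and the fingerprint/sentinel update.

```python
import hashlib
import hmac


def _keystream(key, length):
    """Derive a repeatable byte stream from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, plaintext):
    """Stand-in for AES: XOR with a keystream plus an HMAC tag. Not secure."""
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))
    return hmac.new(key, body, hashlib.sha256).digest() + body


def toy_decrypt(key, ciphertext):
    """Raise ValueError when the tag does not verify, i.e. the key is wrong."""
    tag, body = ciphertext[:32], ciphertext[32:]
    if not hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest()):
        raise ValueError("wrong key or corrupted value")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))


def rotate(values, new_key, old_key):
    """Re-encrypt values under new_key, reading with old_key when needed."""
    rotated = []
    for value in values:
        try:
            toy_decrypt(new_key, value)  # already readable with the new key
            rotated.append(value)
        except ValueError:
            plaintext = toy_decrypt(old_key, value)
            rotated.append(toy_encrypt(new_key, plaintext))
    return rotated
```

Trying the new key first keeps the operation idempotent: running it again after a partial rollout leaves already-rotated values untouched.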
A live rotation is only possible if the rotation process has access to both the old key and the new key at the same time. Without both keys, existing encrypted configs cannot be re-encrypted safely.
The current implementation supports deploy-time rotation with one active key plus one previous key. It does not yet provide:
- ciphertext-level `key_id` metadata
- support for more than two simultaneous keys
- a long-running background rotator with progressive batches
The intended path today is: deploy with current + previous keys, let startup rotate in place, verify, then remove the previous key.
If encrypted agent configs do not need to be preserved, you can reset from the current point forward:
- Back up the database if the data matters.
- Replace or clear the affected encrypted fields:
  - `agents.config`
  - `question_set_agents.config`
- Set the desired `ENCRYPTION_KEY`.
- Reconfigure affected agents manually.
This is destructive for encrypted config data, but it is the simplest recovery path when test data is disposable.
# Backend Tests
cd server_go
go test ./... -v
# Backend Lint + Vet + Tests (matches CI gate)
cd server_go
make check # runs: go vet, golangci-lint, go test
# Frontend Tests
cd frontend
npm run test

The backend lint configuration lives in `server_go/.golangci.yml`. `make lint` auto-installs the expected golangci-lint version into `$GOPATH/bin` on first run.
# Terminal 1: Start Postgres
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=benchmarking postgres:15
# Terminal 2: Start Go API
cd server_go
export DATABASE_URL="host=localhost user=postgres password=postgres dbname=benchmarking port=5432 sslmode=disable"
export FIREBASE_SERVICE_ACCOUNT="./firebase-service-account.json"
go run .

Place `firebase-service-account.json` in `server_go/` before running. Without it, Firebase-based login will fail.
flowchart TB
subgraph Client [Client]
Browser[Browser / Vue]
end
subgraph Proxy [Reverse Proxy]
Nginx[Nginx]
end
subgraph Backend [Backend]
GoAPI[Go API + WebSocket]
end
subgraph Data [Data]
Postgres[(PostgreSQL)]
end
subgraph Auth [Auth]
Firebase[Firebase Auth]
WebAuthn[WebAuthn / Passkeys]
end
subgraph Agents [Agent Providers]
MCP[MCP Servers]
OpenAI[OpenAI API]
Anthropic[Anthropic]
OpenRouter[OpenRouter]
Nvidia[NVIDIA NIM]
end
Browser --> Nginx
Nginx --> GoAPI
GoAPI --> Postgres
GoAPI --> Firebase
GoAPI --> WebAuthn
GoAPI --> MCP
GoAPI --> OpenAI
GoAPI --> Anthropic
GoAPI --> OpenRouter
GoAPI --> Nvidia
High-level flow: Browser connects via Nginx (proxy). Go API handles REST + WebSocket, persists to PostgreSQL, authenticates via Firebase/WebAuthn, and executes benchmark tasks by calling external agent providers (MCP, OpenAI, Anthropic, etc.).
- docs/websocket-messages.md — Complete reference of all WebSocket envelope messages (REQ_, CMD_, DATA_, EVT_)
- docs/websocket-api.md — WebSocket API guide (connection, handshake, examples)
- docs/db_schema.md — Database schema diagram
Licensed under the Apache License 2.0. See LICENSE for details.