Tracks build progress against the phases in LEGATE_AGENT_BUILD_PLAN.md (Section 12).
| Phase | Title | Status |
|---|---|---|
| 0 | Foundation and scaffolding | ✅ Complete |
| 1 | Auth, workspaces, RBAC | ✅ Complete |
| 2 | LLM layer and Agent Core | ✅ Complete |
| 3 | RAG knowledge layer | ✅ Complete |
| 4 | Connector framework and credential vault | ✅ Complete |
| 5 | Policy Engine and approvals | ✅ Complete |
| 6 | Workflow engine and triggers | ✅ Complete |
| 7 | Dashboard UX | ✅ Complete |
| 8 | Enterprise hardening | ✅ Complete |
Done
- Monorepo layout per Section 10 (
apps/api,apps/web,packages/shared-types,docs/). docker-compose.ymlwith postgres, redis, qdrant, api, worker, web (+ prod overlay with Caddy TLS).- FastAPI app skeleton with
GET /healthreturning 200. - Pydantic-settings config loader reading every variable in Section 11.
- Tooling: ruff, black, mypy (backend); eslint/prettier config (frontend); pytest.
- GitHub Actions CI: lint + format check + types + tests + CodeQL.
.env.example,README.md,PROGRESS.md, MITLICENSE,Makefile.
Acceptance: docker compose up brings up all services; GET /health returns 200; CI runs lint + tests.
Done
- Models + Alembic migration:
User,Workspace,Membership,ApiKey,AuditLog(UUID PKs, timestamps, tenantworkspace_id). - Register / login / refresh with JWT access + refresh tokens (short access TTL, rotation on refresh).
- Argon2 password hashing.
- Workspace create + list + switch (active workspace carried in the access token).
- RBAC: Owner / Admin / Builder / Operator / Viewer with the exact permission matrix from Section 7.3, enforced by a
require(permission)dependency. - API keys: create (plaintext shown once), list, revoke; stored as SHA-256 hashes with scopes;
X-API-Keyauth with scope checks. - Hash-chained, append-only audit log; every privileged action writes an entry.
- Tenant filtering at the repository layer — a principal can only read/write within its active workspace.
Acceptance: a user can register, create a workspace, invite a member with a role, and role checks block unauthorized calls; API key auth works; every privileged action writes an audit entry; tests cover the RBAC matrix, tenant isolation, and the audit hash chain.
Decisions / TODOs
# TODO(decision): invited users who do not yet exist are created as inactive, password-less accounts; the accept-invite + set-password flow is deferred to the dashboard phase.# TODO(decision): API-key management (create/list/revoke) is gated under themembers:managepermission (Owner/Admin) until a dedicated admin capability is introduced.- Audit
seqis assigned per-workspace in the application layer; a DB-level monotonic guard is a follow-up for high-concurrency writes.
Done
- Provider-agnostic async LLM client (
legate/llm): OpenRouter + Ollama providers, a config-driven factory, and a scriptedFakeLLMClientfor tests. Agent,Run,Stepmodels + Alembic migration (RunStatus/StepTypeenums).- Agent CRUD API with tool-allow-list validation, plus
GET /toolsdiscovery. - Plan-act-observe loop (
legate/agent/runner.py): structured{thought, action, args}output, JSON-schema arg validation, Policy Engine gate, tool execution, and observations fed back as untrusted data. - Guardrails/budgets: max steps, max tokens, and max wall-clock — per-agent overrides falling back to workspace settings; malformed output is retried (bounded) then fails safe.
- Short-term memory: transcript token budgeting/trimming (system messages always retained).
- Two built-in tools:
kb.search(stub until Phase 3) and read-onlyhttp.request. - Live run streaming over WebSocket (
/runs/{id}/stream) via an in-process event bus, with DB replay for late subscribers; background vs inline execution. - Minimal Policy Engine (
legate/policy/engine.py) with the finalallow/deny/require_approvalvocabulary — default-deny + sensitive-tool suspend now, full rules in Phase 5.
Acceptance: create an agent, run it, and watch steps stream in over WebSocket; the agent calls a tool, gets an observation, and returns a final answer; step/token/wall-clock budgets are enforced; malformed LLM output is retried then fails safe. Tests mock the LLM and assert the loop, budgets, retries, RBAC, tenant isolation, and streaming.
Decisions / TODOs
# TODO(decision): sensitive tools currently suspend the run aswaiting_approvalwith no Approval record; the inbox/resume flow lands in Phase 5.# TODO(decision):kb.searchreturns an empty result set until the RAG layer (Phase 3) backs it with Qdrant.- Background runs use
asyncio.create_task(in-process); moving execution onto Celery workers is a later hardening step.
Done
KnowledgeBase,Document,Chunkmodels (+DocumentStatusenum) and Alembic migration.- Swappable embeddings (
legate/rag/embeddings):OpenAIEmbedder+ an offline, deterministicHashingEmbedder, behind a factory. - Swappable vector store (
legate/rag/vectorstore):QdrantVectorStore+ in-processInMemoryVectorStore; lazy collection creation, per-workspace+KB isolation. - Ingestion pipeline (
legate/rag/ingestion.py): parse (TXT/MD native, PDF via pypdf, DOCX via python-docx) → overlapping chunking → embed → upsert, withpending → processing → ready | failedstatus tracking; idempotent, retryable re-ingestion. - Local object storage for uploads (
legate/rag/storage.py); Celery ingestion task + inline mode (KB_INGEST_INLINE). - Retrieval with citations (
legate/rag/retrieval.py); thekb.searchtool now performs real per-workspace retrieval (degrading to empty when unconfigured). - Knowledge API: KB CRUD, document upload/list/get/delete/reingest, and
POST /knowledge-bases/{id}/search.
Acceptance: upload a document, watch it ingest, and have an agent answer from it with citations; deleting a document removes its vectors; ingestion failures are visible (status=failed, error) and retryable via /documents/{id}/reingest. Verified by unit tests, API tests, an end-to-end agent-RAG test, and a 21/21 live-server smoke run.
Decisions / TODOs
# TODO(decision):kb.searchsearches all of a workspace's knowledge bases; a per-agent KB scoping option can come with the agent builder UI.- S3 storage backend and embedding via Ollama are deferred (local storage + OpenAI/hashing now).
- Document→Chunk cleanup relies on Postgres
ON DELETE CASCADEin production; the API also clears chunks/vectors explicitly on delete.
Done
- Connector interface + registry (
legate/connectors):Connectorprotocol withvalidate_config,list_tools,test_connection,execute; a self-registering catalog so adding a connector needs no core changes. - First real connectors: HTTP (
http.call, authenticated/any-method, sensitive), Slack (slack.post_message/list_channels/reply_thread), Email (email.send(sensitive) /email.draft/email.search). Connector+Credentialmodels and Alembic migration.- Envelope-encrypted credential vault (
legate/vault): per-record DEK wrapped by a versioned master key (LEGATE_MASTER_KEY);rewrapsupports key rotation. Plaintext only in memory; never logged or returned. - Connector API: install / list / get / update / delete,
POST /connectors/{id}/test,GET /connectors/{id}/tools, andGET /connector-types. Secrets are write-only. - Tool resolution (
legate/connectors/resolution.py): each run's tool registry = built-ins + the workspace's installed-connector tools; a connector tool decrypts its credential only when invoked. - Connector tools flow through the existing policy gate and JSON-schema arg validation; httpx/SMTP/IMAP behind injection seams for offline tests.
Acceptance: install a Slack connector with credentials, test the connection, and have an agent post a message; secrets are encrypted at rest and never returned by the API; adding a new connector requires no core changes. Verified by vault tests, connector unit tests, connector API tests (incl. a secret-leakage check), an end-to-end agent-posts-to-Slack test, and a 24/24 live-server smoke run.
Decisions / TODOs
# TODO(decision):slack.post_messageis non-sensitive (so the agent can post in Phase 4);http.callandemail.sendare sensitive and will route through the approval inbox once Phase 5 lands.- One credential per connector (1:1); multi-account per type and per-connector tool disambiguation are future work.
- Connector catalog since expanded well beyond the first three — now ten built-in types: GitHub (
github.*), Notion (notion.*), an outgoing Webhook (webhook.send, HMAC-signed), PostgreSQL and MySQL (sharedBaseSqlConnector:*.queryread-only-guarded /*.executesensitive), Stripe (stripe.get_customer/list_charges/create_refundsensitive), and Google Workspace (gsheets.*+gdrive.*over one OAuth2 credential) are all implemented and tested. Multi-account-per-type and Drive binary upload remain future work. - MCP client support (
legate/mcp): Legate connects to remote MCP servers over Streamable HTTP as anmcpconnector type. Tools are discovered on install/test and cached on the connector, then registered per-run asmcp.<id>.<tool>(sensitive unless the server marks a tool read-only). Surfaced automatically in the agent allow-list and workflow tool picker. Verified end-to-end against a live MCP server. stdio transport and exposing Legate as an MCP server are future work.
Done
- Full Policy Engine (
legate/policy/engine.py):allow/deny/require_approval, with per-workspace approval rules (auto_approve/denylists) read fromworkspace.settings_json. Approvalmodel (+ApprovalStatusenum) and Alembic migration.- Suspend/resume in the runner: on
require_approvalit snapshots the transcript into anApproval, sets the run towaiting_approval, and stops;AgentRunner.resumerebuilds the transcript and continues — executing the action on approve, or feeding a rejection observation back to replan on reject. - Approvals inbox API:
GET /approvals,GET /approvals/{id},POST /approvals/{id}/approve|reject. Deciding requiresapprovals:decide; resume dispatches inline (tests) or in the background (prod). - Expiry (
APPROVAL_TTL_HOURS) with auto-expire on read/decide; optional per-approval role gate;409on expired or already-decided approvals. - Every policy decision audited (
policy.deny/policy.require_approval/policy.auto_approve) plusapproval.approve/approval.reject.
Acceptance: an agent attempting a sensitive action is paused; an authorized approver approves it from the inbox and the action then executes; rejection resumes the run so the agent replans; an unauthorized user cannot approve; all decisions are audited. Verified by policy-engine unit tests and a full approval-flow suite (suspend / approve+execute / reject+replan / RBAC / expiry / double-decide / auto-approve / tenant isolation / audit), plus a 28/28 live-server smoke run.
Decisions / TODOs
- Reject resumes the run so the agent can replan (rather than hard-stopping); a "reject-and-cancel" option can be added later.
- Approval expiry is evaluated lazily (on read/decide); a Celery-beat sweep for proactive expiry is a later addition.
required_roleis stored and enforced when set; the UI to set it arrives with the dashboard.
Done
Workflowmodel (+runs.workflow_id/runs.state_json) and Alembic migration.- Graph executor (
legate/workflow/executor.py): topological execution with edge/branch following, oneStepper node, so runs are fully reconstructable. - Node executors:
trigger,agent(nested agent run),tool(through the policy engine),condition,transform,approval,delay(subworkflowreserved). - Sandboxed
{{ }}templating (legate/workflow/templating.py) with a hand-written AST evaluator for conditions — no code execution. - Triggers: manual (
POST /workflows/{id}/run), webhook (HMAC-signedPOST /hooks/{token}, invalid → 401), schedule (cron viacroniter+ a Celery-beat task), and internal events (emit_event). - Pause/resume:
approvalnodes and sensitivetoolnodes suspend the run and resume through the existing approvals inbox (state inrun.state_json). - Per-node retry with backoff.
- Workflow API: CRUD, run, and the signed webhook hook; managing requires
build:write, running requiresruns:execute.
Acceptance: build a workflow (webhook → agent → condition → tool) and fire the webhook to execute it end to end including an approval gate; a scheduled workflow runs on cron; invalid webhook signatures are rejected; runs are fully reconstructable from Steps. Verified by templating unit tests and a workflow suite (manual/condition/webhook/agent-node/approval-gate/reject/slack-tool/retry/schedule/event/RBAC/tenancy), plus a 34/34 live-server smoke run.
Decisions / TODOs
# TODO(decision): trigger config lives on theWorkflow(trigger_type+trigger_config+webhook_*) rather than a separateTriggertable.delaynodes continue immediately in inline execution; a scheduled delayed-resume is a later addition.subworkflownodes are deferred.- Node join semantics are OR (a node runs if any incoming edge is taken); richer AND/wait-all joins can come later.
Done
- Premium Next.js 14 dashboard (
apps/web) with an original "Ink & Amber control-deck" design system — distinctive fonts (Bricolage Grotesque / Hanken Grotesk / JetBrains Mono), a single authoritative amber accent, monospace precision for IDs/hashes, and a subtle layered atmosphere. - Auth screens (login/register) with a two-column brand aside; client auth context with JWT storage + silent refresh + workspace switching.
- App shell: sidebar nav (Overview/Build/Operate/Govern), workspace switcher, role-aware controls, mobile drawer.
- Agent builder (prompt, model, tool allow-list with sensitive badges, guardrail budgets) + list + run.
- Workflow builder (JSON graph editor + trigger picker; webhook credentials shown once) + list + run.
- Run viewer with live WebSocket streaming — a step-by-step "mission log" timeline with per-step input/output.
- Approval inbox (approve/reject with reason, tabs, live refresh).
- Connector setup wizard (schema-driven install for any connector type) + test connection.
- Knowledge base manager (create, upload, per-KB document status, search with citations).
- Audit explorer (hash-chained, tamper-evident) and Admin (members + roles, API keys shown once).
- Backend: CORS middleware for the dashboard origin (
CORS_ORIGINS). - Playwright e2e covering the full acceptance path (
apps/web/tests/e2e/core-path.spec.ts).
Acceptance: a non-technical user can register, create an agent, connect Slack, build and run a workflow, approve an action, and read the audit trail — all from the UI, no API calls. Verified by a green Playwright e2e run against the live stack, plus manual screenshots of every core screen.
Decisions / TODOs
- The workflow builder ships as a JSON/form editor (per the plan's de-risking note); a visual graph editor is a follow-up.
- Data fetching uses TanStack Query; the run viewer streams over the existing WebSocket.
Done
- Billing & plans (
legate/billing):free/pro/enterpriseplans with limits + features; enforcement on agents/workflows/members/run-quota (402 over limit); usage metering (UsageRecord+ migration);GET /billing,GET /usage; Stripe checkout + webhook that updates a workspace's plan. Gated byBILLING_ENABLED. - SSO via OIDC (
legate/security/oidc.py,legate/api/sso.py): authorization-code login + callback with just-in-time user provisioning; swappable client so tests run against a simulated IdP. SAML is a documented extension point. - Rate limiting (
legate/security/ratelimit.py): per-workspace run limiter (429), gated byRATE_LIMIT_ENABLED; plan run quotas layer on top. - Observability: Prometheus
/metrics+ request middleware + run counters; OpenTelemetry tracing (config-gated, optional[tracing]extra); structured JSON logging in production. - CORS for the dashboard origin.
- Infra: TLS
deploy/Caddyfile; a full kustomize k8s manifest set (deploy/k8s/*— configmap/secret template, Postgres/Redis/Qdrant, api/worker/beat/web, ingress, migrate init-container); hardeneddocker-compose.prod.ymloverlay. - Docs:
docs/backup.md(backup + restore, incl. the master-key caveat),docs/operations.md. - CI security pass: CodeQL (existing) +
pip-audit+ gitleaks secret scan.
Acceptance: plans gate features and meter usage (402 + /usage); SSO login works against a (test) IdP; metrics and traces are exportable (/metrics, OTLP); a documented restore procedure brings the system back (incl. verifying the master key still unwraps credentials); CI runs CodeQL + dependency + secret scans. Verified by billing/SSO/rate-limit/metrics tests.
Decisions / TODOs
- OIDC is implemented; SAML is documented as an extension behind the same provisioning seam.
- The rate limiter is in-process (fine for one replica); a Redis-backed limiter drops in behind
enforce_run_ratefor multi-replica. - k8s manifests ship as kustomize; a packaged Helm chart is a straightforward follow-up.
Phases 0–8 are done: the agent runtime, RAG, connectors + vault, policy + approvals, workflows, the dashboard, and enterprise hardening. See each phase section above for details.