Skip to content

Latest commit

 

History

History
192 lines (151 loc) · 18.1 KB

File metadata and controls

192 lines (151 loc) · 18.1 KB

Progress

Tracks build progress against the phases in LEGATE_AGENT_BUILD_PLAN.md (Section 12).

Status summary

Phase Title Status
0 Foundation and scaffolding ✅ Complete
1 Auth, workspaces, RBAC ✅ Complete
2 LLM layer and Agent Core ✅ Complete
3 RAG knowledge layer ✅ Complete
4 Connector framework and credential vault ✅ Complete
5 Policy Engine and approvals ✅ Complete
6 Workflow engine and triggers ✅ Complete
7 Dashboard UX ✅ Complete
8 Enterprise hardening ✅ Complete

Phase 0 — Foundation and scaffolding ✅

Done

  • Monorepo layout per Section 10 (apps/api, apps/web, packages/shared-types, docs/).
  • docker-compose.yml with postgres, redis, qdrant, api, worker, web (+ prod overlay with Caddy TLS).
  • FastAPI app skeleton with GET /health returning 200.
  • Pydantic-settings config loader reading every variable in Section 11.
  • Tooling: ruff, black, mypy (backend); eslint/prettier config (frontend); pytest.
  • GitHub Actions CI: lint + format check + types + tests + CodeQL.
  • .env.example, README.md, PROGRESS.md, MIT LICENSE, Makefile.

Acceptance: docker compose up brings up all services; GET /health returns 200; CI runs lint + tests.

Phase 1 — Auth, workspaces, RBAC ✅

Done

  • Models + Alembic migration: User, Workspace, Membership, ApiKey, AuditLog (UUID PKs, timestamps, tenant workspace_id).
  • Register / login / refresh with JWT access + refresh tokens (short access TTL, rotation on refresh).
  • Argon2 password hashing.
  • Workspace create + list + switch (active workspace carried in the access token).
  • RBAC: Owner / Admin / Builder / Operator / Viewer with the exact permission matrix from Section 7.3, enforced by a require(permission) dependency.
  • API keys: create (plaintext shown once), list, revoke; stored as SHA-256 hashes with scopes; X-API-Key auth with scope checks.
  • Hash-chained, append-only audit log; every privileged action writes an entry.
  • Tenant filtering at the repository layer — a principal can only read/write within its active workspace.

Acceptance: a user can register, create a workspace, invite a member with a role, and role checks block unauthorized calls; API key auth works; every privileged action writes an audit entry; tests cover the RBAC matrix, tenant isolation, and the audit hash chain.

Decisions / TODOs

  • # TODO(decision): invited users who do not yet exist are created as inactive, password-less accounts; the accept-invite + set-password flow is deferred to the dashboard phase.
  • # TODO(decision): API-key management (create/list/revoke) is gated under the members:manage permission (Owner/Admin) until a dedicated admin capability is introduced.
  • Audit seq is assigned per-workspace in the application layer; a DB-level monotonic guard is a follow-up for high-concurrency writes.

Phase 2 — LLM layer and Agent Core ✅

Done

  • Provider-agnostic async LLM client (legate/llm): OpenRouter + Ollama providers, a config-driven factory, and a scripted FakeLLMClient for tests.
  • Agent, Run, Step models + Alembic migration (RunStatus / StepType enums).
  • Agent CRUD API with tool-allow-list validation, plus GET /tools discovery.
  • Plan-act-observe loop (legate/agent/runner.py): structured {thought, action, args} output, JSON-schema arg validation, Policy Engine gate, tool execution, and observations fed back as untrusted data.
  • Guardrails/budgets: max steps, max tokens, and max wall-clock — per-agent overrides falling back to workspace settings; malformed output is retried (bounded) then fails safe.
  • Short-term memory: transcript token budgeting/trimming (system messages always retained).
  • Two built-in tools: kb.search (stub until Phase 3) and read-only http.request.
  • Live run streaming over WebSocket (/runs/{id}/stream) via an in-process event bus, with DB replay for late subscribers; background vs inline execution.
  • Minimal Policy Engine (legate/policy/engine.py) with the final allow / deny / require_approval vocabulary — default-deny + sensitive-tool suspend now, full rules in Phase 5.

Acceptance: create an agent, run it, and watch steps stream in over WebSocket; the agent calls a tool, gets an observation, and returns a final answer; step/token/wall-clock budgets are enforced; malformed LLM output is retried then fails safe. Tests mock the LLM and assert the loop, budgets, retries, RBAC, tenant isolation, and streaming.

Decisions / TODOs

  • # TODO(decision): sensitive tools currently suspend the run as waiting_approval with no Approval record; the inbox/resume flow lands in Phase 5.
  • # TODO(decision): kb.search returns an empty result set until the RAG layer (Phase 3) backs it with Qdrant.
  • Background runs use asyncio.create_task (in-process); moving execution onto Celery workers is a later hardening step.

Phase 3 — RAG knowledge layer ✅

Done

  • KnowledgeBase, Document, Chunk models (+ DocumentStatus enum) and Alembic migration.
  • Swappable embeddings (legate/rag/embeddings): OpenAIEmbedder + an offline, deterministic HashingEmbedder, behind a factory.
  • Swappable vector store (legate/rag/vectorstore): QdrantVectorStore + in-process InMemoryVectorStore; lazy collection creation, per-workspace+KB isolation.
  • Ingestion pipeline (legate/rag/ingestion.py): parse (TXT/MD native, PDF via pypdf, DOCX via python-docx) → overlapping chunking → embed → upsert, with pending → processing → ready | failed status tracking; idempotent, retryable re-ingestion.
  • Local object storage for uploads (legate/rag/storage.py); Celery ingestion task + inline mode (KB_INGEST_INLINE).
  • Retrieval with citations (legate/rag/retrieval.py); the kb.search tool now performs real per-workspace retrieval (degrading to empty when unconfigured).
  • Knowledge API: KB CRUD, document upload/list/get/delete/reingest, and POST /knowledge-bases/{id}/search.

Acceptance: upload a document, watch it ingest, and have an agent answer from it with citations; deleting a document removes its vectors; ingestion failures are visible (status=failed, error) and retryable via /documents/{id}/reingest. Verified by unit tests, API tests, an end-to-end agent-RAG test, and a 21/21 live-server smoke run.

Decisions / TODOs

  • # TODO(decision): kb.search searches all of a workspace's knowledge bases; a per-agent KB scoping option can come with the agent builder UI.
  • S3 storage backend and embedding via Ollama are deferred (local storage + OpenAI/hashing now).
  • Document→Chunk cleanup relies on Postgres ON DELETE CASCADE in production; the API also clears chunks/vectors explicitly on delete.

Phase 4 — Connector framework and credential vault ✅

Done

  • Connector interface + registry (legate/connectors): Connector protocol with validate_config, list_tools, test_connection, execute; a self-registering catalog so adding a connector needs no core changes.
  • First real connectors: HTTP (http.call, authenticated/any-method, sensitive), Slack (slack.post_message / list_channels / reply_thread), Email (email.send (sensitive) / email.draft / email.search).
  • Connector + Credential models and Alembic migration.
  • Envelope-encrypted credential vault (legate/vault): per-record DEK wrapped by a versioned master key (LEGATE_MASTER_KEY); rewrap supports key rotation. Plaintext only in memory; never logged or returned.
  • Connector API: install / list / get / update / delete, POST /connectors/{id}/test, GET /connectors/{id}/tools, and GET /connector-types. Secrets are write-only.
  • Tool resolution (legate/connectors/resolution.py): each run's tool registry = built-ins + the workspace's installed-connector tools; a connector tool decrypts its credential only when invoked.
  • Connector tools flow through the existing policy gate and JSON-schema arg validation; httpx/SMTP/IMAP behind injection seams for offline tests.

Acceptance: install a Slack connector with credentials, test the connection, and have an agent post a message; secrets are encrypted at rest and never returned by the API; adding a new connector requires no core changes. Verified by vault tests, connector unit tests, connector API tests (incl. a secret-leakage check), an end-to-end agent-posts-to-Slack test, and a 24/24 live-server smoke run.

Decisions / TODOs

  • # TODO(decision): slack.post_message is non-sensitive (so the agent can post in Phase 4); http.call and email.send are sensitive and will route through the approval inbox once Phase 5 lands.
  • One credential per connector (1:1); multi-account per type and per-connector tool disambiguation are future work.
  • Connector catalog since expanded well beyond the first three — now ten built-in types: GitHub (github.*), Notion (notion.*), an outgoing Webhook (webhook.send, HMAC-signed), PostgreSQL and MySQL (shared BaseSqlConnector: *.query read-only-guarded / *.execute sensitive), Stripe (stripe.get_customer / list_charges / create_refund sensitive), and Google Workspace (gsheets.* + gdrive.* over one OAuth2 credential) are all implemented and tested. Multi-account-per-type and Drive binary upload remain future work.
  • MCP client support (legate/mcp): Legate connects to remote MCP servers over Streamable HTTP as an mcp connector type. Tools are discovered on install/test and cached on the connector, then registered per-run as mcp.<id>.<tool> (sensitive unless the server marks a tool read-only). Surfaced automatically in the agent allow-list and workflow tool picker. Verified end-to-end against a live MCP server. stdio transport and exposing Legate as an MCP server are future work.

Phase 5 — Policy Engine and approvals ✅

Done

  • Full Policy Engine (legate/policy/engine.py): allow / deny / require_approval, with per-workspace approval rules (auto_approve / deny lists) read from workspace.settings_json.
  • Approval model (+ ApprovalStatus enum) and Alembic migration.
  • Suspend/resume in the runner: on require_approval it snapshots the transcript into an Approval, sets the run to waiting_approval, and stops; AgentRunner.resume rebuilds the transcript and continues — executing the action on approve, or feeding a rejection observation back to replan on reject.
  • Approvals inbox API: GET /approvals, GET /approvals/{id}, POST /approvals/{id}/approve|reject. Deciding requires approvals:decide; resume dispatches inline (tests) or in the background (prod).
  • Expiry (APPROVAL_TTL_HOURS) with auto-expire on read/decide; optional per-approval role gate; 409 on expired or already-decided approvals.
  • Every policy decision audited (policy.deny / policy.require_approval / policy.auto_approve) plus approval.approve / approval.reject.

Acceptance: an agent attempting a sensitive action is paused; an authorized approver approves it from the inbox and the action then executes; rejection resumes the run so the agent replans; an unauthorized user cannot approve; all decisions are audited. Verified by policy-engine unit tests and a full approval-flow suite (suspend / approve+execute / reject+replan / RBAC / expiry / double-decide / auto-approve / tenant isolation / audit), plus a 28/28 live-server smoke run.

Decisions / TODOs

  • Reject resumes the run so the agent can replan (rather than hard-stopping); a "reject-and-cancel" option can be added later.
  • Approval expiry is evaluated lazily (on read/decide); a Celery-beat sweep for proactive expiry is a later addition.
  • required_role is stored and enforced when set; the UI to set it arrives with the dashboard.

Phase 6 — Workflow engine and triggers ✅

Done

  • Workflow model (+ runs.workflow_id / runs.state_json) and Alembic migration.
  • Graph executor (legate/workflow/executor.py): topological execution with edge/branch following, one Step per node, so runs are fully reconstructable.
  • Node executors: trigger, agent (nested agent run), tool (through the policy engine), condition, transform, approval, delay (subworkflow reserved).
  • Sandboxed {{ }} templating (legate/workflow/templating.py) with a hand-written AST evaluator for conditions — no code execution.
  • Triggers: manual (POST /workflows/{id}/run), webhook (HMAC-signed POST /hooks/{token}, invalid → 401), schedule (cron via croniter + a Celery-beat task), and internal events (emit_event).
  • Pause/resume: approval nodes and sensitive tool nodes suspend the run and resume through the existing approvals inbox (state in run.state_json).
  • Per-node retry with backoff.
  • Workflow API: CRUD, run, and the signed webhook hook; managing requires build:write, running requires runs:execute.

Acceptance: build a workflow (webhook → agent → condition → tool) and fire the webhook to execute it end to end including an approval gate; a scheduled workflow runs on cron; invalid webhook signatures are rejected; runs are fully reconstructable from Steps. Verified by templating unit tests and a workflow suite (manual/condition/webhook/agent-node/approval-gate/reject/slack-tool/retry/schedule/event/RBAC/tenancy), plus a 34/34 live-server smoke run.

Decisions / TODOs

  • # TODO(decision): trigger config lives on the Workflow (trigger_type + trigger_config + webhook_*) rather than a separate Trigger table.
  • delay nodes continue immediately in inline execution; a scheduled delayed-resume is a later addition. subworkflow nodes are deferred.
  • Node join semantics are OR (a node runs if any incoming edge is taken); richer AND/wait-all joins can come later.

Phase 7 — Dashboard UX ✅

Done

  • Premium Next.js 14 dashboard (apps/web) with an original "Ink & Amber control-deck" design system — distinctive fonts (Bricolage Grotesque / Hanken Grotesk / JetBrains Mono), a single authoritative amber accent, monospace precision for IDs/hashes, and a subtle layered atmosphere.
  • Auth screens (login/register) with a two-column brand aside; client auth context with JWT storage + silent refresh + workspace switching.
  • App shell: sidebar nav (Overview/Build/Operate/Govern), workspace switcher, role-aware controls, mobile drawer.
  • Agent builder (prompt, model, tool allow-list with sensitive badges, guardrail budgets) + list + run.
  • Workflow builder (JSON graph editor + trigger picker; webhook credentials shown once) + list + run.
  • Run viewer with live WebSocket streaming — a step-by-step "mission log" timeline with per-step input/output.
  • Approval inbox (approve/reject with reason, tabs, live refresh).
  • Connector setup wizard (schema-driven install for any connector type) + test connection.
  • Knowledge base manager (create, upload, per-KB document status, search with citations).
  • Audit explorer (hash-chained, tamper-evident) and Admin (members + roles, API keys shown once).
  • Backend: CORS middleware for the dashboard origin (CORS_ORIGINS).
  • Playwright e2e covering the full acceptance path (apps/web/tests/e2e/core-path.spec.ts).

Acceptance: a non-technical user can register, create an agent, connect Slack, build and run a workflow, approve an action, and read the audit trail — all from the UI, no API calls. Verified by a green Playwright e2e run against the live stack, plus manual screenshots of every core screen.

Decisions / TODOs

  • The workflow builder ships as a JSON/form editor (per the plan's de-risking note); a visual graph editor is a follow-up.
  • Data fetching uses TanStack Query; the run viewer streams over the existing WebSocket.

Phase 8 — Enterprise hardening ✅

Done

  • Billing & plans (legate/billing): free/pro/enterprise plans with limits + features; enforcement on agents/workflows/members/run-quota (402 over limit); usage metering (UsageRecord + migration); GET /billing, GET /usage; Stripe checkout + webhook that updates a workspace's plan. Gated by BILLING_ENABLED.
  • SSO via OIDC (legate/security/oidc.py, legate/api/sso.py): authorization-code login + callback with just-in-time user provisioning; swappable client so tests run against a simulated IdP. SAML is a documented extension point.
  • Rate limiting (legate/security/ratelimit.py): per-workspace run limiter (429), gated by RATE_LIMIT_ENABLED; plan run quotas layer on top.
  • Observability: Prometheus /metrics + request middleware + run counters; OpenTelemetry tracing (config-gated, optional [tracing] extra); structured JSON logging in production.
  • CORS for the dashboard origin.
  • Infra: TLS deploy/Caddyfile; a full kustomize k8s manifest set (deploy/k8s/* — configmap/secret template, Postgres/Redis/Qdrant, api/worker/beat/web, ingress, migrate init-container); hardened docker-compose.prod.yml overlay.
  • Docs: docs/backup.md (backup + restore, incl. the master-key caveat), docs/operations.md.
  • CI security pass: CodeQL (existing) + pip-audit + gitleaks secret scan.

Acceptance: plans gate features and meter usage (402 + /usage); SSO login works against a (test) IdP; metrics and traces are exportable (/metrics, OTLP); a documented restore procedure brings the system back (incl. verifying the master key still unwraps credentials); CI runs CodeQL + dependency + secret scans. Verified by billing/SSO/rate-limit/metrics tests.

Decisions / TODOs

  • OIDC is implemented; SAML is documented as an extension behind the same provisioning seam.
  • The rate limiter is in-process (fine for one replica); a Redis-backed limiter drops in behind enforce_run_rate for multi-replica.
  • k8s manifests ship as kustomize; a packaged Helm chart is a straightforward follow-up.

🎉 All phases complete

Phases 0–8 are done: the agent runtime, RAG, connectors + vault, policy + approvals, workflows, the dashboard, and enterprise hardening. See each phase section above for details.