44 changes: 31 additions & 13 deletions docs/features/agent.md
@@ -2,7 +2,7 @@

The AI Agent is a model-powered assistant integrated into the visual editor. The user types a request in the Agent Panel; the agent reads the current page snapshot, plans a sequence of edits, and executes them by calling tools. Structure is written as semantic HTML (`insertHtml` / `replaceNodeHtml`); styling is written as CSS: a `<style>` block and/or `class=` attributes inside the insert, or the dedicated `applyCss` tool for authoring/editing any CSS on its own. There is one CSS path and it accepts every selector; `assignClass` / `removeClass` attach and detach existing classes on nodes.
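
The importer treats a bare single-class rule such as `.foo {}` as a reusable class and any other selector as an ambient rule, and `applyCss` upserts by selector. A minimal sketch of both behaviours, assuming rules arrive already split into selector/declaration pairs; `classifyRule`, `RuleStore`, and the `BARE_CLASS` pattern are hypothetical illustrations, not the internals of `cssToStyleRules`.

```typescript
// Sketch only: classifyRule / RuleStore are hypothetical, not cssToStyleRules internals.
type ClassifiedRule =
  | { kind: 'class'; className: string; declarations: string }
  | { kind: 'ambient'; selector: string; declarations: string }

// A "bare" class selector: one dot followed by one CSS identifier, nothing else.
const BARE_CLASS = /^\.-?[_a-zA-Z][\w-]*$/

function classifyRule(selector: string, declarations: string): ClassifiedRule {
  const s = selector.trim()
  if (BARE_CLASS.test(s)) {
    // `.foo {}` becomes a reusable Selectors-panel class bound to class="foo".
    return { kind: 'class', className: s.slice(1), declarations }
  }
  // `.hero a`, `a:hover`, `nav > li`, `.foo:hover` all become ambient rules.
  return { kind: 'ambient', selector: s, declarations }
}

// applyCss-style upsert: re-applying a selector replaces the existing rule.
class RuleStore {
  private readonly rules = new Map<string, string>()

  upsert(selector: string, declarations: string): void {
    this.rules.set(selector.trim(), declarations)
  }

  get(selector: string): string | undefined {
    return this.rules.get(selector.trim())
  }

  get size(): number {
    return this.rules.size
  }
}
```

Keying the store by the normalised selector string is what makes a second `applyCss` call an edit rather than a duplicate rule.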

-The agent runs on a provider-agnostic AI runtime (`server/ai/`) that can drive any supported model (Anthropic Claude, OpenAI, OpenRouter, Ollama). Every driver talks directly to its provider's REST API over HTTP/SSE — no provider SDKs. All four share one multi-turn tool loop (`drivers/http/toolLoop.ts`); each supplies only a small `ProviderAdapter` of pure mapping functions. The plain `@anthropic-ai/sdk` (and any provider SDK) is banned repo-wide. Gated by `ai-driver-isolation.test.ts`.
+The agent runs on a provider-agnostic AI runtime (`server/ai/`) that can drive any supported model (Anthropic Claude, OpenAI, OpenRouter, Ollama, or any OpenAI-compatible endpoint). Every driver talks directly to its provider's REST API over HTTP/SSE — no provider SDKs. All drivers share one multi-turn tool loop (`drivers/http/toolLoop.ts`); each supplies only a small `ProviderAdapter` of pure mapping functions. The plain `@anthropic-ai/sdk` (and any provider SDK) is banned repo-wide. Gated by `ai-driver-isolation.test.ts`.
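
The split between one shared loop and small per-provider adapters can be sketched as follows. This is a simplified stand-in, not the code in `drivers/http/toolLoop.ts`: the `ProviderAdapter` members shown, `SendRound`, and `ExecTool` are hypothetical, and the real adapter also owns request building and SSE translation, which this sketch folds into an injected `send` function.

```typescript
// Hypothetical shapes for illustration; the real ProviderAdapter differs.
interface ToolCall {
  id: string
  name: string
  args: unknown
}

interface RoundResult {
  text: string
  toolCalls: ToolCall[]
}

// Pure mapping functions: provider-specific message shapes, no I/O.
interface ProviderAdapter<Msg> {
  userMessage(text: string): Msg
  assistantMessage(round: RoundResult): Msg
  toolResultMessage(call: ToolCall, output: unknown): Msg
}

// One provider round-trip (HTTP + SSE parsing) and one tool execution.
type SendRound<Msg> = (history: readonly Msg[]) => Promise<RoundResult>
type ExecTool = (call: ToolCall) => Promise<unknown>

// Simplified stand-in for runToolLoop(): call the model, run any requested
// tools, append their results, and repeat until the model stops calling tools.
async function runToolLoop<Msg>(
  adapter: ProviderAdapter<Msg>,
  send: SendRound<Msg>,
  execTool: ExecTool,
  prompt: string,
  maxRounds = 8,
): Promise<string> {
  const history: Msg[] = [adapter.userMessage(prompt)]
  for (let round = 0; round < maxRounds; round++) {
    const result = await send(history)
    history.push(adapter.assistantMessage(result))
    if (result.toolCalls.length === 0) return result.text
    for (const call of result.toolCalls) {
      history.push(adapter.toolResultMessage(call, await execTool(call)))
    }
  }
  throw new Error(`tool loop exceeded ${maxRounds} rounds`)
}
```

Under this split, adding a provider means writing a new adapter and transport, not a new loop.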

---

@@ -12,7 +12,7 @@ The agent runs on a provider-agnostic AI runtime (`server/ai/`) that can drive a
- **Styling via CSS.** The agent emits CSS the same way a human pastes it: a `<style>` block and/or `class=` attributes inside the `insertHtml`/`replaceNodeHtml` payload, or the standalone `applyCss` tool. The importer (`cssToStyleRules`) classifies every selector — a bare `.foo {}` rule becomes a reusable Selectors-panel class bound to `class="foo"`; any other selector (`.hero a`, `a:hover`, `nav > li`) becomes an ambient rule; `style=` attributes land on the node's inline styles. There is no structured `classes` parameter — the agent never hand-builds classes node-by-node at insert time. `applyCss` is the single tool for authoring/editing CSS on its own; it **upserts**, so re-applying a selector edits the existing rule (the way descendant/pseudo rules get restyled).
- **35 tools total.** 6 server-side catalog read tools (resolved server-side from the posted snapshot / DB) + 29 browser-bridged tools.
- **Two-endpoint bridge.** `POST /admin/api/ai/chat/site` opens an NDJSON stream. When the model calls a browser-bridged tool, the server emits `toolRequest`; the browser executor reads or mutates the editor store and POSTs the `AiToolOutput` result to `POST /admin/api/ai/tool-result`.
-- **Provider-agnostic.** The runtime selects a driver (Anthropic, OpenAI, OpenRouter, Ollama) from the conversation's configured credential.
+- **Provider-agnostic.** The runtime selects a driver (Anthropic, OpenAI, OpenRouter, Ollama, Custom Provider) from the conversation's configured credential.
- **Tool input schemas are a single source of truth** in `@core/ai` (`src/core/ai/toolSchemas.ts`). The server tool registry (`server/ai/tools/site/writeTools.ts`) and the browser executor (`executor.ts` + `tokenRunners.ts`) import the exact same schema objects — a constraint added once is enforced on both sides at build time. Gated by `ai-tool-schema-ssot.test.ts` and `ai-tools-typebox-only.test.ts`.
- **Capabilities.** `ai.chat` required to stream; `ai.tools.write` required for write tools. Gated by `ai-handlers-capability-gated.test.ts`.
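
The two-endpoint bridge amounts to parking one promise per browser-bridged tool call until the matching `tool-result` POST arrives. A minimal in-memory sketch, assuming a single server process; `ToolBridge`, `deliver`, `ToolResult`, and the NDJSON field names other than `toolRequest` are hypothetical, and the real `AiToolOutput` is a TypeBox-validated envelope from `@core/ai`.

```typescript
// Hypothetical result envelope; the real AiToolOutput is TypeBox-validated in @core/ai.
interface ToolResult {
  ok: boolean
  data?: unknown
  error?: string
}

class ToolBridge {
  // toolCallId -> resolver for the promise the tool loop is awaiting.
  private readonly pending = new Map<string, (result: ToolResult) => void>()
  private readonly emit: (ndjsonLine: string) => void

  constructor(emit: (ndjsonLine: string) => void) {
    this.emit = emit
  }

  // Chat-stream side: the model called a browser-bridged tool.
  request(toolCallId: string, toolName: string, input: unknown): Promise<ToolResult> {
    return new Promise<ToolResult>((resolve) => {
      this.pending.set(toolCallId, resolve)
      this.emit(JSON.stringify({ type: 'toolRequest', toolCallId, toolName, input }) + '\n')
    })
  }

  // tool-result side: the browser POSTed its output for toolCallId.
  deliver(toolCallId: string, result: ToolResult): boolean {
    const resolve = this.pending.get(toolCallId)
    if (!resolve) return false // unknown or already answered
    this.pending.delete(toolCallId)
    resolve(result)
    return true
  }

  get inFlight(): number {
    return this.pending.size
  }
}
```

A production version would also want a timeout and abort path so a closed tab cannot leave the loop waiting; that is a design assumption here, not something stated above.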

@@ -56,16 +56,18 @@ server/ai/
│ └── content/ — content-workspace tools (separate scope)
├── drivers/
│ ├── http/
-│ │ ├── sse.ts — parseSseStream(res): reassemble SSE frames across chunks
-│ │ ├── execTool.ts — executeAiTool(): server-handler vs browser-bridge dispatch; normaliseToolOutput(): wraps raw handler results in the canonical AiToolOutput envelope, validated via TypeBox (not duck-typed)
-│ │ ├── toolLoop.ts — runToolLoop(): provider-agnostic multi-turn loop
-│ │ ├── toolArgs.ts — parseToolArguments(json): shared tool-argument JSON parsing (one copy for all drivers)
-│ │ └── errors.ts — isAbortError / classifyHttpError
-│ ├── responses-shared.ts — OpenAI-Responses mapping + SSE translator + adapter factory (openai + openrouter)
-│ ├── anthropic.ts — Anthropic driver: direct POST /v1/messages (no SDK)
-│ ├── openai.ts — OpenAI driver: direct POST /v1/responses (no SDK)
-│ ├── openrouter.ts — OpenRouter driver: direct POST /v1/responses (shared Responses path; live /models; native cost)
-│ └── ollama.ts — Ollama driver: direct POST /v1/chat/completions (no SDK)
+│ │ ├── sse.ts — parseSseStream(res): reassemble SSE frames across chunks
+│ │ ├── execTool.ts — executeAiTool(): server-handler vs browser-bridge dispatch; normaliseToolOutput(): wraps raw handler results in the canonical AiToolOutput envelope, validated via TypeBox (not duck-typed)
+│ │ ├── toolLoop.ts — runToolLoop(): provider-agnostic multi-turn loop
+│ │ ├── toolArgs.ts — parseToolArguments(json): shared tool-argument JSON parsing (one copy for all drivers)
+│ │ ├── chatCompletions.ts — shared /v1/chat/completions SSE adapter (makeChatCompletionsAdapter); used by ollama + openai-compatible
+│ │ └── errors.ts — isAbortError / classifyHttpError
+│ ├── responses-shared.ts — OpenAI-Responses mapping + SSE translator + adapter factory (openai + openrouter)
+│ ├── anthropic.ts — Anthropic driver: direct POST /v1/messages (no SDK)
+│ ├── openai.ts — OpenAI driver: direct POST /v1/responses (no SDK)
+│ ├── openrouter.ts — OpenRouter driver: direct POST /v1/responses (shared Responses path; live /models; native cost)
+│ ├── ollama.ts — Ollama driver: POST /v1/chat/completions via shared chatCompletions adapter; live /api/tags catalogue
+│ └── openaiCompatible.ts — Custom Provider driver: any /v1/chat/completions endpoint; live GET /v1/models catalogue
└── runtime/
├── runner.ts — runChat(): drives a driver, emits stream events
├── persister.ts — ConversationsPersister: messages + usage to DB; writes contextTokens snapshot
@@ -129,6 +131,22 @@ The composer area includes a `<ContextMeter>` that shows "context used / window"

---

## Providers

Each entry in **Settings → AI → Providers** stores one credential. The provider id is fixed; the auth mode and input fields are derived from it — the UI never asks you to choose.

| Provider | Label in UI | Auth mode | Required field | Optional field | Model discovery |
|---|---|---|---|---|---|
| `anthropic` | Anthropic (Claude) | `apiKey` | API key (`sk-ant-…`) | — | Static `claude-*` catalogue enriched with OpenRouter prices + context windows |
| `openai` | OpenAI | `apiKey` | API key (`sk-…`) | — | Static `gpt-*` / `o*` catalogue enriched with OpenRouter prices + context windows |
| `openrouter` | OpenRouter | `apiKey` | API key (`sk-or-…`) | — | Live `GET /api/v1/models` (cross-provider; native cost reporting) |
| `ollama` | Ollama (local) | `baseUrl` | Base URL (e.g. `http://localhost:11434`) | API key (bearer, for proxied deployments) | Live `GET {baseUrl}/api/tags`; static fallback list when unreachable |
| `openai-compatible` | Custom Provider | `baseUrl` | Base URL — any host serving the OpenAI `/v1/chat/completions` wire protocol | API key (bearer; cloud services need one, local servers often don't) | Live `GET {baseUrl}/v1/models` (standard OpenAI list shape); model `id` used as label |
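
Live model discovery for the two `baseUrl` providers comes down to parsing two response shapes: the standard OpenAI list shape (`{ data: [{ id }] }`) for Custom Provider, and Ollama's `/api/tags` shape (`{ models: [{ name }] }`). A sketch, assuming the caller performs the `fetch` and passes the parsed JSON (or `null` when the host is unreachable); `ModelOption`, `parseOpenAiModelList`, and `parseOllamaTags` are hypothetical names.

```typescript
// Hypothetical option shape for the model picker.
interface ModelOption {
  id: string
  label: string
}

// GET {baseUrl}/v1/models -> { object: "list", data: [{ id, ... }] }.
// The model id doubles as the label, since no catalogue metadata exists.
function parseOpenAiModelList(body: unknown): ModelOption[] {
  const data = (body as { data?: unknown } | null)?.data
  if (!Array.isArray(data)) return []
  return data.flatMap((m) => (typeof m?.id === 'string' ? [{ id: m.id, label: m.id }] : []))
}

// GET {baseUrl}/api/tags -> { models: [{ name, ... }] }.
// Falls back to the static list when the body is missing (unreachable) or malformed.
function parseOllamaTags(body: unknown, fallback: ModelOption[]): ModelOption[] {
  const models = (body as { models?: unknown } | null)?.models
  if (!Array.isArray(models)) return fallback
  return models.flatMap((m) => (typeof m?.name === 'string' ? [{ id: m.name, label: m.name }] : []))
}
```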

**Custom Provider** (id `openai-compatible`) is the generic adapter for any endpoint that speaks the OpenAI chat/completions wire protocol — Groq (`https://api.groq.com/openai`), Together, DeepSeek, Mistral, Fireworks, self-hosted vLLM, LM Studio, and others. Capabilities default to `{ toolCalling: true, visionInput: false, promptCache: false, streaming: true }`; the operator is responsible for selecting a model that actually supports tool calling. Because arbitrary endpoints are not in the OpenRouter catalogue, no context-window enrichment is available and the context meter stays hidden for these models.
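
Base URLs get pasted both with and without a trailing `/v1` (for example `https://api.groq.com/openai` and `https://api.groq.com/openai/v1`), so endpoint building must not produce `/v1/v1/chat/completions`. The sketch below reproduces the `trimSlash` / `normalizeOpenAiBaseUrl` behaviour asserted in `chatCompletions.test.ts`; the `chatCompletionsUrl` and `modelsUrl` helpers are hypothetical.

```typescript
// Strip any trailing slashes: "http://x/v1/" -> "http://x/v1".
function trimSlash(url: string): string {
  return url.replace(/\/+$/, '')
}

// Strip one trailing "/v1" so it is not doubled when endpoints are appended.
function normalizeOpenAiBaseUrl(url: string): string {
  return trimSlash(url).replace(/\/v1$/, '')
}

// Hypothetical helpers building the two endpoints a Custom Provider uses.
function chatCompletionsUrl(baseUrl: string): string {
  return `${normalizeOpenAiBaseUrl(baseUrl)}/v1/chat/completions`
}

function modelsUrl(baseUrl: string): string {
  return `${normalizeOpenAiBaseUrl(baseUrl)}/v1/models`
}
```

Both pasted forms of the Groq URL then resolve to `https://api.groq.com/openai/v1/chat/completions`, and an Ollama-style `http://localhost:11434` passes through unchanged.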

---

## Flow

```text
@@ -561,7 +579,7 @@ The `<ContextMeter>` shows how much of the active model's context window the cur
- **Window** (`windowTokens` prop from `AgentPanel`): the model's max total tokens, resolved once from `GET /admin/api/ai/providers/:id/models?credentialId=…`. The models endpoint enriches Anthropic and OpenAI models with `contextWindow` from the live OpenRouter catalogue (`server/ai/pricing/`); OpenRouter populates it from its own native fetch. Ollama, Custom Provider, and other uncatalogued models have no window, so the meter is hidden.
- **Used** (`agentContextTokens` in the store): the provider-normalised "context used" — the CURRENT context size, computed by `normalizeContextTokens(providerId, buckets)` in `server/ai/contextTokens.ts`:
- Anthropic reports `input_tokens` excluding cache buckets, so the true total is `promptTokens + cacheReadTokens + cacheCreationTokens`.
-- OpenAI / OpenRouter / Ollama report `input_tokens` as the full input; `promptTokens` alone is the total.
+- OpenAI / OpenRouter / Ollama / Custom Provider report `input_tokens` as the full input; `promptTokens` alone is the total.
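
The per-provider rule above is small enough to show whole. A sketch, assuming buckets arrive as plain numbers with the cache buckets optional; the `UsageBuckets` type, the `meterFraction` helper, and returning `null` to hide the meter are illustrative assumptions, not the code in `server/ai/contextTokens.ts`.

```typescript
// Provider ids as listed under Settings → AI → Providers.
type ProviderId = 'anthropic' | 'openai' | 'openrouter' | 'ollama' | 'openai-compatible'

// Hypothetical bucket shape; cache buckets are absent for some providers.
interface UsageBuckets {
  promptTokens: number
  cacheReadTokens?: number
  cacheCreationTokens?: number
}

function normalizeContextTokens(providerId: ProviderId, b: UsageBuckets): number {
  if (providerId === 'anthropic') {
    // Anthropic's input_tokens excludes both cache buckets, so add them back.
    return b.promptTokens + (b.cacheReadTokens ?? 0) + (b.cacheCreationTokens ?? 0)
  }
  // Everyone else reports the full input; cached tokens are already a subset.
  return b.promptTokens
}

// Fraction of the window in use, or null when the model has no known window
// (Ollama, Custom Provider, uncatalogued models), in which case the meter hides.
function meterFraction(usedTokens: number, windowTokens?: number): number | null {
  if (!windowTokens || windowTokens <= 0) return null
  return Math.min(1, usedTokens / windowTokens)
}
```

For example, an Anthropic round with 120 uncached prompt tokens, 9,000 cache-read tokens, and 400 cache-creation tokens fills 9,520 tokens of context; an OpenAI-style round reporting 9,520 input tokens (9,000 of them cached) fills the same 9,520.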

**Live, per-round, not summed.** A turn makes one provider round-trip per tool batch. The toolLoop emits a `context` event **each round** carrying THAT round's input buckets; the chat handler injects the normalised `contextTokens` and the browser updates the meter on every round — so it climbs *during* a long tool loop instead of only at the end. The meter is the LATEST round's input (the current window fill), never the sum across rounds (which would over-count, since each round re-sends the growing context). The terminal `usage` event is **billing only** — its `promptTokens` stays summed across rounds (you pay input per round). The persister keeps the latest `context` value in memory (`recordContext`) and writes it once to `ai_conversations.context_tokens` with the final `usage` (overwritten per turn), so `loadAgentConversation` restores the true context on reload.
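
A worked example of why the meter takes the latest round while billing sums them: a turn with three rounds whose inputs grow from 1,000 to 1,400 to 1,900 tokens fills 1,900 tokens of the window but bills 4,300 input tokens. The `RoundUsage` shape and `summarizeTurn` are hypothetical, and the sketch assumes a provider whose prompt tokens equal its context tokens.

```typescript
// Hypothetical per-round record emitted by the tool loop.
interface RoundUsage {
  contextTokens: number // normalised context size sent this round
  promptTokens: number // billed input tokens for this round
}

function summarizeTurn(rounds: readonly RoundUsage[]): { meter: number; billedPromptTokens: number } {
  // Meter: the LATEST round's input, since each round re-sends the whole context.
  const last = rounds[rounds.length - 1]
  const meter = last ? last.contextTokens : 0
  // Billing: every round is paid for, so input tokens are summed.
  const billedPromptTokens = rounds.reduce((sum, r) => sum + r.promptTokens, 0)
  return { meter, billedPromptTokens }
}

const turn = summarizeTurn([
  { contextTokens: 1000, promptTokens: 1000 },
  { contextTokens: 1400, promptTokens: 1400 },
  { contextTokens: 1900, promptTokens: 1900 },
])
// turn.meter === 1900; turn.billedPromptTokens === 1000 + 1400 + 1900 === 4300
```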

5 changes: 3 additions & 2 deletions server/ai/contextTokens.ts
@@ -5,8 +5,9 @@
*
* - Anthropic reports `input_tokens` EXCLUDING the cache buckets, so the true
* total is prompt + cacheRead + cacheCreation.
-* - OpenAI / OpenRouter / Ollama report `input_tokens` as the full input (any
-* cached tokens are already a subset), so prompt alone is the total.
+* - OpenAI / OpenRouter / Ollama / Custom Provider report `input_tokens` as
+* the full input (any cached tokens are already a subset), so prompt alone
+* is the total.
*
* Two callers share this: the chat handler injects the value onto the wire
* `usage` event for the live meter, and the persister writes it to the
117 changes: 117 additions & 0 deletions server/ai/drivers/http/chatCompletions.test.ts
@@ -0,0 +1,117 @@
import { describe, it, expect } from 'bun:test'
import {
  mapChatHistory,
  ChatCompletionsTurnTranslator,
  trimSlash,
  normalizeOpenAiBaseUrl,
} from './chatCompletions'
import type { SseFrame } from './sse'

function frame(obj: unknown): SseFrame {
  return { event: null, data: JSON.stringify(obj) }
}

describe('chatCompletions shared adapter', () => {
  it('trimSlash strips trailing slashes', () => {
    expect(trimSlash('http://x/v1/')).toBe('http://x/v1')
    expect(trimSlash('http://x/v1')).toBe('http://x/v1')
  })

  it('normalizeOpenAiBaseUrl strips trailing /v1 so it is not doubled when building the endpoint', () => {
    // With /v1 suffix — should strip it so appending /v1/... is correct.
    expect(normalizeOpenAiBaseUrl('https://api.groq.com/openai/v1')).toBe('https://api.groq.com/openai')
    expect(normalizeOpenAiBaseUrl('https://api.groq.com/openai/v1/')).toBe('https://api.groq.com/openai')
    // Without /v1 suffix — no-op.
    expect(normalizeOpenAiBaseUrl('https://api.groq.com/openai')).toBe('https://api.groq.com/openai')
    // Ollama-style URL with no path — no-op.
    expect(normalizeOpenAiBaseUrl('http://localhost:11434')).toBe('http://localhost:11434')
    expect(normalizeOpenAiBaseUrl('http://localhost:11434/')).toBe('http://localhost:11434')
  })

  it('mapChatHistory prepends the system prompt as a system message', () => {
    const turns = mapChatHistory(['be terse'], [
      { role: 'user', content: [{ kind: 'text', text: 'hi' }] },
    ])
    expect(turns[0]).toEqual([{ role: 'system', content: 'be terse' }])
    expect(turns[1]).toEqual([{ role: 'user', content: 'hi' }])
  })

  it('translator accumulates streamed text and finishes with stop=true when no tool calls', () => {
    const t = new ChatCompletionsTurnTranslator()
    const events = t.translate(frame({ choices: [{ delta: { content: 'Hello' } }] }))
    expect(events).toEqual([{ type: 'text', text: 'Hello' }])
    const result = t.finish()
    expect(result.stop).toBe(true)
    expect(result.toolCalls).toEqual([])
  })

  // Real OpenAI-compatible gateways (OpenCode Zen, OpenRouter, …) send explicit
  // `null` for optional per-chunk fields rather than omitting them. The chunk
  // schema must tolerate these, or `parseValue` throws, the frame is dropped,
  // and the model's entire reply silently vanishes ("no reply").
  it('still emits text when the chunk carries usage:null', () => {
    const t = new ChatCompletionsTurnTranslator()
    const events = t.translate(frame({ choices: [{ delta: { content: 'Hi' } }], usage: null }))
    expect(events).toEqual([{ type: 'text', text: 'Hi' }])
  })

  it('still emits text when delta.tool_calls is null', () => {
    const t = new ChatCompletionsTurnTranslator()
    const events = t.translate(
      frame({ choices: [{ delta: { content: 'Hi', reasoning_content: null, tool_calls: null }, finish_reason: 'stop' }], usage: null }),
    )
    expect(events).toEqual([{ type: 'text', text: 'Hi' }])
  })

  it('captures the final content of a reasoning model (content empty during reasoning, filled at the end)', () => {
    const t = new ChatCompletionsTurnTranslator()
    // Reasoning phase: content is "" (or null) while the chain-of-thought streams in reasoning_content; tool_calls/usage are null.
    t.translate(frame({ choices: [{ delta: { content: '', reasoning_content: 'thinking…', tool_calls: null } }], usage: null }))
    t.translate(frame({ choices: [{ delta: { content: null, reasoning_content: ' more' } }], usage: null }))
    // Final answer arrives in content.
    const last = t.translate(
      frame({ choices: [{ delta: { content: 'Hello there!', tool_calls: null }, finish_reason: 'stop' }], usage: null }),
    )
    expect(last).toEqual([{ type: 'text', text: 'Hello there!' }])
    const result = t.finish()
    expect(result.stop).toBe(true)
    expect(result.assistantMessage?.[0]).toMatchObject({ role: 'assistant', content: 'Hello there!' })
  })

  // Reasoning models stream their chain-of-thought in a separate delta field
  // (`reasoning_content`, or `reasoning` on OpenRouter-style gateways). We surface
  // it as ephemeral `reasoning` events for a live "Thinking…" indicator — but it
  // must NEVER enter the assistant message (not persisted, not replayed).
  it('emits reasoning events for delta.reasoning_content without polluting the answer', () => {
    const t = new ChatCompletionsTurnTranslator()
    const r = t.translate(frame({ choices: [{ delta: { content: '', reasoning_content: 'Let me think…' } }] }))
    expect(r).toEqual([{ type: 'reasoning', text: 'Let me think…' }])
    const a = t.translate(frame({ choices: [{ delta: { content: 'Hello!' }, finish_reason: 'stop' }] }))
    expect(a).toEqual([{ type: 'text', text: 'Hello!' }])
    const result = t.finish()
    // Only the answer is in the assistant message — reasoning is gone.
    expect(result.assistantMessage?.[0]).toMatchObject({ role: 'assistant', content: 'Hello!' })
  })

  it('also emits reasoning events for the alternate delta.reasoning field', () => {
    const t = new ChatCompletionsTurnTranslator()
    const r = t.translate(frame({ choices: [{ delta: { content: '', reasoning: 'thinking' } }] }))
    expect(r).toEqual([{ type: 'reasoning', text: 'thinking' }])
  })

  it('translator emits one toolCall event per accumulated call at finish_reason', () => {
    const t = new ChatCompletionsTurnTranslator()
    t.translate(frame({ choices: [{ delta: { tool_calls: [
      { index: 0, id: 'c1', function: { name: 'insertHtml', arguments: '{"ht' } },
    ] } }] }))
    const ev = t.translate(frame({ choices: [{ delta: { tool_calls: [
      { index: 0, function: { arguments: 'ml":"<p/>"}' } },
    ] }, finish_reason: 'tool_calls' }] }))
    const toolEvent = ev.find((e) => e.type === 'toolCall')
    expect(toolEvent).toBeTruthy()
    expect(toolEvent).toMatchObject({ type: 'toolCall', toolName: 'insertHtml', toolCallId: 'c1' })
    const result = t.finish()
    expect(result.stop).toBe(false)
    expect(result.toolCalls[0]).toMatchObject({ name: 'insertHtml' })
  })
})
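
The tests above pin down the translator's contract: explicit `null` fields are tolerated, reasoning deltas become ephemeral `reasoning` events kept out of the assistant message, and tool-call argument fragments are stitched together by `index` until `finish_reason`. A self-contained sketch that satisfies those assertions follows. It takes the raw SSE `data` string instead of an `SseFrame`, only handles `finish_reason: 'tool_calls'` as the emit point, and `SketchTranslator`, `Chunk`, and `TurnEvent` are hypothetical stand-ins, not the contents of `chatCompletions.ts`.

```typescript
// Hypothetical event shape, mirroring what the tests assert on.
type TurnEvent =
  | { type: 'text'; text: string }
  | { type: 'reasoning'; text: string }
  | { type: 'toolCall'; toolCallId: string; toolName: string; args: string }

interface ToolCallDelta {
  index: number
  id?: string | null
  function?: { name?: string | null; arguments?: string | null } | null
}

// Every optional field may be omitted OR sent as explicit null by gateways.
interface Chunk {
  choices?: Array<{
    delta?: {
      content?: string | null
      reasoning_content?: string | null
      reasoning?: string | null
      tool_calls?: ToolCallDelta[] | null
    } | null
    finish_reason?: string | null
  }> | null
  usage?: unknown
}

interface AccumulatedCall {
  id: string
  name: string
  args: string
}

class SketchTranslator {
  private answer = ''
  private readonly calls = new Map<number, AccumulatedCall>()

  translate(data: string): TurnEvent[] {
    const chunk = JSON.parse(data) as Chunk
    const choice = chunk.choices?.[0]
    const delta = choice?.delta
    const events: TurnEvent[] = []

    // Chain-of-thought: surfaced live, never added to the answer.
    const reasoning = delta?.reasoning_content || delta?.reasoning
    if (reasoning) events.push({ type: 'reasoning', text: reasoning })

    // "" and null both mean "no answer text in this chunk".
    const content = delta?.content
    if (content) {
      this.answer += content
      events.push({ type: 'text', text: content })
    }

    // Tool calls stream as fragments keyed by index; concatenate the arguments.
    for (const tc of delta?.tool_calls ?? []) {
      const acc = this.calls.get(tc.index) ?? { id: '', name: '', args: '' }
      if (tc.id) acc.id = tc.id
      const name = tc.function?.name
      if (name) acc.name = name
      acc.args += tc.function?.arguments ?? ''
      this.calls.set(tc.index, acc)
    }

    if (choice?.finish_reason === 'tool_calls') {
      for (const c of this.orderedCalls()) {
        events.push({ type: 'toolCall', toolCallId: c.id, toolName: c.name, args: c.args })
      }
    }
    return events
  }

  finish() {
    const toolCalls = this.orderedCalls()
    return {
      stop: toolCalls.length === 0, // no tool calls: the turn is over
      toolCalls,
      assistantMessage: [{ role: 'assistant' as const, content: this.answer }],
    }
  }

  private orderedCalls(): AccumulatedCall[] {
    return Array.from(this.calls.entries())
      .sort((a, b) => a[0] - b[0])
      .map(([, call]) => call)
  }
}
```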