From cc5564802b307c889363359b313045faa24cd720 Mon Sep 17 00:00:00 2001 From: HiranoMasaaki Date: Sun, 8 Mar 2026 07:57:27 +0900 Subject: [PATCH] docs: revamp README and add Fireworks provider documentation - Restructure README with clearer Getting Started flow (define, run, integrate, deploy) - Position Fireworks as the reference provider in all examples - Add "Why micro-agents?" section explaining reusability, cost, speed, and maintainability - Add "Integrating with your app" section with architecture diagram - Add Fireworks section to providers-and-models documentation - Fix broken #providers anchor link - Add demo cost info and npx command Co-Authored-By: Claude Opus 4.6 --- README.md | 113 ++++++++++++++---------- docs/references/providers-and-models.md | 55 +++++++++--- 2 files changed, 113 insertions(+), 55 deletions(-) diff --git a/README.md b/README.md index d05b3c70..b9a9b346 100644 --- a/README.md +++ b/README.md @@ -15,21 +15,31 @@ A harness for micro-agents. Discord

-Perstack is an infrastructure for **practical agentic AI** that aims to: +If you're looking for a way to build practical agentic apps like Claude Code or OpenClaw, you'll need a harness to manage the complexity. + +Perstack is a harness for agentic apps. It aims to: + - **Do big things with small models**: If a smaller model can do the same job, there's no reason to use a bigger model. -- **Focus on what you know best**: Building agentic software that people use doesn't require AI science degree — knowledge to solve their problems is what matters. +- **Quality is a system property, not a model property**: Building agentic software that people use doesn't require an AI science degree, just the knowledge to solve their problems. - **Keep it simple and reliable**: The biggest mistake is cramming AI into an overly complex harness and ending up with an uncontrollable agent. ## Getting Started +Perstack is designed so that defining experts, running them, and integrating them into applications remain separate concerns. + +`create-expert` scaffolds experts, the harness handles orchestration, and deployment stays simple because Perstack runs on standard container and serverless infrastructure. + +### Defining your first expert + To get started, you can use the `create-expert` Expert that helps you focus on the core and build your first agentic AI: ```bash # Ask `create-expert` to form a team named `ai-gaming` docker run --pull always --rm -it \ - -e ANTHROPIC_API_KEY \ + -e FIREWORKS_API_KEY \ -v ./ai-gaming:/workspace \ perstack/perstack start create-expert \ + --provider fireworks \ "Form a team named ai-gaming to build a Bun-based CLI cutting-edge indie game playable on Bash." ``` @@ -54,24 +64,68 @@ description = "Tests the game and reports bugs" instruction = "Play-test the game, find bugs, and verify fixes."
``` +### Running your expert + To let your agents work on an actual task, you can use the `perstack start` command to run them interactively: ```bash # Let `ai-gaming` team build a Wizardry-like dungeon crawler docker run --pull always --rm -it \ - -e ANTHROPIC_API_KEY \ + -e FIREWORKS_API_KEY \ -v ./ai-gaming:/workspace \ perstack/perstack start ai-gaming \ - --model "haiku-4-5" \ + --provider fireworks \ "Make a Wizardry-like dungeon crawler. Make it replayable, so players can dive in, die, and find a way to beat it." ``` -Here is an example of a game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). It was built entirely on Claude 4.5 Haiku. +Here is an example of a game built with these commands: [demo-dungeon-crawler](https://github.com/perstack-ai/demo-dungeon-crawler). It was built entirely using Kimi K2.5 on Fireworks for under $0.10 in total API cost. You can play it directly: + +```bash +npx perstack-demo-dungeon-crawler start +``` + +### Integrating with your app + +Perstack separates the agent harness from the application layer. Your app stays a normal web or terminal app, with no LLM dependencies in the client. + +``` +┌─────────────────┐ ┌──────────────────┐ +│ Your app │ events │ perstack run │ +│ (React, TUI…) │◄─────────── │ (@perstack/ │ +│ │ SSE / WS / │ runtime) │ +│ @perstack/ │ any stream │ │ +│ react │ │ │ +└─────────────────┘ └──────────────────┘ + Frontend Server +``` + +Swap models, change agent topology, or scale the harness — without touching application code. [`@perstack/react`](https://www.npmjs.com/package/@perstack/react) provides hooks (`useJobStream`, `useRun`) that turn the event stream into React state. See the [documentation](https://perstack.ai/docs/references/react/) for details. + +### Deployment + +```dockerfile +FROM perstack/perstack:latest +COPY perstack.toml . 
+RUN perstack install +ENTRYPOINT ["perstack", "run", "my-expert"] +``` + +The image is Ubuntu-based, multi-arch (`linux/amd64`, `linux/arm64`), and weighs ~74MB. The runtime can also be imported directly as a TypeScript library ([`@perstack/runtime`](https://www.npmjs.com/package/@perstack/runtime)) for serverless environments. See the [deployment guide](https://perstack.ai/docs/operating-experts/deployment/) for details. + + +### Why micro-agents? + +Perstack is a harness for micro-agents — purpose-specific agents with a single responsibility. + +- **Reusable**: Delegates are dependency management for agents — like npm packages or crates. Separate concerns through delegate chains, and compose purpose-built experts across different projects. + +- **Cost-Effective**: Purpose-specific experts are designed to run on affordable models. A focused agent with the right domain knowledge on a cheap model outperforms a generalist on an expensive one. + +- **Fast**: Smaller models generate faster. Fine-grained tasks broken into delegates run concurrently via parallel delegation. + +- **Maintainable**: Changing a monolithic system prompt is like refactoring without tests — every change risks breaking something. Single-responsibility experts are independently testable. Test each one, then compose them. ## Prerequisites - Docker -- An LLM provider API key (see [Providers](#providers)) +- An LLM provider API key (see [Providers and Models](https://perstack.ai/docs/references/providers-and-models/)) ### Giving API keys @@ -82,11 +136,11 @@ There are two ways to provide API keys: Export the key on the host and forward it to the container: ```bash -export ANTHROPIC_API_KEY=sk-ant-... +export FIREWORKS_API_KEY=fw_... docker run --rm -it \ - -e ANTHROPIC_API_KEY \ + -e FIREWORKS_API_KEY \ -v ./workspace:/workspace \ - perstack/perstack start my-expert "query" + perstack/perstack start my-expert "query" --provider fireworks ``` **2.
Store keys in a `.env` file in the workspace** @@ -95,7 +149,7 @@ Create a `.env` file in the workspace directory. Perstack loads `.env` and `.env ```bash # ./workspace/.env -ANTHROPIC_API_KEY=sk-ant-... +FIREWORKS_API_KEY=fw_... ``` ```bash @@ -141,19 +195,19 @@ Perstack organizes the complexity of an agent harness into five layers with clea | **Definition** | `perstack.toml` | Declarative project config with global defaults (model, reasoning budget, retries, timeout) | | | Expert definitions | Instruction, description, delegates, tags, version, and minimum runtime version per expert | | | Skill types | MCP stdio, MCP SSE, and interactive skills with tool pick/omit filtering and domain restrictions | -| | Provider config | 8 providers (Anthropic, OpenAI, Google, Azure OpenAI, Amazon Bedrock, Google Vertex, Ollama, DeepSeek) with per-provider settings | +| | Provider config | 9 providers (Anthropic, OpenAI, Google, Fireworks, DeepSeek, Ollama, Azure OpenAI, Amazon Bedrock, Google Vertex) with per-provider settings | | | Model tiers | Provider-aware model selection via `defaultModelTier` (low / middle / high) with fallback cascade | | | Provider tools | Provider-native capabilities (web search, code execution, image generation, etc.) with per-tool options | | | Lockfile | `perstack.lock` — resolved snapshot of experts and tool definitions for reproducible deployments | | **Context** | Meta-prompts | Role-specific system prompts (coordinator vs. 
delegate) with environment injection (time, working directory, sandbox) | | | Context window tracking | Per-model context window lookup with usage ratio monitoring | | | Message types | Instruction, user, expert, and tool messages with text, image, file, thinking, and tool-call parts | -| | Prompt caching | Anthropic ephemeral cache control with cache-hit tracking | +| | Prompt caching | Provider-specific cache control with cache-hit tracking | | | Delegation | Parallel child runs with isolated context, parent history preservation, and result aggregation | | | Extended thinking | Provider-specific reasoning budgets (Anthropic thinking, OpenAI reasoning effort, Google thinking config) | | | Token usage | Input, output, reasoning, cached, and total token tracking accumulated across steps and delegations | | | Resume / continue | Resume from any checkpoint, specific job, or delegation stop point | -| **Runtime** | State machine | 9-state XState machine (init → generate → call tools → resolve → finish, with delegation and interactive stops) | +| **Runtime** | State machine | 9-state machine (init → generate → call tools → resolve → finish, with delegation and interactive stops) | | | Event-sourcing | 21 run events, 6 streaming events, and 5 runtime events for full execution observability | | | Checkpoints | Immutable state snapshots with messages, usage, pending tool calls, and delegation metadata | | | Skill manager | Dynamic skill lifecycle — connect, discover tools, execute, disconnect — with adapter pattern | @@ -176,37 +230,6 @@ Perstack organizes the complexity of an agent harness into five layers with clea - -## Deployment - -```dockerfile -FROM perstack/perstack:latest -COPY perstack.toml . -RUN perstack install -ENTRYPOINT ["perstack", "run", "my-expert"] -``` - -```bash -docker build -t my-expert . -docker run --rm -e ANTHROPIC_API_KEY my-expert "query" -``` - -The image is ubuntu-based, multi-arch (`linux/amd64`, `linux/arm64`) and weighs ~74MB. 
- -The runtime can also be imported directly as a TypeScript library for serverless environments (Cloudflare Workers, Vercel, etc.) or integrated into your own applications: - -```typescript -import { run } from "@perstack/runtime" - -const checkpoint = await run({ - setting: { - providerConfig: { providerName: "anthropic", apiKey: env.ANTHROPIC_API_KEY }, - expertKey: "my-expert", - input: { text: query }, - }, -}) -``` - ## Documentation | Topic | Link | diff --git a/docs/references/providers-and-models.md b/docs/references/providers-and-models.md index 4f256789..08adb1cb 100644 --- a/docs/references/providers-and-models.md +++ b/docs/references/providers-and-models.md @@ -32,16 +32,17 @@ npx perstack run my-expert "query" --provider google --model gemini-2.5-pro ## Supported Providers -| Provider | Key | Description | -| ---------------- | ---------------- | ----------------------------- | -| Anthropic | `anthropic` | Claude models (default) | -| Google | `google` | Gemini models | -| OpenAI | `openai` | GPT and reasoning models | -| DeepSeek | `deepseek` | DeepSeek models | -| Ollama | `ollama` | Local model hosting | -| Azure OpenAI | `azure-openai` | Azure-hosted OpenAI models | -| Amazon Bedrock | `amazon-bedrock` | AWS Bedrock-hosted models | -| Google Vertex AI | `google-vertex` | Google Cloud Vertex AI models | +| Provider | Key | Description | +| ---------------- | ---------------- | ------------------------------------ | +| Anthropic | `anthropic` | Claude models | +| Google | `google` | Gemini models | +| OpenAI | `openai` | GPT and reasoning models | +| Fireworks | `fireworks` | Open-weight models (Kimi, DeepSeek) | +| DeepSeek | `deepseek` | DeepSeek models | +| Ollama | `ollama` | Local model hosting | +| Azure OpenAI | `azure-openai` | Azure-hosted OpenAI models | +| Amazon Bedrock | `amazon-bedrock` | AWS Bedrock-hosted models | +| Google Vertex AI | `google-vertex` | Google Cloud Vertex AI models | ## Anthropic @@ -201,6 +202,40 @@ export 
DEEPSEEK_API_KEY=sk-... npx perstack run my-expert "query" --provider deepseek --model deepseek-chat ``` +## Fireworks + +**Environment variables:** +| Variable | Required | Description | +| -------------------- | -------- | --------------- | +| `FIREWORKS_API_KEY` | Yes | API key | +| `FIREWORKS_BASE_URL` | No | Custom endpoint | + +**perstack.toml settings:** +```toml +[provider] +providerName = "fireworks" +[provider.setting] +baseUrl = "https://custom-endpoint.example.com" # Optional +headers = { "X-Custom-Header" = "value" } # Optional +``` + +| Setting | Type | Description | +| --------- | ------ | ------------------- | +| `baseUrl` | string | Custom API endpoint | +| `headers` | object | Custom HTTP headers | + +**Models:** +| Model | Context | Max Output | +| ---------------------------------------- | ------- | ---------- | +| `accounts/fireworks/models/kimi-k2p5` | 262K | 262K | +| `accounts/fireworks/models/deepseek-v3p2`| 164K | 164K | +| `accounts/fireworks/models/glm-5` | 203K | 203K | + +```bash +export FIREWORKS_API_KEY=fw_... +npx perstack run my-expert "query" --provider fireworks --model accounts/fireworks/models/kimi-k2p5 +``` + ## Ollama **Environment variables:**