v0.1.0 release doc #266
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,248 @@ | ||||||
| # AgentCube v0.1.0 Released: Serverless Orchestration Layer for AI Agents | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| AgentCube v0.1.0 is the **first official release** of AgentCube, a Volcano subproject that extends Kubernetes with native support for AI agent and code interpreter workloads. This release establishes the foundational architecture: a lightweight HTTP reverse proxy (Router) routes agent invocations to per-session microVM sandboxes, while a Workload Manager controls sandbox lifecycle, warm pools, and garbage collection. A minimal runtime daemon (PicoD) replaces SSH inside sandboxes, providing secure code execution, file operations, and JWT-based authentication with zero protocol overhead. Session state is stored in Redis/Valkey, enabling horizontal Router scaling. Two new Kubernetes CRDs — `AgentRuntime` and `CodeInterpreter` — are introduced to model agent workloads as first-class Kubernetes resources. A Python SDK, LangChain integration, and Dify plugin are included to make AgentCube immediately usable from popular AI frameworks. | ||||||
|
|
||||||
| ## What's New | ||||||
|
|
||||||
| ### Key Features Overview | ||||||
|
|
||||||
| - **Session-Based MicroVM Agent Routing**: Stateful request routing with session affinity, backed by isolated microVM sandboxes per session | ||||||
| - **AgentRuntime and CodeInterpreter CRDs**: Kubernetes-native abstractions for conversational agent and secure code interpreter workloads | ||||||
| - **Warm Pool for Fast Cold Starts**: Pre-warmed sandbox pool support for `CodeInterpreter`, reducing invocation latency via `SandboxClaim` adoption | ||||||
| - **PicoD Runtime Daemon**: Lightweight HTTP daemon replacing SSH inside sandboxes — code execution, file I/O, JWT authentication | ||||||
| - **JWT Security Chain (Router → PicoD)**: RSA-2048 key pair generated at startup; public key distributed via Kubernetes Secret and injected into sandbox pods | ||||||
| - **Dual GC Policy (Idle TTL + Max Duration)**: Background garbage collector enforces both idle timeout and hard maximum session duration | ||||||
| - **Python SDK and AI Framework Integrations**: Out-of-the-box SDK with LangChain and Dify plugin support | ||||||
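The dual GC policy listed above can be read as two independent bounds on a session. The following is a minimal sketch, not AgentCube's implementation: the function name `should_collect` is hypothetical, the defaults are the `15m` and `8h` values documented for `AgentRuntime`, and whether the real collector uses an inclusive or exclusive comparison is an assumption.

```python
from datetime import datetime, timedelta

# Defaults mirror the documented AgentRuntime values (sessionTimeout, maxSessionDuration).
IDLE_TTL = timedelta(minutes=15)
MAX_DURATION = timedelta(hours=8)


def should_collect(created_at, last_active_at, now,
                   idle_ttl=IDLE_TTL, max_duration=MAX_DURATION):
    """Return True when either GC bound is reached (hypothetical helper)."""
    idle_expired = (now - last_active_at) >= idle_ttl   # idle TTL bound
    too_old = (now - created_at) >= max_duration        # hard lifetime bound
    return idle_expired or too_old


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 9, 0)
    # Session created at 09:00, last used at 09:50, checked at 10:00.
    print(should_collect(start, start + timedelta(minutes=50),
                         start + timedelta(hours=1)))
```

A session that stays busy is still collected once it reaches the maximum duration, which is the point of enforcing both bounds.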
|
|
||||||
| --- | ||||||
|
|
||||||
| ### Key Feature Details | ||||||
|
|
||||||
| ### Session-Based MicroVM Agent Routing | ||||||
|
|
||||||
| AI agent workloads are fundamentally stateful and interactive. A single agent session may span many invocations — tool calls, environment inspections, multi-step reasoning — all requiring the same isolated execution environment. Kubernetes has no native concept of persistent, identity-bound agent sessions. AgentCube fills this gap by mapping session IDs to dedicated microVM sandbox pods. | ||||||
|
|
||||||
| The Router acts as the data plane entry point. It reads the `x-agentcube-session-id` request header to look up an existing session in the store, or allocates a new sandbox via the Workload Manager when no session exists. Every response carries the `x-agentcube-session-id` header, enabling stateless clients to maintain session continuity across requests. | ||||||
|
|
||||||
| Key Capabilities: | ||||||
|
|
||||||
| - **Session affinity via header**: `x-agentcube-session-id` header maps requests to existing sandbox pods | ||||||
| - **Transparent sandbox allocation**: new sessions trigger automatic sandbox creation with no client-side configuration | ||||||
| - **Reverse proxy with path-prefix matching**: path-based routing to multiple exposed sandbox ports | ||||||
| - **HTTP/2 (h2c) support**: low-latency connections to sandbox endpoints | ||||||
| - **Configurable concurrency limit**: `MaxConcurrentRequests` prevents overload | ||||||
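The header-based session lookup described above can be sketched as follows. This is an illustrative model only: `SessionRouter` and `allocate_sandbox` are hypothetical names, an in-memory dict stands in for the Redis/Valkey store, and rejecting an unknown session ID (rather than allocating a fresh sandbox for it) is an assumption the release notes do not confirm.

```python
import uuid

SESSION_HEADER = "x-agentcube-session-id"


class SessionRouter:
    """Toy model of session affinity; not the real Router."""

    def __init__(self, allocate_sandbox):
        # allocate_sandbox stands in for a call to the Workload Manager.
        self._allocate_sandbox = allocate_sandbox
        # In-memory stand-in for the Redis/Valkey session store.
        self._sessions = {}

    def resolve(self, request_headers):
        """Return (sandbox_address, response_headers) for one request."""
        session_id = request_headers.get(SESSION_HEADER)
        if session_id is None:
            # No session yet: allocate a sandbox and mint a new session ID.
            session_id = str(uuid.uuid4())
            self._sessions[session_id] = self._allocate_sandbox()
        elif session_id not in self._sessions:
            # Assumption: an unknown ID is an error, not a new allocation.
            raise LookupError(f"unknown session: {session_id}")
        sandbox = self._sessions[session_id]
        # Every response echoes the session ID so the client can reuse it.
        return sandbox, {SESSION_HEADER: session_id}
```

A client that sends no header on its first call and then replays the returned header on later calls is routed to the same sandbox each time.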
|
|
||||||
| **Router Endpoints:** | ||||||
|
|
||||||
| ``` | ||||||
| POST /v1/namespaces/{namespace}/agent-runtimes/{name}/invocations/*path | ||||||
| GET /v1/namespaces/{namespace}/agent-runtimes/{name}/invocations/*path | ||||||
| POST /v1/namespaces/{namespace}/code-interpreters/{name}/invocations/*path | ||||||
| GET /v1/namespaces/{namespace}/code-interpreters/{name}/invocations/*path | ||||||
| ``` | ||||||
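To show how the endpoint patterns above compose, here is a small URL builder. It is a sketch under assumptions: the helper name `invocation_url`, the base URL, and the `api/execute` sub-path in the usage line are all hypothetical; only the `/v1/namespaces/.../invocations/` layout comes from the list above.

```python
from urllib.parse import quote

KINDS = ("agent-runtimes", "code-interpreters")


def invocation_url(base_url, kind, namespace, name, path=""):
    """Build a Router invocation URL from the documented path pattern."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    # Percent-encode the variable segments; the trailing path is passed through.
    ns = quote(namespace, safe="")
    res = quote(name, safe="")
    tail = path.lstrip("/")
    return f"{base_url.rstrip('/')}/v1/namespaces/{ns}/{kind}/{res}/invocations/{tail}"


if __name__ == "__main__":
    # Hypothetical Router address and sub-path, for illustration only.
    print(invocation_url("http://router.example:8080", "code-interpreters",
                         "default", "py", "/api/execute"))
```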
|
|
||||||
| --- | ||||||
|
|
||||||
| ### Agent as First-Class Citizen in Kubernetes | ||||||
|
|
||||||
| Two distinct workload profiles emerge in the AI agent space: conversational/tool-using agents that need access to credentials, volumes, and custom networking; and short-lived code interpreters that require strict isolation and resource caps. Modeling both as first-class Kubernetes CRDs enables declarative configuration, RBAC integration, and GitOps-friendly workflows. | ||||||
|
|
||||||
| **AgentRuntime** (`agentruntimes.runtime.agentcube.volcano.sh`): | ||||||
|
|
||||||
| Designed for general-purpose AI agents. Accepts a full Kubernetes `PodSpec` template, allowing volume mounts, credential injection, sidecar containers, and custom resource requests. | ||||||
|
|
||||||
| - `spec.podTemplate` — full `PodSpec` for sandbox pod | ||||||
| - `spec.targetPort` — list of exposed ports with path prefix, port, and protocol | ||||||
| - `spec.sessionTimeout` — idle session expiry (default: `15m`) | ||||||
| - `spec.maxSessionDuration` — hard maximum session lifetime (default: `8h`) | ||||||
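The `AgentRuntime` fields above might be assembled into a manifest along these lines. Treat this as a hedged sketch: the API version `v1alpha1`, the nesting of the pod spec under `podTemplate.spec`, and the key names inside each `targetPort` entry (`pathPrefix`, `port`, `protocol`) are assumptions, since the release notes name the fields but do not show a full example. The builder name and image are hypothetical.

```python
import json


def agent_runtime_manifest(name, namespace, image, port=8080):
    """Build an AgentRuntime manifest as a plain dict (illustrative only)."""
    return {
        # Group comes from the CRD name; the version is an assumption.
        "apiVersion": "runtime.agentcube.volcano.sh/v1alpha1",
        "kind": "AgentRuntime",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed shape: a pod template wrapping a standard PodSpec.
            "podTemplate": {
                "spec": {"containers": [{"name": "agent", "image": image}]}
            },
            # Assumed key names for the path prefix, port, and protocol.
            "targetPort": [
                {"pathPrefix": "/", "port": port, "protocol": "HTTP"}
            ],
            # Documented defaults, written out explicitly.
            "sessionTimeout": "15m",
            "maxSessionDuration": "8h",
        },
    }


if __name__ == "__main__":
    manifest = agent_runtime_manifest("demo-agent", "default",
                                      "example.com/agent:1.0")
    print(json.dumps(manifest, indent=2))
```

Because the APIs are alpha, the CRD schema shipped with the release should be checked before relying on any of these key names.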
|
|
||||||
| **CodeInterpreter** (`codeinterpreters.runtime.agentcube.volcano.sh`): | ||||||
|
|
||||||
| Designed for secure, multi-tenant code execution. More locked-down than AgentRuntime, with a constrained sandbox template that restricts image, resources, and runtime class. | ||||||
|
|
||||||
| - `spec.template` — `CodeInterpreterSandboxTemplate` (image, imagePullPolicy, resources, runtimeClassName) | ||||||
| - `spec.ports` — list of exposed ports with path prefix | ||||||
| - `spec.sessionTimeout` / `spec.maxSessionDuration` — session lifecycle bounds | ||||||
| - `spec.warmPoolSize` — optional pre-warmed sandbox pool size | ||||||
| - `spec.authMode` — `picod` (default, RSA/JWT) or `none` (delegate auth to sandbox) | ||||||
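A `CodeInterpreter` spec built from the fields above could look like the sketch below. The API version, the `pathPrefix`/`port` key names, and the builder function itself are assumptions for illustration; the two `authMode` values and the template field names come from the list above, and the image tag follows the pinned `v0.1.0` tag recommended in review.

```python
VALID_AUTH_MODES = ("picod", "none")


def code_interpreter_manifest(name, namespace, image,
                              warm_pool_size=None, auth_mode="picod"):
    """Build a CodeInterpreter manifest as a plain dict (illustrative only)."""
    if auth_mode not in VALID_AUTH_MODES:
        raise ValueError(f"authMode must be one of {VALID_AUTH_MODES}")
    spec = {
        # Constrained template: only image-level settings, no full PodSpec.
        "template": {"image": image, "imagePullPolicy": "IfNotPresent"},
        # Assumed key names for each exposed port entry.
        "ports": [{"pathPrefix": "/", "port": 8080}],
        "authMode": auth_mode,
    }
    if warm_pool_size is not None:
        # warmPoolSize is optional, so it is omitted unless requested.
        spec["warmPoolSize"] = warm_pool_size
    return {
        # Group comes from the CRD name; the version is an assumption.
        "apiVersion": "runtime.agentcube.volcano.sh/v1alpha1",
        "kind": "CodeInterpreter",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    print(code_interpreter_manifest("py", "default",
                                    "ghcr.io/volcano-sh/picod:v0.1.0",
                                    warm_pool_size=2))
```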
|
|
||||||
| Alpha Feature Notice: APIs are under active development. Spec fields and default values may change in future releases. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ### Warm Pool for Fast Cold Starts | ||||||
|
|
||||||
| Creating a microVM sandbox from scratch on every session request incurs a cold-start penalty that is unacceptable for interactive workloads. AgentCube introduces a warm pool mechanism: the Workload Manager pre-creates a configurable number of idle `Sandbox` pods and keeps them ready. When an invocation arrives, the Router claims a pre-warmed pod via a `SandboxClaim` CR instead of waiting for a new pod to start. The pool is automatically replenished after each claim. | ||||||
| Creating a microVM sandbox from scratch on every session request incurs a cold-start penalty that is unacceptable for interactive workloads. AgentCube introduces a warm pool mechanism: the Workload Manager pre-creates a configurable number of idle `Sandbox` pods and keeps them ready. When an invocation arrives, the Router requests sandbox creation from the Workload Manager; when warm pools are enabled, the Workload Manager can satisfy that request by creating a `SandboxClaim` CR for a pre-warmed pod instead of waiting for a new pod to start. The pool is automatically replenished after each claim. |
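The claim-and-replenish behavior described in this paragraph can be modeled in a few lines. This is a conceptual sketch only: `WarmPool` and `create_sandbox` are hypothetical names, sandboxes are plain strings, and the real system works through `Sandbox` and `SandboxClaim` resources reconciled by controllers rather than an in-process queue. Falling back to a cold start when the pool is empty is also an assumption.

```python
import itertools
from collections import deque


class WarmPool:
    """Toy warm pool: hand out pre-created sandboxes, then refill."""

    def __init__(self, size, create_sandbox):
        self._size = size
        self._create_sandbox = create_sandbox
        # Pre-create the idle sandboxes up front.
        self._idle = deque(create_sandbox() for _ in range(size))

    @property
    def idle_count(self):
        return len(self._idle)

    def claim(self):
        """Return a ready sandbox, creating one on demand if the pool is empty."""
        if self._idle:
            sandbox = self._idle.popleft()      # warm path: no startup wait
        else:
            sandbox = self._create_sandbox()    # cold path (assumed fallback)
        self._replenish()
        return sandbox

    def _replenish(self):
        # Top the pool back up to its configured size after each claim.
        while len(self._idle) < self._size:
            self._idle.append(self._create_sandbox())


if __name__ == "__main__":
    counter = itertools.count()
    pool = WarmPool(2, lambda: f"sandbox-{next(counter)}")
    print(pool.claim(), pool.idle_count)
```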
Copilot AI, Apr 16, 2026
The install command sets redis.password to the literal value '''' (four quote characters). If the intent is “no password” or an empty password, this will configure the chart with the wrong password and break connectivity. Prefer --set redis.password="" (empty) or omit the flag entirely, and optionally add a note that users should set it only when Redis requires AUTH.
Copilot AI, Apr 16, 2026
This v0.1.0 release doc uses ghcr.io/volcano-sh/picod:latest in the CodeInterpreter example. Using :latest makes the instructions non-reproducible and can diverge from the v0.1.0 behavior over time. Consider pinning the example image tag to v0.1.0 (or the exact release tag you publish).
| image: ghcr.io/volcano-sh/picod:latest | |
| image: ghcr.io/volcano-sh/picod:v0.1.0 |
The heading hierarchy is inconsistent: "### Key Feature Details" is immediately followed by another level-3 heading ("### Session-Based MicroVM Agent Routing"), which makes it look like a sibling section rather than a subsection. Consider making the feature sections level-4 under "Key Feature Details" (or removing the extra "Key Feature Details" heading) so the structure is unambiguous in rendered Markdown/TOC.