Releases: volcano-sh/agentcube

AgentCube v0.1.0

17 Apr 09:34
ff8dcb7

Summary

AgentCube v0.1.0 is the first official release of AgentCube, a Volcano subproject that extends Kubernetes with native support for AI agent and code interpreter workloads. This release establishes the foundational architecture: a lightweight HTTP reverse proxy (Router) routes agent invocations to per-session microVM sandboxes, while a Workload Manager controls sandbox lifecycle, warm pools, and garbage collection. A minimal runtime daemon (PicoD) replaces SSH inside sandboxes, providing code execution, file operations, and JWT-based authentication without SSH's protocol overhead. Session state is stored in Redis/Valkey, which lets the Router scale horizontally. Two new Kubernetes CRDs, AgentRuntime and CodeInterpreter, model agent workloads as first-class Kubernetes resources. A Python SDK, LangChain integration, and Dify plugin make AgentCube usable from popular AI frameworks.

What's New

Key Features Overview

  • Session-Based MicroVM Agent Routing: Stateful request routing with session affinity, backed by isolated microVM sandboxes per session
  • AgentRuntime and CodeInterpreter CRDs: Kubernetes-native abstractions for conversational agent and secure code interpreter workloads
  • Warm Pool for Fast Cold Starts: Pre-warmed sandbox pool support for CodeInterpreter, reducing invocation latency via SandboxClaim adoption
  • PicoD Runtime Daemon: Lightweight HTTP daemon replacing SSH inside sandboxes — code execution, file I/O, JWT authentication
  • JWT Security Chain (Router → PicoD): RSA-2048 key pair generated at startup; public key distributed via Kubernetes Secret and injected into sandbox pods
  • Dual GC Policy (Idle TTL + Max Duration): Background garbage collector enforces both idle timeout and hard maximum session duration
  • Python SDK and AI Framework Integrations: Out-of-the-box SDK with LangChain and Dify plugin support

Session-Based MicroVM Agent Routing

AI agent workloads are fundamentally stateful and interactive. A single agent session may span many invocations — tool calls, environment inspections, multi-step reasoning — all requiring the same isolated execution environment. Kubernetes has no native concept of persistent, identity-bound agent sessions. AgentCube fills this gap by mapping session IDs to dedicated microVM sandbox pods.

The Router acts as the data plane entry point. It reads the x-agentcube-session-id request header to look up an existing session in the store, or allocates a new sandbox via the Workload Manager when no session exists. Every response carries the x-agentcube-session-id header, enabling stateless clients to maintain session continuity across requests.
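
The lookup-or-allocate flow described above can be sketched in a few lines. This is an illustration only, not the Router's code: InMemorySessionStore, allocate_sandbox, and the endpoint format are hypothetical stand-ins for the Redis/Valkey store and the Workload Manager call, and treating an unknown session ID as a new session is an assumption.

```python
import uuid

SESSION_HEADER = "x-agentcube-session-id"


class InMemorySessionStore:
    """Hypothetical stand-in for the Redis/Valkey session store."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        return self._sessions.get(session_id)

    def put(self, session_id, endpoint):
        self._sessions[session_id] = endpoint


def allocate_sandbox(session_id):
    """Placeholder for the Workload Manager call; returns a made-up endpoint."""
    return f"http://sandbox-{session_id[:8]}.example.internal:8080"


def route(headers, store):
    """Return (sandbox_endpoint, response_headers) for one request."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {key.lower(): value for key, value in headers.items()}
    session_id = normalised.get(SESSION_HEADER)
    endpoint = store.get(session_id) if session_id else None
    if endpoint is None:
        # Assumption: a missing or unknown ID starts a fresh session.
        session_id = uuid.uuid4().hex
        endpoint = allocate_sandbox(session_id)
        store.put(session_id, endpoint)
    # The session ID is echoed back so a stateless client can reuse it.
    return endpoint, {SESSION_HEADER: session_id}


store = InMemorySessionStore()
first_endpoint, first_headers = route({}, store)
second_endpoint, second_headers = route(first_headers, store)
```

Replaying the returned header on the second call resolves to the same sandbox endpoint, which is the session affinity the Router provides.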

Key Capabilities:

  • Session affinity via header: x-agentcube-session-id header maps requests to existing sandbox pods
  • Transparent sandbox allocation: new sessions trigger automatic sandbox creation with no client-side configuration
  • Reverse proxy with path-prefix matching: path-based routing to multiple exposed sandbox ports
  • HTTP/2 (h2c) support: low-latency connections to sandbox endpoints
  • Configurable concurrency limit: MaxConcurrentRequests prevents overload

Router Endpoints:

POST /v1/namespaces/{namespace}/agent-runtimes/{name}/invocations/*path
GET  /v1/namespaces/{namespace}/agent-runtimes/{name}/invocations/*path
POST /v1/namespaces/{namespace}/code-interpreters/{name}/invocations/*path
GET  /v1/namespaces/{namespace}/code-interpreters/{name}/invocations/*path
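
As a small client-side sketch, the route patterns above can be assembled with a helper like the following. The function name, the resource name py-sandbox, and the trailing path run/step are hypothetical; only the route shape comes from the list above.

```python
from urllib.parse import quote

# The two resource kinds that appear in the Router routes.
KINDS = {"agent-runtimes", "code-interpreters"}


def invocation_path(kind, namespace, name, subpath=""):
    """Build the Router path for an invocation on a named resource."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    base = (
        f"/v1/namespaces/{quote(namespace, safe='')}"
        f"/{kind}/{quote(name, safe='')}/invocations"
    )
    # Keep "/" unescaped in the trailing path so nested routes survive.
    tail = quote(subpath.lstrip("/"), safe="/")
    return f"{base}/{tail}"


example_path = invocation_path("code-interpreters", "default", "py-sandbox", "run/step")
print(example_path)
```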

Agent as First-Class Citizen in Kubernetes

Two distinct workload profiles emerge in the AI agent space: conversational/tool-using agents that need access to credentials, volumes, and custom networking; and short-lived code interpreters that require strict isolation and resource caps. Modeling both as first-class Kubernetes CRDs enables declarative configuration, RBAC integration, and GitOps-friendly workflows.

AgentRuntime (agentruntimes.runtime.agentcube.volcano.sh):

Designed for general-purpose AI agents. Accepts a full Kubernetes PodSpec template, allowing volume mounts, credential injection, sidecar containers, and custom resource requests.

  • spec.podTemplate — full PodSpec for sandbox pod
  • spec.targetPort — list of exposed ports with path prefix, port, and protocol
  • spec.sessionTimeout — idle session expiry (default: 15m)
  • spec.maxSessionDuration — hard maximum session lifetime (default: 8h)
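
To make the field list concrete, here is a rough sketch of an AgentRuntime object built as a Python dictionary. Several details are assumptions and should be checked against the published CRD schema: the API version v1alpha1, the nesting under podTemplate, the item keys pathPrefix, port, and protocol, and the example image name.

```python
import json


def agent_runtime_manifest(name, namespace, image):
    """Build an illustrative AgentRuntime object (schema details assumed)."""
    return {
        # Group comes from the CRD name; the version is an assumption.
        "apiVersion": "runtime.agentcube.volcano.sh/v1alpha1",
        "kind": "AgentRuntime",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumed nesting: a pod template wrapping a PodSpec.
            "podTemplate": {
                "spec": {"containers": [{"name": "agent", "image": image}]}
            },
            # Assumed key names for path prefix, port, and protocol.
            "targetPort": [{"pathPrefix": "/", "port": 8080, "protocol": "HTTP"}],
            # Defaults quoted in the release notes, written out explicitly.
            "sessionTimeout": "15m",
            "maxSessionDuration": "8h",
        },
    }


manifest = agent_runtime_manifest(
    "demo-agent", "default", "registry.example.com/demo-agent:latest"
)
print(json.dumps(manifest, indent=2))
```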

CodeInterpreter (codeinterpreters.runtime.agentcube.volcano.sh):

Designed for secure, multi-tenant code execution. More locked-down than AgentRuntime, with a constrained sandbox template that restricts image, resources, and runtime class.

  • spec.template: CodeInterpreterSandboxTemplate (image, imagePullPolicy, resources, runtimeClassName)
  • spec.ports: list of exposed ports with path prefix
  • spec.sessionTimeout / spec.maxSessionDuration: session lifecycle bounds
  • spec.warmPoolSize: optional pre-warmed sandbox pool size
  • spec.authMode: picod (default, RSA/JWT) or none (delegate auth to sandbox)
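
The constraints listed for CodeInterpreter lend themselves to a quick client-side sanity check before a spec is submitted. The sketch below is hypothetical and is not the validation AgentCube itself performs; it only encodes what the list above states: four template fields, two auth modes with picod as the default, and an optional pool size.

```python
ALLOWED_TEMPLATE_FIELDS = {"image", "imagePullPolicy", "resources", "runtimeClassName"}
ALLOWED_AUTH_MODES = {"picod", "none"}


def check_code_interpreter_spec(spec):
    """Return a list of problems found in a CodeInterpreter spec dict."""
    problems = []
    extra = set(spec.get("template", {})) - ALLOWED_TEMPLATE_FIELDS
    if extra:
        problems.append(f"unsupported template fields: {sorted(extra)}")
    # authMode falls back to picod when it is not set.
    auth_mode = spec.get("authMode", "picod")
    if auth_mode not in ALLOWED_AUTH_MODES:
        problems.append(f"unknown authMode: {auth_mode!r}")
    # warmPoolSize is optional; treat a missing value as no warm pool.
    warm_pool = spec.get("warmPoolSize", 0)
    if not isinstance(warm_pool, int) or warm_pool < 0:
        problems.append("warmPoolSize must be a non-negative integer")
    return problems


good_spec = {
    "template": {"image": "registry.example.com/python:3.12", "runtimeClassName": "kata"},
    "warmPoolSize": 2,
}
bad_spec = {
    "template": {"image": "registry.example.com/python:3.12", "volumes": []},
    "authMode": "ssh",
}
print(check_code_interpreter_spec(good_spec))
print(check_code_interpreter_spec(bad_spec))
```

The image and runtime class names in the example specs are placeholders.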

Alpha Feature Notice: APIs are under active development. Spec fields and default values may change in future releases.


Warm Pool for Fast Cold Starts

Creating a microVM sandbox from scratch on every session request incurs a cold-start penalty that is unacceptable for interactive workloads. AgentCube introduces a warm pool mechanism: the Workload Manager pre-creates a configurable number of idle Sandbox pods and keeps them ready. When an invocation arrives, the Router claims a pre-warmed pod via a SandboxClaim CR instead of waiting for a new pod to start. The pool is automatically replenished after each claim.

Key Capabilities:

  • spec.warmPoolSize on CodeInterpreter controls pool depth
  • SandboxTemplate + SandboxClaim pattern delegates pod adoption to the upstream agent-sandbox controller
  • Pool refills asynchronously after each claim, keeping steady-state latency low
  • Cold-start path remains available when pool is exhausted
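
The claim, refill, and fallback behaviour can be modelled with a small in-memory pool. This is a toy sketch: the WarmPool class and sandbox naming are hypothetical, and refill is an explicit call here, whereas the release notes describe it as asynchronous.

```python
import itertools
from collections import deque


class WarmPool:
    """Toy model of a warm pool with a cold-start fallback."""

    def __init__(self, size):
        self.size = size
        self._ids = itertools.count(1)
        self._idle = deque()
        self.refill()

    def _create_sandbox(self):
        # Stands in for starting a real sandbox pod.
        return f"sandbox-{next(self._ids)}"

    def refill(self):
        """Top the pool back up to its configured depth."""
        while len(self._idle) < self.size:
            self._idle.append(self._create_sandbox())

    def claim(self):
        """Return (sandbox, was_warm); fall back to a cold start if empty."""
        if self._idle:
            return self._idle.popleft(), True
        return self._create_sandbox(), False


pool = WarmPool(size=2)
claims = [pool.claim() for _ in range(3)]  # third claim exhausts the pool
pool.refill()
after_refill = pool.claim()
print(claims, after_refill)
```

The first two claims are served from the pool, the third takes the cold-start path, and a claim after refill is warm again.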

PicoD — Lightweight Sandbox Runtime Daemon

Traditional code sandbox implementations use SSH to execute commands remotely. SSH carries significant overhead: key management, multiplexing negotiation, and a heavyweight protocol for what are essentially single-request RPCs. PicoD replaces SSH with a minimal HTTP/1.1 daemon that runs inside each sandbox pod, providing code execution, file I/O, and authentication via a small, auditable binary.

Key Capabilities:

  • Code execution (POST /api/execute): runs arbitrary commands with configurable timeout, working directory, and environment variables; returns stdout, stderr, exit code, and wall-clock duration
  • File upload / write (POST /api/files): supports multipart form-data and JSON/base64 content for workspace-scoped file creation and updates
  • File download / read (GET /api/files/*path): streams files from the sandbox workspace using path-addressed operations
  • Health check (GET /health): exposes an unauthenticated liveness endpoint
  • JWT authentication: validates RS256 tokens from the Router; rejects unauthenticated requests
  • Path sanitization: all paths are jailed to the configured workspace root, preventing directory traversal
  • 32 MB request body limit with configurable workspace root via --workspace flag
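
Path jailing of the kind listed above is commonly implemented by normalising the requested path against the workspace root and rejecting anything that lands outside it. The sketch below shows that general technique, not PicoD's actual code; the function name is hypothetical, and the check is purely lexical, so it does not account for symlinks.

```python
import posixpath


def jail_path(workspace_root, requested):
    """Resolve a requested path inside workspace_root or raise ValueError."""
    root = posixpath.normpath(workspace_root)
    # Treat the request as relative to the root even if it starts with "/".
    candidate = posixpath.normpath(posixpath.join(root, requested.lstrip("/")))
    # After normalisation, ".." segments can no longer hide a traversal.
    inside = candidate == root or candidate.startswith(root.rstrip("/") + "/")
    if not inside:
        raise ValueError(f"path escapes workspace: {requested!r}")
    return candidate


print(jail_path("/workspace", "data/out.txt"))
print(jail_path("/workspace", "/etc/passwd"))
try:
    jail_path("/workspace", "../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

An absolute request such as /etc/passwd is reinterpreted relative to the workspace, while a traversal with ".." is rejected outright.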

JWT Security Chain (Router → PicoD)

Sandbox pods are ephemeral and may be replaced at any time; embedding a shared secret in cluster config is fragile and hard to rotate. AgentCube instead establishes an RSA-based trust chain. The Router generates an RSA-2048 key pair at startup and stores the public key in a Kubernetes Secret (picod-router-identity). The Workload Manager injects that key as PICOD_AUTH_PUBLIC_KEY into CodeInterpreter sandboxes when authentication is enabled (spec.authMode defaults to picod; none disables injection). The Router signs a short-lived (5-minute) RS256 JWT for every proxied request, and PicoD verifies these tokens entirely in-process, with no network round-trip and no shared database.

Key Capabilities:

  • RSA-2048 key pair auto-generated at Router startup
  • Public key distributed via picod-router-identity Kubernetes Secret
  • Workload Manager injects public key into CodeInterpreter sandbox env when spec.authMode is not none
  • 5-minute token expiry limits blast radius of token leakage
  • PicoD rejects any request without a valid Router-issued JWT
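
The token lifetime logic can be illustrated with the standard library alone. The sketch below builds the claims and the JWT signing input and checks expiry; it deliberately omits the RS256 signature itself, which needs an RSA library. The iat and exp claims are standard JWT claims, while carrying the session ID in sub and all function names are assumptions.

```python
import base64
import json
import time

TOKEN_TTL_SECONDS = 5 * 60  # the 5-minute expiry quoted above


def b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_claims(session_id, now=None):
    """Build claims for one proxied request (claim layout is assumed)."""
    issued = int(time.time() if now is None else now)
    return {"sub": session_id, "iat": issued, "exp": issued + TOKEN_TTL_SECONDS}


def signing_input(claims):
    """Return 'header.payload', the string an RS256 signer would sign."""
    header = {"alg": "RS256", "typ": "JWT"}
    encoded = [
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    return ".".join(encoded)


def is_expired(claims, now=None):
    """A token must be rejected at or after its exp time."""
    current = time.time() if now is None else now
    return current >= claims["exp"]


claims = new_claims("session-123", now=1_000)
print(signing_input(claims))
print(is_expired(claims, now=1_299), is_expired(claims, now=1_300))
```

A verifier would additionally check the signature over this input against the injected public key before trusting any claim.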

Sandbox Lifecycle Management and GC

Agent sessions that complete their work or are abandoned by clients must be automatically reclaimed to avoid resource exhaustion. AgentCube implements a dual garbage collection policy enforced by a background loop in the Workload Manager:

  • Idle timeout: sandboxes inactive beyond spec.sessionTimeout (default 15m) are deleted
  • Hard max TTL: sandboxes older than spec.maxSessionDuration (default 8h) are deleted regardless of activity

Key Capabilities:

  • Configurable GC interval in the Workload Manager
  • UpdateSessionLastActivity store operation to reset idle timer on each invocation
  • ListExpiredSandboxes and ListInactiveSandboxes store queries feed the GC loop
  • Workload Manager deletes Sandbox / SandboxClaim CRs and removes store records atomically
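
The dual policy reduces to two time comparisons per sandbox. The following sketch shows one way a sweep could decide what to reclaim; SandboxRecord, gc_reason, and sweep are hypothetical names, timestamps are plain seconds, and the thresholds mirror the defaults quoted above.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60       # default sessionTimeout
MAX_DURATION_S = 8 * 60 * 60   # default maxSessionDuration


@dataclass
class SandboxRecord:
    session_id: str
    created_at: float
    last_activity: float


def gc_reason(record, now, idle_timeout=IDLE_TIMEOUT_S, max_duration=MAX_DURATION_S):
    """Return why a sandbox should be deleted, or None to keep it."""
    # The hard cap applies regardless of recent activity.
    if now - record.created_at > max_duration:
        return "max-duration"
    if now - record.last_activity > idle_timeout:
        return "idle"
    return None


def sweep(records, now):
    """Map session IDs to deletion reasons for one GC pass."""
    doomed = {}
    for record in records:
        reason = gc_reason(record, now)
        if reason is not None:
            doomed[record.session_id] = reason
    return doomed


now = 10 * 3600.0
records = [
    SandboxRecord("active-1", created_at=now - 3600, last_activity=now - 60),
    SandboxRecord("idle-1", created_at=now - 3600, last_activity=now - 16 * 60),
    SandboxRecord("old-1", created_at=now - 9 * 3600, last_activity=now - 1),
]
print(sweep(records, now))
```

Resetting last_activity on each invocation is what keeps an active session out of the idle bucket, while the max-duration check still reclaims it eventually.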

Other Notable Changes

Features and Enhancements

v0.1.0-rc.0

17 Apr 08:54
ff8dcb7

v0.1.0-rc.0 Pre-release

version v0.1.0-rc.0

v0.1.0-alpha.0

15 Jan 04:08
4284f88

v0.1.0-alpha.0 Pre-release
Merge pull request #139 from hzxuzhonghu/cleanup

Workload manager Cleanup