
WIP:feat: add E2B API compatibility layer and Templates API support#275

Open
MahaoAlex wants to merge 3 commits into volcano-sh:main from MahaoAlex:feature/e2b-api

@MahaoAlex (Contributor) commented Apr 14, 2026

/kind feature

What this PR does / why we need it:

This PR introduces full E2B API compatibility to AgentCube, enabling seamless migration from E2B services. It consists of two major parts:

  1. E2B API Compatibility Layer - Adds E2B-compatible REST API endpoints to the Router for
    sandbox lifecycle management, including creating, listing, getting details, deleting,
    setting timeout, and refreshing sandboxes. Implements Kubernetes Informer-based API key
    authentication with real-time cache updates, rate limiting, and local cache fallback.
  2. E2B Templates API Support - Adds full Templates API compatibility with CRUD operations
    for templates, template builds lifecycle management, aliases support, and public/private
    visibility controls. Maps templates to AgentCube CRDs (CodeInterpreter/AgentRuntime).

Which issue(s) this PR fixes:
Fixes #257

Special notes for your reviewer:

  • The E2B API implementation is located under pkg/router/e2b/.
  • API key authentication uses a K8s Secret informer with a 5-minute background refresh
    fallback.
  • Template API maps E2B templates to existing CodeInterpreter/AgentRuntime CRDs.
  • Please review the test coverage in pkg/router/e2b/*_test.go and
    test/e2e/templates_test.go.

Does this PR introduce a user-facing change?:
Add E2B API compatibility layer and Templates API support to AgentCube Router, enabling E2B
SDK users to migrate seamlessly.

This commit adds architecture design proposals for E2B API compatibility:

- docs/design/e2b-api-architecture.md - E2B API Phase 1 architecture design
- docs/architecture/e2b-integration.md - E2B integration architecture diagrams

These documents describe the architecture and design decisions for
implementing E2B-compatible REST API and Templates management in AgentCube.

Signed-off-by: MahaoAlex <alexmahao319@gmail.com>
This commit adds E2B-compatible REST API support to AgentCube Router,
enabling seamless migration from E2B services.

Features:
- POST /sandboxes - Create sandbox from template
- GET /sandboxes - List all running sandboxes
- GET /sandboxes/{id} - Get sandbox details
- DELETE /sandboxes/{id} - Delete sandbox
- POST /sandboxes/{id}/timeout - Set sandbox timeout
- POST /sandboxes/{id}/refreshes - Refresh sandbox keepalive

Implementation:
- pkg/router/e2b/ - E2B API handlers and models
- 80.9% test coverage with comprehensive unit tests
- 39 black-box tests for API, lifecycle, and concurrency

Documentation:
- docs/api/e2b-openapi.yaml - OpenAPI 3.0 spec
- docs/api-comparison.md - API comparison with E2B
- docs/developer-guide/e2b-implementation.md - Developer guide
- docs/tutorials/e2b-api-guide.md - User guide

CI/CD:
- .github/workflows/e2b-api.yml - E2B API integration tests
- .github/workflows/code-quality.yml - Code quality checks
- hack/setup-kind-cluster.sh - Kind cluster setup
- hack/setup-local-env.sh - Local dev environment setup

Kubernetes Informer-based API key authentication:
- Real-time cache updates via K8s Secret informer events
- Rate limiting (1/sec) to prevent brute-force amplification
- Background refresh (5 min interval) as fallback
- Local cache serving when K8s API is unavailable

Signed-off-by: MahaoAlex <alexmahao319@gmail.com>
Copilot AI review requested due to automatic review settings April 14, 2026 07:54
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an E2B-compatible API layer to AgentCube, enabling seamless migration from E2B services by supporting sandbox lifecycle management and template operations. It includes new API handlers, models, and authentication mechanisms, along with comprehensive E2E tests. My feedback highlights opportunities to simplify error handling logic in the router, decouple informer initialization for better testability, and refactor the wildcard routing logic in the templates handler to improve maintainability.

Comment thread pkg/router/handlers.go
Comment on lines +157 to +162
// Return 500 for invalid endpoint URL errors, 404 for not found errors
status := http.StatusNotFound
if strings.Contains(err.Error(), "invalid endpoint URL") {
status = http.StatusInternalServerError
}
c.JSON(status, gin.H{

medium

The error handling logic here is overly complex for a simple status code mapping. It is better to map the error type directly to the status code using a helper function or a map to improve readability and maintainability.

Comment thread pkg/router/e2b/auth.go
Comment on lines +357 to +393
func (a *Authenticator) InitializeInformer() error {
a.mu.Lock()
defer a.mu.Unlock()

if a.k8sClient == nil {
return fmt.Errorf("kubernetes client is nil, cannot initialize informer")
}

if a.informer != nil {
klog.V(2).Info("Informer already initialized, skipping")
return nil
}

// Create informer factory with namespace restriction
factory := informers.NewSharedInformerFactoryWithOptions(
a.k8sClient,
10*time.Minute,
informers.WithNamespace(a.config.APIKeySecretNamespace),
)

// Create secret informer filtered by secret name
secretInformer := factory.Core().V1().Secrets().Informer()

// Add event handlers for Secret changes
_, err := secretInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: a.onSecretAdd,
UpdateFunc: a.onSecretUpdate,
DeleteFunc: a.onSecretDelete,
})
if err != nil {
return fmt.Errorf("failed to add event handler to informer: %w", err)
}

a.informer = secretInformer
klog.V(2).InfoS("Informer initialized", "namespace", a.config.APIKeySecretNamespace, "secret", a.config.APIKeySecret)
return nil
}

medium

The informer initialization logic is tightly coupled with the server's configuration. Consider moving the informer setup to a factory or a dedicated initialization function to improve testability and modularity.
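One way to apply this suggestion is to hide the informer behind a small interface so the Authenticator can be built in tests without a cluster. The sketch below is hypothetical: SecretEventSource, SecretEventHandler, fakeSource and the simplified map-based secret data are assumed names. A production implementation of the interface would wrap client-go's SharedInformerFactory and AddEventHandler, as in the excerpt above.

```go
package main

import "sync"

// SecretEventHandler mirrors the three callbacks registered in the PR
// (onSecretAdd/onSecretUpdate/onSecretDelete), simplified to secret data maps.
type SecretEventHandler struct {
	OnAdd    func(data map[string]string)
	OnUpdate func(oldData, newData map[string]string)
	OnDelete func(data map[string]string)
}

// SecretEventSource is a hypothetical seam between auth logic and informer setup.
type SecretEventSource interface {
	AddHandler(h SecretEventHandler) error
}

// Authenticator depends only on the interface, not on how the informer is built.
type Authenticator struct {
	mu   sync.RWMutex
	keys map[string]string // API key -> client ID
}

func NewAuthenticator(src SecretEventSource) (*Authenticator, error) {
	a := &Authenticator{keys: map[string]string{}}
	err := src.AddHandler(SecretEventHandler{
		OnAdd:    a.replace,
		OnUpdate: func(_, n map[string]string) { a.replace(n) },
		OnDelete: func(map[string]string) { a.replace(map[string]string{}) },
	})
	if err != nil {
		return nil, err
	}
	return a, nil
}

func (a *Authenticator) replace(data map[string]string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.keys = data
}

// Valid reports whether the key is present in the current snapshot.
func (a *Authenticator) Valid(key string) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	_, ok := a.keys[key]
	return ok
}

// fakeSource lets tests fire events directly, with no Kubernetes API involved.
type fakeSource struct{ h SecretEventHandler }

func (f *fakeSource) AddHandler(h SecretEventHandler) error { f.h = h; return nil }
```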

Comment on lines +811 to +863
func (s *Server) handleTemplateWildcard(c *gin.Context) {
path := c.Param("path")
path = strings.TrimPrefix(path, "/")

// Extract template ID and sub-path
var templateID string
var subPath string

// Check for builds sub-path
if idx := strings.Index(path, "/builds"); idx != -1 {
templateID = path[:idx]
subPath = path[idx:]
} else {
templateID = path
}

// Set the id parameter for downstream handlers
c.Params = gin.Params{{Key: "id", Value: templateID}}

// Route based on sub-path and HTTP method
switch subPath {
case "/builds":
if c.Request.Method == "GET" {
s.handleListTemplateBuilds(c)
} else if c.Request.Method == "POST" {
s.handleBuildTemplate(c)
}
case "":
// Regular template operations
switch c.Request.Method {
case "GET":
s.handleGetTemplate(c)
case "PATCH":
s.handleUpdateTemplate(c)
case "DELETE":
s.handleDeleteTemplate(c)
default:
respondWithError(c, ErrInvalidRequest, "method not allowed")
}
default:
// Check for /builds/{buildId} pattern
if strings.HasPrefix(subPath, "/builds/") {
buildID := strings.TrimPrefix(subPath, "/builds/")
c.Params = gin.Params{
{Key: "id", Value: templateID},
{Key: "buildId", Value: buildID},
}
s.handleGetTemplateBuild(c)
} else {
respondWithError(c, ErrNotFound, "path not found")
}
}
}

medium

The wildcard handler is doing too much routing logic. This should be refactored into a more structured routing approach, perhaps by defining specific routes for builds and template operations separately, to avoid complex string parsing and manual path manipulation.

@MahaoAlex MahaoAlex changed the title from "Feature/e2b api" to "feat: add E2B API compatibility layer and Templates API support" Apr 14, 2026
@codecov-commenter

codecov-commenter commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 58.57605% with 512 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.20%. Comparing base (845b798) to head (6c29ee7).
⚠️ Report is 163 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/router/e2b/templates_handlers.go | 50.20% | 216 Missing and 30 partials ⚠️ |
| pkg/router/e2b/auth.go | 48.31% | 169 Missing and 15 partials ⚠️ |
| pkg/router/e2b/handlers.go | 69.29% | 25 Missing and 14 partials ⚠️ |
| pkg/router/e2b/e2b_server.go | 62.85% | 18 Missing and 8 partials ⚠️ |
| pkg/router/e2b/mapper.go | 87.17% | 5 Missing and 5 partials ⚠️ |
| pkg/router/handlers.go | 76.92% | 2 Missing and 1 partial ⚠️ |
| pkg/router/e2b/errors.go | 91.30% | 1 Missing and 1 partial ⚠️ |
| pkg/router/server.go | 71.42% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #275       +/-   ##
===========================================
+ Coverage   35.60%   48.20%   +12.59%     
===========================================
  Files          29       38        +9     
  Lines        2533     3840     +1307     
===========================================
+ Hits          902     1851      +949     
- Misses       1505     1791      +286     
- Partials      126      198       +72     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 48.20% <58.57%> (+12.59%) ⬆️ |


Contributor

Copilot AI left a comment


Pull request overview

This PR adds an E2B-compatible API surface to the AgentCube Router (templates + sandboxes) and augments CI/e2e coverage and docs so E2B SDK users can migrate to AgentCube with minimal changes.

Changes:

  • Register E2B-compatible /templates and /sandboxes routes in the Router and add E2B models/helpers (mapper, rate limiter, error mapping).
  • Add/extend E2E + black-box tests (Go + optional Python E2B SDK) and update scripts/workflows to run them.
  • Add Templates API and E2B API documentation, examples, and Helm/RBAC adjustments for deployment.

Reviewed changes

Copilot reviewed 51 out of 53 changed files in this pull request and generated 17 comments.

Show a summary per file
| File | Description |
| --- | --- |
| test/e2e/test_templates.yaml | Adds CodeInterpreter CRs used as templates for E2E coverage. |
| test/e2e/test_e2b_sdk.py | Adds Python E2B SDK-based compatibility E2E tests. |
| test/e2e/run_e2e.sh | Updates e2e runner to configure E2B API keys and run tagged e2e tests + optional SDK tests. |
| test/e2e/e2e_test.go | Adds e2e build tag and minor constant refactor for CI control. |
| test/e2e/README_E2B_SDK.md | Documents how to run the E2B SDK compatibility tests. |
| test/e2b/lifecycle_test.go | Adds black-box lifecycle tests for the E2B sandbox API. |
| test/e2b/TEST_REPORT.md | Adds test report documentation for E2B API black-box tests. |
| test/e2b/TEST_PLAN.md | Adds test plan documentation for E2B API black-box tests. |
| pkg/workloadmanager/client_cache_test.go | Minor formatting change in constants alignment. |
| pkg/workloadmanager/auth_test.go | Minor formatting/indentation cleanup in tests. |
| pkg/router/server.go | Registers E2B route group at root and wires it into the Router. |
| pkg/router/jwt_test.go | Makes RSA private-key equality assertion resilient across Go versions. |
| pkg/router/handlers_test.go | Switches Router tests to use miniredis instead of a real Redis endpoint. |
| pkg/router/handlers.go | Improves endpoint URL parsing and adjusts error status selection. |
| pkg/router/e2b/templates_models_test.go | Adds unit tests for Templates API models serialization/validation. |
| pkg/router/e2b/templates_models.go | Adds Templates API request/response models and validation helpers. |
| pkg/router/e2b/ratelimiter_test.go | Adds tests for a token-bucket rate limiter used by auth/cache logic. |
| pkg/router/e2b/ratelimiter.go | Implements an in-memory token-bucket rate limiter. |
| pkg/router/e2b/models.go | Adds E2B sandbox API request/response models and error codes. |
| pkg/router/e2b/mapper_test.go | Adds tests for mapping between AgentCube SandboxInfo and E2B models. |
| pkg/router/e2b/mapper.go | Implements mapping logic and template-id validation/expiry helpers. |
| pkg/router/e2b/handlers.go | Implements E2B sandbox endpoints: create/list/get/delete/timeout/refresh. |
| pkg/router/e2b/errors_test.go | Adds unit tests for E2B error mapping + Gin error responses. |
| pkg/router/e2b/errors.go | Implements E2B error formatting and error-to-code mapping. |
| pkg/router/e2b/e2b_server.go | Adds an E2B server that registers sandbox + template routes and auth middleware. |
| manifests/charts/base/values.yaml | Adds templates-related Helm values (currently not wired into templates). |
| manifests/charts/base/templates/rbac-router.yaml | Expands Router RBAC rules for runtime API resources. |
| hack/setup-local-env.sh | Adds a local dev helper to stand up Kind + AgentCube + port-forwards. |
| hack/setup-kind-cluster.sh | Adds a Kind cluster setup script with agent-sandbox + Redis provisioning. |
| go.mod | Adds an indirect dependency (objx) pulled by the new test stack. |
| examples/templates/python_sdk_example.py | Adds a Python example for using Templates API via an SDK. |
| examples/templates/list_templates.sh | Adds a curl example for listing templates. |
| examples/templates/create_template.sh | Adds a curl example for creating a template and polling state. |
| docs/tutorials/e2b-api-guide.md | Adds an E2B API usage guide (Chinese) with curl/SDK examples. |
| docs/api/templates-api.md | Adds Templates API reference documentation. |
| docs/api/e2b-openapi.yaml | Adds an OpenAPI document for the E2B-compatible sandbox API. |
| docs/api-comparison.md | Adds/updates a comparison doc positioning AgentCube vs E2B/OpenKruise. |
| README.md | Updates README to mention E2B compatibility + adds feature tables/quick example. |
| .github/workflows/templates-api-tests.yml | Adds a workflow to run Templates-specific unit/integration tests. |
| .github/workflows/e2e.yml | Updates e2e workflow to set E2B API key env/config before running e2e. |
| .github/workflows/e2b-api.yml | Adds a dedicated Kind-based workflow for E2B API + SDK compatibility testing. |
| .github/workflows/code-quality.yml | Adds a comprehensive code-quality workflow (fmt/vet/lint/tests/build/tidy/generate). |

Comment on lines +88 to +94
// List sandboxes using store client
// We use ListExpiredSandboxes with a future time to get all active sandboxes
ctx := c.Request.Context()
futureTime := time.Now().Add(365 * 24 * time.Hour) // Far future to get all sandboxes

sandboxes, err := s.storeClient.ListExpiredSandboxes(ctx, futureTime, 1000)
if err != nil {

Copilot AI Apr 14, 2026


handleListSandboxes is using ListExpiredSandboxes(time.Now()+365d) to approximate “list all sandboxes”. This will include already-expired sessions and will also silently omit sessions with ExpiresAt beyond that arbitrary window. Consider adding a store method to list active sandboxes (e.g., ZRANGEBYSCORE expiryIndexKey from now..+inf) and use that here so the endpoint matches “running sandboxes” semantics.

Comment on lines +196 to +205
// Calculate new expiration time from now
newExpiresAt := time.Now().Add(time.Duration(req.Timeout) * time.Second)
sandbox.ExpiresAt = newExpiresAt

// Update sandbox
if err := s.storeClient.UpdateSandbox(ctx, sandbox); err != nil {
klog.Errorf("failed to update sandbox timeout: %v", err)
respondWithError(c, ErrInternal, "failed to set timeout")
return
}

Copilot AI Apr 14, 2026


handleSetTimeout updates ExpiresAt and persists via UpdateSandbox, but UpdateSandbox doesn’t update the expiry index (ZSet). This means the new timeout won’t be reflected in expiration/GC behavior. Please update the expiry index together with the stored object (likely via a new store API that does both atomically).

Comment on lines +50 to +54
// Parse template_id to namespace/name
namespace, name := parseTemplateID(req.TemplateID)

klog.Infof("creating sandbox: template=%s, namespace=%s, clientID=%s, timeout=%d",
req.TemplateID, namespace, clientID, req.Timeout)

Copilot AI Apr 14, 2026


handleCreateSandbox parses template_id but never validates its format, even though validateTemplateID exists and other template endpoints use it. Without validation, template_id values with extra slashes or invalid characters can flow into K8s names and fail in confusing ways. Consider validating template_id up-front and returning ErrInvalidRequest on bad input.
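Here is a sketch of up-front validation under stated assumptions. The rules below treat the namespace as a DNS-1123 label and the name as a DNS-1123 subdomain, and default a bare name to namespace "default". These are guesses at what Kubernetes object names require. validateTemplateIDSketch is a hypothetical helper, not the PR's validateTemplateID.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	dns1123Label     = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)
	dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)
)

// validateTemplateIDSketch (hypothetical) accepts "name" or "namespace/name"
// and rejects anything else before it reaches the Kubernetes API.
func validateTemplateIDSketch(id string) (namespace, name string, err error) {
	parts := strings.Split(id, "/")
	switch len(parts) {
	case 1:
		namespace, name = "default", parts[0] // assumed default namespace
	case 2:
		namespace, name = parts[0], parts[1]
		if len(namespace) > 63 || !dns1123Label.MatchString(namespace) {
			return "", "", fmt.Errorf("invalid namespace %q in template_id", namespace)
		}
	default:
		return "", "", fmt.Errorf("template_id %q has too many '/' separators", id)
	}
	if len(name) > 253 || !dns1123Subdomain.MatchString(name) {
		return "", "", fmt.Errorf("invalid name %q in template_id", name)
	}
	return namespace, name, nil
}
```

In the handler, a non-nil error would map to ErrInvalidRequest before any store or CRD call is made.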

resources: ["secrets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["runtime.agentcube.volcano.sh"]
resources: ["codeinterpreterbuilds", "agentruntimebuilds"]

Copilot AI Apr 14, 2026


The router E2B Templates handlers create/read CodeInterpreter CRDs, but this Role grants access only to "codeinterpreterbuilds"/"agentruntimebuilds". Those resources don’t exist in this repo’s runtime API types, and the router will still lack permissions for "codeinterpreters"/"agentruntimes" (and lists/watches) needed for template CRUD. Please adjust the RBAC resources to the actual CRDs being accessed.

Suggested change
resources: ["codeinterpreterbuilds", "agentruntimebuilds"]
resources: ["codeinterpreters", "agentruntimes"]

Comment thread pkg/router/server.go
Comment on lines +150 to +156
// E2B API routes (templates, sandboxes) - registered at root level for E2B compatibility
// These routes handle their own authentication via API key middleware
e2bGroup := s.engine.Group("")
e2bGroup.Use(gin.Logger())
e2bGroup.Use(gin.Recovery())
// Apply concurrency limiting to E2B routes as well
e2bGroup.Use(s.concurrencyLimitMiddleware())

Copilot AI Apr 14, 2026


concurrencyLimitMiddleware() allocates a new semaphore channel each time it’s called. By applying it to both v1 and E2B groups, the effective global concurrency cap becomes ~2× MaxConcurrentRequests (one pool per group). If the intent is a single global limit, store the semaphore on Server and reuse the same middleware instance across groups.

Comment on lines +236 to +244
// If timeout is provided, extend expiration time
if req.Timeout > 0 {
sandbox.ExpiresAt = time.Now().Add(time.Duration(req.Timeout) * time.Second)
if err := s.storeClient.UpdateSandbox(ctx, sandbox); err != nil {
klog.Errorf("failed to update sandbox on refresh: %v", err)
respondWithError(c, ErrInternal, "failed to refresh sandbox")
return
}
klog.Infof("sandbox refreshed with timeout: sandboxID=%s, timeout=%d", sandboxID, req.Timeout)

Copilot AI Apr 14, 2026


Refresh semantics don’t match the API docs/comments: when a timeout is provided, this sets ExpiresAt = now + timeout rather than extending from the current expiration. For a true “refresh/keep-alive”, it should usually add to max(now, current ExpiresAt) so repeated refreshes monotonically extend the lifetime.

Comment thread pkg/router/handlers.go
Comment on lines +157 to +162
// Return 500 for invalid endpoint URL errors, 404 for not found errors
status := http.StatusNotFound
if strings.Contains(err.Error(), "invalid endpoint URL") {
status = http.StatusInternalServerError
}
c.JSON(status, gin.H{

Copilot AI Apr 14, 2026


This logic determines HTTP status by substring-matching err.Error() ("invalid endpoint URL"). That’s brittle and couples behavior to error text. Prefer a sentinel/typed error (e.g., var ErrInvalidEndpointURL) or wrapping with errors.Is, then branch on errors.Is(err, ErrInvalidEndpointURL) to decide between 404 vs 500.

Comment on lines +12 to +42
-d '{
"name": "python-data-science",
"description": "Python template with data science libraries",
"public": true,
"aliases": ["datascience", "py-ds"],
"memoryMB": 4096,
"cpuCount": 2,
"dockerfile": "FROM python:3.11-slim

WORKDIR /app

# Install data science libraries
RUN pip install --no-cache-dir \
pandas \
numpy \
matplotlib \
scikit-learn \
jupyter

# Set up working directory
COPY . /app/

CMD [\"python\"]",
"startCommand": "python"
}')

echo "Response: $TEMPLATE_RESPONSE"

# Extract template ID from response
TEMPLATE_ID=$(echo "$TEMPLATE_RESPONSE" | grep -o '"templateID":"[^"]*"' | cut -d'"' -f4)


Copilot AI Apr 14, 2026


The JSON field names in this example (memoryMB, cpuCount, startCommand) don’t match the server’s Templates API models, which use snake_case (memory_mb, vcpu_count, start_command). Also the response parsing looks for "templateID" but the server returns "template_id". As written, this script will fail against the current implementation—please align the field names with the API.

Comment thread docs/api/e2b-openapi.yaml
Comment on lines +257 to +294
NewSandbox:
  type: object
  description: Request to create a sandbox
  properties:
    templateID:
      type: string
      description: Template ID (namespace/name or name)
      example: default/code-interpreter
    timeout:
      type: integer
      description: Sandbox timeout in seconds
      default: 900
      minimum: 60
      maximum: 86400
    metadata:
      type: object
      additionalProperties: true
      description: Custom metadata
    envVars:
      type: object
      additionalProperties:
        type: string
      description: Environment variables (not supported in Phase 1)
    autoPause:
      type: boolean
      description: Auto-pause (not supported in Phase 1; must be false)
      default: false
    allowInternetAccess:
      type: boolean
      description: Allow internet access (not supported in Phase 1)
      default: false
    secure:
      type: boolean
      description: Secure mode
      default: false
  required:
    - templateID


Copilot AI Apr 14, 2026


The OpenAPI schema and examples here use camelCase (templateID/clientID/sandboxID, envVars, autoPause), but the router’s E2B models use snake_case JSON tags (template_id/client_id/sandbox_id, env_vars, auto_pause). Please align the OpenAPI spec field names (and defaults like timeout) with the actual implementation so the spec is usable.

# Run template-specific E2E tests with API key
export ROUTER_URL=http://localhost:8081
export E2B_API_KEY=${{ env.E2B_API_KEY }}
go test -v ./test/e2e/... -run "Template" -timeout 10m

Copilot AI Apr 14, 2026


The integration test step runs go test ./test/e2e/... without -tags e2e, but test/e2e/e2e_test.go now has //go:build e2e and will be excluded. This can make the workflow pass while running no meaningful E2E tests. Add -tags e2e (consistent with run_e2e.sh) to ensure the intended tests execute.

Suggested change
go test -v ./test/e2e/... -run "Template" -timeout 10m
go test -tags e2e -v ./test/e2e/... -run "Template" -timeout 10m

Member

@acsoto acsoto left a comment


Still WIP?

Member


This duplicates the check that already exists in the main branch

Contributor Author

@MahaoAlex MahaoAlex Apr 14, 2026


Still a work in progress; I’ll check.

Member


This overlaps with e2e

Member


I think this also needs to be removed.

Comment thread test/e2b/TEST_PLAN.md
Member


There seem to be a lot of unnecessary files.

Comment thread test/e2b/TEST_REPORT.md
Member


Same as above.

@MahaoAlex MahaoAlex changed the title from "feat: add E2B API compatibility layer and Templates API support" to "WIP:feat: add E2B API compatibility layer and Templates API support" Apr 14, 2026
Copilot AI review requested due to automatic review settings April 14, 2026 08:29
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 53 changed files in this pull request and generated 8 comments.

Comment on lines +544 to +569
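## Phase 1 Limitations

The current E2B API implementation is Phase 1 and has the following limitations:

### Supported Features

- [OK] Sandbox lifecycle management (create, get, list, delete)
- [OK] Timeout setting and refresh
- [OK] API key authentication
- [OK] Basic metadata support

### Unsupported Features (Phase 2+)

| Feature | Status | Description |
|------|------|------|
| `/templates/*` API | [Not supported] | Template management API |
| `/sandboxes/{id}/metrics` | [Not supported] | Metrics monitoring API |
| `/sandboxes/{id}/logs` | [Not supported] | Log streaming API |
| `/snapshots/*` API | [Not supported] | Snapshot management API |
| `/volumes/*` API | [Not supported] | Volume management API |
| Pause/Resume | [Not supported] | Sandbox pause/resume |
| Auto Pause | [Not supported] | Auto-pause (requests that enable it return an error) |
| Network configuration | [Not supported] | Network access configuration |
| Volume Mounts | [Not supported] | Volume mounts |
| Environment Variables | [Not supported] | Environment variable injection |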


Copilot AI Apr 14, 2026


The “Phase 1 限制说明” (Phase 1 limitations) section states that /templates/* is not supported, but this PR adds Templates API routes under pkg/router/e2b and registers them in the router. This creates contradictory guidance for users. Please update the Phase 1 limitations list (and any examples) to reflect the currently supported endpoints.

Comment thread test/e2b/e2b_test.go
Comment on lines +550 to +579
// handleDeleteSandbox handles DELETE /sandboxes/{id}
func (ts *TestServer) handleDeleteSandbox(c *gin.Context) {
id := c.Param("id")

ts.mu.Lock()
sb, exists := ts.sandboxes[id]
if !exists {
ts.mu.Unlock()
c.JSON(http.StatusNotFound, ErrorResponse{
Error: "not_found",
Code: "SANDBOX_NOT_FOUND",
Message: "Sandbox not found: " + id,
})
return
}

delete(ts.sandboxes, id)
ts.mu.Unlock()

// Also delete from store
if err := ts.Store.DeleteSandboxBySessionID(c.Request.Context(), sb.SessionID); err != nil {
// Log error but don't fail the request
// In real implementation, use proper logging
_ = err
}

c.JSON(http.StatusOK, SuccessResponse{
Message: "Sandbox deleted successfully",
})
}

Copilot AI Apr 14, 2026


test/e2b is presented as “black-box” E2B API tests, but NewTestServer defines its own in-memory Gin handlers (not the router’s pkg/router/e2b implementation) and the tests exercise those local handlers. This means the suite can pass even if the real router implementation is broken. If the intent is to validate AgentCube’s E2B compatibility, consider turning these into integration tests that hit the actual router (e.g., start pkg/router with a mock store/session manager, or run against a real router instance via HTTP).

Comment on lines +65 to +205
// Set timeout if specified
if req.Timeout > 0 {
sandbox.ExpiresAt = CalculateExpiry(req.Timeout)
if err := s.storeClient.UpdateSandbox(c.Request.Context(), sandbox); err != nil {
klog.Errorf("failed to update sandbox timeout: %v", err)
}
}

// Convert to E2B response
response := s.mapper.ToE2BSandbox(sandbox, clientID)

klog.Infof("sandbox created successfully: sandboxID=%s", sandbox.SessionID)
c.JSON(http.StatusCreated, response)
}

// handleListSandboxes handles GET /sandboxes - List all sandboxes
func (s *Server) handleListSandboxes(c *gin.Context) {
// Get client ID from context
clientID := c.GetString("client_id")
if clientID == "" {
clientID = defaultClientID
}

// List sandboxes using store client
// We use ListExpiredSandboxes with a future time to get all active sandboxes
ctx := c.Request.Context()
futureTime := time.Now().Add(365 * 24 * time.Hour) // Far future to get all sandboxes

sandboxes, err := s.storeClient.ListExpiredSandboxes(ctx, futureTime, 1000)
if err != nil {
klog.Errorf("failed to list sandboxes: %v", err)
respondWithError(c, ErrInternal, "failed to list sandboxes")
return
}

// Convert to E2B response
response := make([]ListedSandbox, 0, len(sandboxes))
for _, sandbox := range sandboxes {
response = append(response, *s.mapper.ToE2BListedSandbox(sandbox, clientID))
}

klog.V(4).Infof("listed %d sandboxes", len(response))
c.JSON(http.StatusOK, response)
}

// handleGetSandbox handles GET /sandboxes/{id} - Get sandbox details
func (s *Server) handleGetSandbox(c *gin.Context) {
sandboxID := c.Param("id")
if sandboxID == "" {
respondWithError(c, ErrInvalidRequest, "sandbox id is required")
return
}

// Get client ID from context
clientID := c.GetString("client_id")
if clientID == "" {
clientID = defaultClientID
}

// Get sandbox from store
sandbox, err := s.storeClient.GetSandboxBySessionID(c.Request.Context(), sandboxID)
if err != nil {
handleStoreError(c, err)
return
}

// Convert to E2B response
response := s.mapper.ToE2BSandboxDetail(sandbox, clientID)

klog.V(4).Infof("retrieved sandbox: sandboxID=%s", sandboxID)
c.JSON(http.StatusOK, response)
}

// handleDeleteSandbox handles DELETE /sandboxes/{id} - Delete a sandbox
func (s *Server) handleDeleteSandbox(c *gin.Context) {
sandboxID := c.Param("id")
if sandboxID == "" {
respondWithError(c, ErrInvalidRequest, "sandbox id is required")
return
}

ctx := c.Request.Context()

// First get the sandbox to find the session ID
sandbox, err := s.storeClient.GetSandboxBySessionID(ctx, sandboxID)
if err != nil {
handleStoreError(c, err)
return
}

// Delete sandbox by session ID
if err := s.storeClient.DeleteSandboxBySessionID(ctx, sandbox.SessionID); err != nil {
klog.Errorf("failed to delete sandbox: %v", err)
respondWithError(c, ErrInternal, "failed to delete sandbox")
return
}

klog.Infof("sandbox deleted successfully: sandboxID=%s, sessionID=%s", sandboxID, sandbox.SessionID)
c.Status(http.StatusNoContent)
}

// handleSetTimeout handles POST /sandboxes/{id}/timeout - Set sandbox timeout
func (s *Server) handleSetTimeout(c *gin.Context) {
sandboxID := c.Param("id")
if sandboxID == "" {
respondWithError(c, ErrInvalidRequest, "sandbox id is required")
return
}

var req TimeoutRequest
if err := c.ShouldBindJSON(&req); err != nil {
klog.Errorf("failed to bind request body: %v", err)
respondWithError(c, ErrInvalidRequest, "invalid request body")
return
}

// Validate timeout
if req.Timeout <= 0 {
respondWithError(c, ErrInvalidRequest, "timeout must be greater than 0")
return
}

ctx := c.Request.Context()

// Get the sandbox
sandbox, err := s.storeClient.GetSandboxBySessionID(ctx, sandboxID)
if err != nil {
handleStoreError(c, err)
return
}

// Calculate new expiration time from now
newExpiresAt := time.Now().Add(time.Duration(req.Timeout) * time.Second)
sandbox.ExpiresAt = newExpiresAt

// Update sandbox
if err := s.storeClient.UpdateSandbox(ctx, sandbox); err != nil {
klog.Errorf("failed to update sandbox timeout: %v", err)
respondWithError(c, ErrInternal, "failed to set timeout")
return
}

Copilot AI Apr 14, 2026

In handleCreateSandbox / handleSetTimeout / handleRefreshSandbox, the code updates sandbox.ExpiresAt and then calls storeClient.UpdateSandbox(...). However pkg/store’s UpdateSandbox explicitly does not update the expiry ZSET index (session:expiry), so TTL extensions won’t be reflected in GC / expiry-based queries. This will make /sandboxes/{id}/timeout and refresh-with-timeout appear to succeed but not actually extend lifetime. Consider adding a store method that updates both the sandbox object and the expiry index (e.g., UpdateSandboxExpiry / UpdateSandboxAndIndexes) and use that here, or update the store implementation so UpdateSandbox updates the expiry/last-activity indices when those fields change.
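
The gap can be reproduced without Redis. The sketch below is an in-memory stand-in, not the real pkg/store. Here, `objects` plays the role of the stored sandbox records and `expiry` plays the role of the session:expiry ZSET. `UpdateSandboxExpiry` is the hypothetical method suggested above. In a Redis-backed store it would presumably write the record and ZADD the new score in one transaction or pipeline.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Sandbox is a trimmed, hypothetical stand-in for the store's sandbox record.
type Sandbox struct {
	SessionID string
	ExpiresAt time.Time
}

// memStore keeps sandbox objects plus a separate expiry index
// (in Redis, the index would be the session:expiry ZSET).
type memStore struct {
	objects map[string]Sandbox
	expiry  map[string]time.Time // sessionID -> expiry score
}

func newMemStore() *memStore {
	return &memStore{objects: map[string]Sandbox{}, expiry: map[string]time.Time{}}
}

// UpdateSandbox mirrors the behaviour described in the review: it rewrites
// the object but leaves the expiry index untouched.
func (s *memStore) UpdateSandbox(sb Sandbox) {
	s.objects[sb.SessionID] = sb
}

// UpdateSandboxExpiry writes the object and the expiry index together,
// so GC and expiry-based queries see the new deadline.
func (s *memStore) UpdateSandboxExpiry(sb Sandbox) {
	s.objects[sb.SessionID] = sb
	s.expiry[sb.SessionID] = sb.ExpiresAt
}

// ExpiringBefore answers GC-style queries from the index only,
// as a ZSET range query would.
func (s *memStore) ExpiringBefore(t time.Time) []string {
	var ids []string
	for id, exp := range s.expiry {
		if exp.Before(t) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// expiringSoonAfterExtend creates a sandbox expiring in 1 minute, extends it
// to 1 hour, and returns how many sandboxes the index still reports as
// expiring within the next 10 minutes.
func expiringSoonAfterExtend(useIndexedUpdate bool) int {
	now := time.Now()
	st := newMemStore()
	st.UpdateSandboxExpiry(Sandbox{SessionID: "sb-1", ExpiresAt: now.Add(time.Minute)})

	extended := Sandbox{SessionID: "sb-1", ExpiresAt: now.Add(time.Hour)}
	if useIndexedUpdate {
		st.UpdateSandboxExpiry(extended)
	} else {
		st.UpdateSandbox(extended)
	}
	return len(st.ExpiringBefore(now.Add(10 * time.Minute)))
}

func main() {
	fmt.Println("UpdateSandbox only:", expiringSoonAfterExtend(false))
	fmt.Println("UpdateSandboxExpiry:", expiringSoonAfterExtend(true))
}
```

With the object-only update, the index still holds the old one-minute deadline, so GC would reap the sandbox despite the "successful" extension.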

Comment thread pkg/router/server.go
Comment on lines 102 to +160
// concurrencyLimitMiddleware limits the number of concurrent requests
func (s *Server) concurrencyLimitMiddleware() gin.HandlerFunc {
concurrency := make(chan struct{}, s.config.MaxConcurrentRequests)
return func(c *gin.Context) {
// Try to acquire a slot in the semaphore
select {
case concurrency <- struct{}{}:
// Successfully acquired a slot, continue processing
defer func() {
// Release the slot when done
<-concurrency
}()
c.Next()
default:
// No slots available, return 503 Service Unavailable
c.JSON(http.StatusTooManyRequests, gin.H{
"error": "server overloaded, please try again later",
"code": "SERVER_OVERLOADED",
})
c.Abort()
}
}
}

// setupRoutes configures HTTP routes using Gin
func (s *Server) setupRoutes() {
s.engine = gin.New()

// Health check endpoints (no authentication required, no concurrency limit)
s.engine.GET("/health/live", s.handleHealthLive)
s.engine.GET("/health/ready", s.handleHealthReady)

// API v1 routes with concurrency limiting
v1 := s.engine.Group("/v1")
// Add middleware
v1.Use(gin.Logger())
v1.Use(gin.Recovery())

v1.Use(s.concurrencyLimitMiddleware()) // Apply concurrency limit to API routes

// Agent invoke requests (support GET/POST, since downstream uses these methods)
v1.GET("/namespaces/:namespace/agent-runtimes/:name/invocations/*path", s.handleAgentInvoke)
v1.POST("/namespaces/:namespace/agent-runtimes/:name/invocations/*path", s.handleAgentInvoke)

// Code interpreter invoke requests (support GET/POST, since downstream uses GET for file download)
v1.GET("/namespaces/:namespace/code-interpreters/:name/invocations/*path", s.handleCodeInterpreterInvoke)
v1.POST("/namespaces/:namespace/code-interpreters/:name/invocations/*path", s.handleCodeInterpreterInvoke)

// E2B API routes (templates, sandboxes) - registered at root level for E2B compatibility
// These routes handle their own authentication via API key middleware
e2bGroup := s.engine.Group("")
e2bGroup.Use(gin.Logger())
e2bGroup.Use(gin.Recovery())
// Apply concurrency limiting to E2B routes as well
e2bGroup.Use(s.concurrencyLimitMiddleware())

// Initialize E2B server (registers /templates and /sandboxes routes)
_, err := e2b.NewServer(e2bGroup, s.storeClient, s.sessionManager)
if err != nil {

concurrencyLimitMiddleware() allocates a new semaphore channel each time it’s called. Since it’s applied separately to the /v1 group and the root E2B group, the router effectively allows up to 2 * MaxConcurrentRequests concurrent requests (one limit per group), which defeats the intent of a global cap. Consider creating the semaphore once on Server (e.g., s.concurrencySem) and returning a handler that uses that shared semaphore so the limit is enforced across all routes.
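
The following is a dependency-free sketch of a single, server-wide semaphore. It is written against net/http instead of Gin so that it runs on its own. The names `limiter` and `statusWithHeld` are illustrative. In the router, the channel would live on `Server` (for example, the `s.concurrencySem` field suggested above), and `concurrencyLimitMiddleware()` would return a `gin.HandlerFunc` that closes over it. Calling the middleware once per group would then no longer create a new pool.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter owns one semaphore for the whole server; every route group that
// wraps its handlers with the same *limiter draws from the same slots.
type limiter struct {
	sem chan struct{}
}

func newLimiter(limit int) *limiter {
	return &limiter{sem: make(chan struct{}, limit)}
}

// tryAcquire reserves a slot without blocking; a true result must be
// followed by release.
func (l *limiter) tryAcquire() bool {
	select {
	case l.sem <- struct{}{}:
		return true
	default:
		return false
	}
}

func (l *limiter) release() { <-l.sem }

// middleware rejects requests with 429 when all shared slots are in use.
func (l *limiter) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			http.Error(w, "server overloaded, please try again later", http.StatusTooManyRequests)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

// statusWithHeld occupies `held` slots (as in-flight /v1 requests would),
// then sends one request through a second group wrapped by the same limiter.
func statusWithHeld(limit, held int) int {
	l := newLimiter(limit)
	for i := 0; i < held; i++ {
		l.tryAcquire()
	}
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	e2bRoutes := l.middleware(ok)
	rec := httptest.NewRecorder()
	e2bRoutes.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/sandboxes", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusWithHeld(2, 1), statusWithHeld(2, 2))
}
```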

Comment on lines +225 to +231
for _, tt := range tests {
t.Run(string(rune(tt.timeout)), func(t *testing.T) {
result := CalculateExpiry(tt.timeout)
// Check that the result is within a reasonable time window
expectedTime := time.Now().Add(tt.expected)
assert.WithinDuration(t, expectedTime, result, time.Second)
})

TestCalculateExpiry uses t.Run(string(rune(tt.timeout)), ...) as the subtest name. Converting an int timeout to a rune produces non-printable/duplicate names (e.g., 60 becomes '<'), which makes test output confusing and can collide. Use a stable string like fmt.Sprintf("timeout_%d", tt.timeout) instead.
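
The naming problem is easy to confirm in isolation: `string(rune(n))` interprets the integer as a Unicode code point rather than formatting it as digits. The helpers below are illustrative only. Inside the table-driven test, the fix is simply `t.Run(fmt.Sprintf("timeout_%d", tt.timeout), ...)`.

```go
package main

import "fmt"

// oldName reproduces the subtest naming currently used in TestCalculateExpiry.
func oldName(timeout int) string { return string(rune(timeout)) }

// newName is the stable, printable alternative suggested in the review.
func newName(timeout int) string { return fmt.Sprintf("timeout_%d", timeout) }

func main() {
	for _, timeout := range []int{0, 60, 300} {
		fmt.Printf("%q -> %q\n", oldName(timeout), newName(timeout))
	}
}
```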

Comment on lines +12 to +42
-d '{
"name": "python-data-science",
"description": "Python template with data science libraries",
"public": true,
"aliases": ["datascience", "py-ds"],
"memoryMB": 4096,
"cpuCount": 2,
"dockerfile": "FROM python:3.11-slim

WORKDIR /app

# Install data science libraries
RUN pip install --no-cache-dir \
pandas \
numpy \
matplotlib \
scikit-learn \
jupyter

# Set up working directory
COPY . /app/

CMD [\"python\"]",
"startCommand": "python"
}')

echo "Response: $TEMPLATE_RESPONSE"

# Extract template ID from response
TEMPLATE_ID=$(echo "$TEMPLATE_RESPONSE" | grep -o '"templateID":"[^"]*"' | cut -d'"' -f4)

This script uses camelCase field names (memoryMB, cpuCount, startCommand) and parses templateID from the response, but the Templates API models in pkg/router/e2b/templates_models.go use snake_case JSON (memory_mb, vcpu_count, start_command, template_id). As written, the request won’t bind as intended and TEMPLATE_ID extraction will fail. Update the payload keys and the response parsing to match the actual API field names (or adjust the API to accept both formats if compatibility requires it).
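
The binding failure is silent rather than an error. This is because encoding/json, which Gin's JSON binding uses by default, ignores keys it cannot match to a field. The struct below is a hypothetical subset of the templates model, carrying the snake_case tags named above. The real type in templates_models.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CreateTemplateRequest is a hypothetical subset of the templates model,
// using the snake_case JSON tags the review attributes to templates_models.go.
type CreateTemplateRequest struct {
	Name         string `json:"name"`
	MemoryMB     int    `json:"memory_mb"`
	VCPUCount    int    `json:"vcpu_count"`
	StartCommand string `json:"start_command"`
}

// decode unmarshals a payload with plain encoding/json semantics:
// keys that match no field are dropped without an error.
func decode(payload string) (CreateTemplateRequest, error) {
	var req CreateTemplateRequest
	err := json.Unmarshal([]byte(payload), &req)
	return req, err
}

func main() {
	camel, err := decode(`{"name":"python-data-science","memoryMB":4096,"cpuCount":2,"startCommand":"python"}`)
	fmt.Printf("camelCase:  %+v err=%v\n", camel, err)

	snake, err := decode(`{"name":"python-data-science","memory_mb":4096,"vcpu_count":2,"start_command":"python"}`)
	fmt.Printf("snake_case: %+v err=%v\n", snake, err)
}
```

The camelCase payload decodes without error but leaves `MemoryMB`, `VCPUCount` and `StartCommand` at their zero values. For the same reason, the script's grep pattern would need to look for `"template_id"` instead of `"templateID"`.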

Comment on lines +88 to +97
// List sandboxes using store client
// We use ListExpiredSandboxes with a future time to get all active sandboxes
ctx := c.Request.Context()
futureTime := time.Now().Add(365 * 24 * time.Hour) // Far future to get all sandboxes

sandboxes, err := s.storeClient.ListExpiredSandboxes(ctx, futureTime, 1000)
if err != nil {
klog.Errorf("failed to list sandboxes: %v", err)
respondWithError(c, ErrInternal, "failed to list sandboxes")
return

handleListSandboxes uses ListExpiredSandboxes with a timestamp 1 year in the future to approximate “list all active sandboxes”. ListExpiredSandboxes returns sandboxes whose ExpiresAt is before the provided time, so this will include already-expired sandboxes as well. It also relies on the expiry index being accurate (which UpdateSandbox doesn’t maintain). To match E2B semantics (list running/active sandboxes), consider adding a store query for non-expired sessions (score >= now), or filtering the returned items by ExpiresAt.After(time.Now()) after fetching IDs from a proper index.
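
A minimal sketch of the filtering fallback follows. It assumes the handler already has the fetched records in memory, and `Sandbox` here is a trimmed, hypothetical stand-in for the store type. This only hides already-expired entries. It does not fix the stale expiry index or the 1000-item cap, which is why a dedicated store query for scores >= now remains the better option.

```go
package main

import (
	"fmt"
	"time"
)

// Sandbox is a trimmed, hypothetical stand-in for the store record.
type Sandbox struct {
	SessionID string
	ExpiresAt time.Time
}

// activeOnly keeps sandboxes whose expiry is still in the future, matching
// E2B's "list running sandboxes" semantics for the records already fetched.
func activeOnly(in []Sandbox, now time.Time) []Sandbox {
	out := make([]Sandbox, 0, len(in))
	for _, sb := range in {
		if sb.ExpiresAt.After(now) {
			out = append(out, sb)
		}
	}
	return out
}

// demoActiveIDs filters one expired and one live sandbox at a fixed instant.
func demoActiveIDs() []string {
	now := time.Date(2026, time.April, 14, 12, 0, 0, 0, time.UTC)
	fetched := []Sandbox{
		{SessionID: "sb-expired", ExpiresAt: now.Add(-time.Minute)},
		{SessionID: "sb-live", ExpiresAt: now.Add(time.Hour)},
	}
	var ids []string
	for _, sb := range activeOnly(fetched, now) {
		ids = append(ids, sb.SessionID)
	}
	return ids
}

func main() {
	fmt.Println(demoActiveIDs())
}
```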

Comment thread docs/api/templates-api.md
Comment on lines +33 to +331
```json
[
{
"templateID": "default/my-template",
"name": "my-template",
"description": "My code interpreter template",
"aliases": ["my-alias"],
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-15T10:30:00Z",
"public": true,
"state": "ready",
"memoryMB": 4096,
"cpuCount": 2
}
]
```

**States:**

- `pending` - Template is being created
- `building` - Template build is in progress
- `ready` - Template is ready to use
- `failed` - Template build failed
- `deprecated` - Template is deprecated

### Get Template

```
GET /templates/{id}
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |

Response:

```json
{
"templateID": "default/my-template",
"name": "my-template",
"description": "My code interpreter template",
"aliases": ["my-alias"],
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-15T10:30:00Z",
"public": true,
"state": "ready",
"memoryMB": 4096,
"cpuCount": 2,
"startCommand": "python app.py"
}
```

### Create Template

```
POST /templates
```

Request body:

```json
{
"name": "my-template",
"description": "My template description",
"dockerfile": "FROM python:3.9-slim\nRUN pip install pandas numpy",
"startCommand": "python app.py",
"aliases": ["alias1", "alias2"],
"public": true,
"memoryMB": 4096,
"cpuCount": 2
}
```

Request fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Template name (unique within namespace) |
| `description` | string | No | Template description |
| `dockerfile` | string | No | Dockerfile content for custom builds |
| `startCommand` | string | No | Command to run when sandbox starts |
| `aliases` | array[string] | No | Alternative names for the template |
| `public` | bool | No | Whether template is publicly accessible |
| `memoryMB` | int | No | Memory allocation in MB (default: 4096) |
| `cpuCount` | int | No | CPU cores allocation (default: 2) |

Response:

```json
{
"templateID": "default/my-template",
"name": "my-template",
"description": "My template description",
"aliases": ["alias1", "alias2"],
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-15T10:30:00Z",
"public": true,
"state": "pending",
"memoryMB": 4096,
"cpuCount": 2
}
```

### Update Template

```
PATCH /templates/{id}
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |

Request body (all fields optional):

```json
{
"description": "Updated description",
"aliases": ["new-alias"],
"public": false,
"memoryMB": 8192,
"cpuCount": 4
}
```

Response: Updated template object (same format as Get Template)

### Delete Template

```
DELETE /templates/{id}
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |

Response: `204 No Content`

### List Template Builds

```
GET /templates/{id}/builds
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |

Query parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `limit` | int | Maximum number of builds to return (default: 100) |
| `offset` | int | Offset for pagination (default: 0) |

Response:

```json
[
{
"buildID": "build-12345",
"templateID": "default/my-template",
"state": "completed",
"createdAt": "2024-01-15T10:30:00Z",
"completedAt": "2024-01-15T10:35:00Z",
"logs": "Building image...\nInstalling dependencies..."
}
]
```

**Build States:**

- `pending` - Build queued
- `building` - Build in progress
- `completed` - Build succeeded
- `failed` - Build failed
- `cancelled` - Build was cancelled

### Create Template Build

```
POST /templates/{id}/builds
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |

Request body:

```json
{
"dockerfile": "FROM python:3.11-slim\nRUN pip install pandas numpy matplotlib",
"startCommand": "python app.py"
}
```

Request fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `dockerfile` | string | No | Updated Dockerfile content |
| `startCommand` | string | No | Updated start command |

Response:

```json
{
"buildID": "build-12345",
"templateID": "default/my-template",
"state": "pending",
"createdAt": "2024-01-15T10:30:00Z"
}
```

### Get Template Build

```
GET /templates/{id}/builds/{buildId}
```

Path parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `id` | string | Template ID (format: `namespace/name`) |
| `buildId` | string | Build ID |

Response:

```json
{
"buildID": "build-12345",
"templateID": "default/my-template",
"state": "completed",
"createdAt": "2024-01-15T10:30:00Z",
"completedAt": "2024-01-15T10:35:00Z",
"logs": "Building image...\nStep 1/3 : FROM python:3.11-slim\n..."
}
```

## Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| `invalid_request` | 400 | Invalid request parameters |
| `unauthorized` | 401 | Missing or invalid API key |
| `template_not_found` | 404 | Template does not exist |
| `build_not_found` | 404 | Build does not exist |
| `template_already_exists` | 409 | Template with this name already exists |
| `build_in_progress` | 409 | Another build is already in progress for this template |
| `invalid_dockerfile` | 422 | Dockerfile syntax error |
| `internal_error` | 500 | Internal server error |

## Mapping to Kubernetes

Templates map to CodeInterpreter and AgentRuntime CRDs:

| Templates API | Kubernetes CRD | Notes |
|---------------|----------------|-------|
| Template ID | CRD name | Format: `{namespace}/{name}` |
| Aliases | Annotation | Stored in `e2b.agentcube.io/aliases` |
| Public flag | Label | Stored in `e2b.agentcube.io/public` |
| Dockerfile | Build source | Used to build container image |
| Start command | Container args | Command to run on sandbox start |
| Memory/CPU | Resource limits | Maps to container resources |

### Example CRD Mapping

A template with ID `default/python-ds` maps to:

```yaml
apiVersion: runtime.agentcube.io/v1alpha1
kind: CodeInterpreter
metadata:
name: python-ds
namespace: default
labels:
e2b.agentcube.io/public: "true"
annotations:
e2b.agentcube.io/aliases: '["datascience", "py-ds"]'
spec:
image: agentcube/python-ds:latest
resources:
memoryMB: 4096
cpuCount: 2
```

The Templates API documentation examples use camelCase JSON fields (templateID, createdAt, memoryMB, cpuCount, etc.), but the actual API structs serialize/deserialize snake_case (template_id, created_at, memory_mb, vcpu_count, ...). This mismatch will cause clients following the doc to send the wrong payload and misinterpret responses. Please align the docs with the implemented JSON field names (or explicitly document/implement dual-format support). Also, the CRD example uses apiVersion: runtime.agentcube.io/v1alpha1, but the actual group in this repo is runtime.agentcube.volcano.sh/v1alpha1.

This commit adds full E2B Templates API compatibility to AgentCube:

API Endpoints:
- GET /templates - List templates with filtering
- POST /templates - Create new template
- GET /templates/{id} - Get template by ID
- PATCH /templates/{id} - Update template
- DELETE /templates/{id} - Delete template
- GET /templates/{id}/builds - List template builds
- POST /templates/{id}/builds - Create build
- GET /templates/{id}/builds/{buildId} - Get build status

Implementation:
- pkg/router/e2b/templates_models.go - Data models
- pkg/router/e2b/templates_handlers.go - HTTP handlers
- pkg/router/e2b/templates_handlers_test.go - Unit tests
- test/e2e/templates_test.go - E2E tests

Documentation:
- docs/api/templates-api.md - API reference
- examples/templates/ - Shell and Python examples

CI/CD:
- .github/workflows/templates-api-tests.yml - Dedicated test workflow
- Updated e2b-api.yml to include E2B SDK compatibility tests

Features:
- Template CRUD operations
- Template aliases support
- Public/private template visibility
- Build lifecycle management
- CRD mapping (CodeInterpreter/AgentRuntime)

Signed-off-by: MahaoAlex <alexmahao319@gmail.com>
Copilot AI review requested due to automatic review settings April 14, 2026 10:17
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 51 out of 53 changed files in this pull request and generated 5 comments.

Comment on lines +50 to +55
// Parse template_id to namespace/name
namespace, name := parseTemplateID(req.TemplateID)

klog.Infof("creating sandbox: template=%s, namespace=%s, clientID=%s, timeout=%d",
req.TemplateID, namespace, clientID, req.Timeout)

handleCreateSandbox parses template_id via parseTemplateID but does not validate it. This allows IDs with multiple slashes or invalid characters (e.g., a/b/c becomes name b/c), which will likely fail later when mapping to Kubernetes resource names and may surface as 5xx errors. You already have validateTemplateID; please call it early and return a 400 on validation failure.
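
The following sketches the early check. It assumes template IDs always take the `namespace/name` form used in the docs, and that both segments must be valid Kubernetes DNS-1123 labels (lowercase alphanumerics and `-`, at most 63 characters). `validateTemplateIDSketch` is hypothetical. The PR's own `validateTemplateID` may apply different rules, and object names that are DNS-1123 subdomains could legitimately contain `.`, which this stricter sketch rejects.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dns1123Label matches a DNS-1123 label: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric character.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateTemplateIDSketch is a hypothetical stand-in for validateTemplateID.
// It accepts exactly "namespace/name" with two valid label segments.
func validateTemplateIDSketch(id string) error {
	parts := strings.Split(id, "/")
	if len(parts) != 2 {
		return fmt.Errorf("template_id %q must have the form namespace/name", id)
	}
	for _, p := range parts {
		if len(p) > 63 || !dns1123Label.MatchString(p) {
			return fmt.Errorf("template_id %q has an invalid segment %q", id, p)
		}
	}
	return nil
}

func main() {
	for _, id := range []string{"default/code-interpreter", "a/b/c", "Default/x"} {
		fmt.Println(id, validateTemplateIDSketch(id))
	}
}
```

In handleCreateSandbox, the call would sit before parseTemplateID. On failure, it would call `respondWithError(c, ErrInvalidRequest, err.Error())` and return, so the client gets a 400 instead of a later 5xx.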

Comment thread docs/api/e2b-openapi.yaml
Comment on lines +41 to +95
paths:
/sandboxes:
post:
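summary: Create sandbox
description: |
Creates a new sandbox instance from a template.

**Phase 1 limitations:**
- `autoPause` must be `false` (the default)
- `envVars` is not supported
- `network` configuration is not supported
- `volumeMounts` is not supported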
operationId: createSandbox
tags:
- Sandboxes
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/NewSandbox'
examples:
minimal:
summary: Minimal request
value:
templateID: default/code-interpreter
with-timeout:
summary: With timeout
value:
templateID: default/code-interpreter
timeout: 900
metadata:
project: my-project
responses:
'201':
description: Sandbox created successfully
content:
application/json:
schema:
$ref: '#/components/schemas/Sandbox'
example:
clientID: client-abc123
envdVersion: "1.0.0"
sandboxID: sb-xyz789
templateID: default/code-interpreter
domain: sb-xyz789.agentcube.local
'400':
description: Invalid request parameters (e.g., an unsupported autoPause value)
content:
application/json:
schema:
$ref: '#/components/schemas/E2BError'
example:
code: 400
message: auto_pause not supported in Phase 1

This OpenAPI spec documents camelCase fields like templateID, clientID, envdVersion, etc., but the router implementation structs use snake_case JSON tags (e.g., template_id, client_id, envd_version). This mismatch will break generated clients and mislead SDK users. Please align the OpenAPI schema and examples with the actual wire format (or update the server to accept/emit the documented casing consistently).
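
If both casings must be tolerated during a transition, one option is a custom `UnmarshalJSON` that reads either spelling into the same field, while the server emits only the documented one. The `NewSandbox` type below is a hypothetical subset of the request model, not the PR's struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NewSandbox is a hypothetical subset of the create-sandbox request,
// serialized with the camelCase names used in the OpenAPI spec.
type NewSandbox struct {
	TemplateID string `json:"templateID"`
	Timeout    int    `json:"timeout,omitempty"`
}

// UnmarshalJSON prefers the camelCase key and falls back to snake_case,
// so either wire format binds to the same struct. The anonymous struct
// has no methods, so the inner Unmarshal does not recurse.
func (n *NewSandbox) UnmarshalJSON(data []byte) error {
	var raw struct {
		TemplateID      string `json:"templateID"`
		TemplateIDSnake string `json:"template_id"`
		Timeout         int    `json:"timeout"`
	}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	n.TemplateID = raw.TemplateID
	if n.TemplateID == "" {
		n.TemplateID = raw.TemplateIDSnake
	}
	n.Timeout = raw.Timeout
	return nil
}

func parse(payload string) (NewSandbox, error) {
	var n NewSandbox
	err := json.Unmarshal([]byte(payload), &n)
	return n, err
}

// encode always emits the single documented (camelCase) spelling.
func encode(n NewSandbox) string {
	b, err := json.Marshal(n)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	a, _ := parse(`{"templateID":"default/code-interpreter","timeout":900}`)
	b, _ := parse(`{"template_id":"default/code-interpreter"}`)
	fmt.Println(a.TemplateID, b.TemplateID, encode(a))
}
```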

Comment thread docs/api/templates-api.md
Comment on lines +33 to +121

The Templates API docs use camelCase fields (e.g., templateID, createdAt, memoryMB, cpuCount, startCommand) but the implementation and tests for templates use snake_case JSON (e.g., template_id, created_at, memory_mb, vcpu_count, start_command). Please update the documentation examples and field lists to match the actual API responses/requests to avoid confusing users and breaking copy/paste snippets.

Comment thread pkg/router/handlers.go
Comment on lines 156 to 164
klog.Errorf("Failed to get sandbox access address %s: %v", sandbox.SandboxID, err)
// Return 500 for invalid endpoint URL errors, 404 for not found errors
status := http.StatusNotFound
if strings.Contains(err.Error(), "invalid endpoint URL") {
status = http.StatusInternalServerError
}
c.JSON(status, gin.H{
"error": err.Error(),
})

forwardToSandbox determines whether to return 500 vs 404 by checking strings.Contains(err.Error(), "invalid endpoint URL"). This is brittle (changes to error wording or wrapping will change behavior). Since buildURL is the source of this condition, consider returning a typed/sentinel error (or wrapping with a known value) and using errors.Is/As here to map to the correct status code reliably.
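
The following sketches the sentinel-error approach. `ErrInvalidEndpoint`, `buildURLSketch` and `statusFor` are hypothetical names. The point is that buildURL wraps the sentinel with `%w`, so `errors.Is` still finds it after callers add context, and the status mapping no longer depends on the message text.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
)

// ErrInvalidEndpoint is a hypothetical sentinel that buildURL could wrap.
var ErrInvalidEndpoint = errors.New("invalid endpoint URL")

// errSandboxNotFound stands in for any other lookup failure.
var errSandboxNotFound = errors.New("sandbox not found")

// buildURLSketch stands in for buildURL: invalid endpoints are reported by
// wrapping the sentinel with %w instead of only formatting a message.
func buildURLSketch(endpoint, path string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("%w: %q", ErrInvalidEndpoint, endpoint)
	}
	return u.JoinPath(path).String(), nil
}

// statusFor maps an access-address error to an HTTP status without
// inspecting the error text.
func statusFor(err error) int {
	if errors.Is(err, ErrInvalidEndpoint) {
		return http.StatusInternalServerError
	}
	return http.StatusNotFound
}

// statusForEndpoint adds a second layer of wrapping, as a caller might,
// to show that errors.Is still sees the sentinel.
func statusForEndpoint(endpoint string) int {
	if _, err := buildURLSketch(endpoint, "run"); err != nil {
		return statusFor(fmt.Errorf("get sandbox access address: %w", err))
	}
	return http.StatusOK
}

func main() {
	fmt.Println(statusForEndpoint("://bad"))
	fmt.Println(statusForEndpoint("http://10.0.0.1:8080"))
	fmt.Println(statusFor(errSandboxNotFound))
}
```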

Comment on lines 41 to 73
@@ -47,15 +65,16 @@ func (m *mockSessionManager) GetSandboxBySession(_ context.Context, _ string, _
}

func setupEnv() {
mr := getTestRedis()
os.Setenv("REDIS_ADDR", mr.Addr())
os.Setenv("REDIS_PASSWORD", "")
os.Setenv("REDIS_PASSWORD_REQUIRED", "false")
os.Setenv("WORKLOAD_MANAGER_URL", "http://localhost:8080")
}

Router tests rely on store.Storage() (a singleton guarded by sync.Once) inside NewServer(). Because the store singleton is initialized only once per test process, whichever test runs first will lock in REDIS_ADDR for all subsequent tests. With handlers_test.go now using miniredis and server_test.go still setting REDIS_ADDR=localhost:6379, the suite becomes order-dependent and can fail when the singleton is initialized with a non-running Redis. A robust fix is to ensure all tests initialize the store with the same miniredis address before any NewServer() call (e.g., via TestMain), and/or add a test-only reset hook in pkg/store to reinitialize the singleton between tests.
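
The order dependence can be reproduced with a minimal `sync.Once` singleton. `Storage` here only mimics the pattern described above (it is not pkg/store), and `resetStorageForTest` is the hypothetical test-only hook. The TestMain alternative avoids the hook entirely: set REDIS_ADDR to the miniredis address once, before `m.Run()`, so every `NewServer()` call sees the same value.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// storeConfig stands in for the client the store builds from REDIS_ADDR.
type storeConfig struct{ addr string }

var (
	once     = new(sync.Once)
	instance *storeConfig
)

// Storage mimics a sync.Once-guarded singleton: the environment seen by the
// first caller wins for the rest of the process.
func Storage() *storeConfig {
	once.Do(func() {
		instance = &storeConfig{addr: os.Getenv("REDIS_ADDR")}
	})
	return instance
}

// resetStorageForTest is a hypothetical test-only hook that lets the next
// Storage() call re-read the environment.
func resetStorageForTest() {
	once = new(sync.Once)
	instance = nil
}

// addrSeenSecond initializes the singleton with one address, switches
// REDIS_ADDR, optionally resets, and returns the address the second caller gets.
func addrSeenSecond(useReset bool) string {
	resetStorageForTest()
	os.Setenv("REDIS_ADDR", "localhost:6379") // e.g. the value server_test.go sets
	_ = Storage()
	os.Setenv("REDIS_ADDR", "127.0.0.1:7000") // e.g. a miniredis address
	if useReset {
		resetStorageForTest()
	}
	return Storage().addr
}

func main() {
	fmt.Println("without reset:", addrSeenSecond(false))
	fmt.Println("with reset:   ", addrSeenSecond(true))
}
```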

Development

Successfully merging this pull request may close these issues.

Proposal: E2B API Compatibility for AgentCube Ecosystem Growth

5 participants