Skip to content

feat(sandbox): code-execution sandbox + runnable artifacts#1727

Open
larryro wants to merge 88 commits into
mainfrom
feat/sandbox-code-run
Open

feat(sandbox): code-execution sandbox + runnable artifacts#1727
larryro wants to merge 88 commits into
mainfrom
feat/sandbox-code-run

Conversation

@larryro
Copy link
Copy Markdown
Collaborator

@larryro larryro commented May 19, 2026

Summary

Adds an isolated code-execution sandbox and folds it into the artifact system as a new class of runnable artifacts (python_runnable / node_runnable). The LLM can now author, edit, and execute Python or Node scripts inside the chat canvas, with generated files surfaced as downloadable chat-card parts.

What's new

New services

  • services/sandbox/ — Bun-based spawner that manages containerized executions (quota, slot reservation, watchdog, output streaming).
  • services/sandbox-runtime/ — hardened Docker image (Python + Node) used as the per-execution container.
  • services/sandbox-egress/ — tinyproxy-based egress proxy isolating sandbox network traffic.

Convex backend

  • convex/sandbox/ — execution schema (sandboxExecutions), internal mutations/queries, output ingestion, cron-driven cleanup, governance hooks.
  • node_only/sandbox/ — internal action layer + spawner client that talks to the sandbox service.

Agent tools

  • New runnable artifact types (python_runnable, node_runnable) integrated into artifact_create / artifact_edit.
  • New artifact_run tool — explicit, LLM-driven execution trigger. Separating authoring from execution prevents the LLM from spamming duplicate artifact_create calls on run failure.
  • chat-agent.json updated (en/de/fr) with the 4-step author → run → edit → re-run cycle.

Canvas UI

  • canvas-runnable-code-renderer.tsx — execution panel (status, stderr preview, generated files) stacked above the source pane.
  • Runnable types registered in the artifact bar / message-bubble pill icon maps.
  • New canvas.runDone i18n key (en/de/fr).

CLI

  • tale doctor — diagnostics for sandbox prerequisites (docker, network, images).
  • tale start / tale deploy integrate the sandbox + sandbox-egress services; dev compose publishes port 8003 for bun dev.

Pre-PR checklist

  • Ran bun run check (format, lint, typecheck, all tests). — not yet run; please run before merge
  • Updated services/platform/messages/{en,de,fr}.json (new canvas.runDone key).
  • Updated /docs/{en,de,fr}/ for every user-visible change. — pending: new agent tool, new artifact types, tale doctor, sandbox env vars (SANDBOX_URL, SANDBOX_TOKEN)
  • Every touched docs page has a real opening and closing. — N/A until docs are added
  • Ran bun run --filter @tale/docs lint and bun run --filter @tale/docs test. — N/A until docs are added
  • Updated README.md, README.de.md, README.fr.md. — likely needed (new services in the stack)

Test plan

  • bun run check is green.
  • tale doctor reports OK on a clean dev machine.
  • tale start brings up sandbox + sandbox-egress alongside the existing stack; ports/network as expected.
  • Smoke: chat-agent receives a "write & run a Python script that …" request → artifact_create(python_runnable)artifact_run → canvas shows stdout/stderr + downloadable files; no duplicate v1 tabs in the artifact bar.
  • Failure path: forced runtime error → LLM follows artifact_editartifact_run cycle (not another artifact_create).
  • Node runnable equivalent path.
  • Quota + watchdog: spawn beyond the slot reservation budget; killed runs surface runErrorCode and a stderr preview.
  • sandboxExecutions audit rows are written and cleaned up by the cron.

Summary by CodeRabbit

  • New Features

    • Runnable artifacts (Python/Node) with sandboxed execution, run status, logs, and downloadable outputs
    • New runtime spawner and sandbox services (egress proxy, spawner, runtime) plus dev Compose/network support
    • Canvas and chat UI show runnable artifact status, run controls, and output downloads
    • CLI: new doctor command and sandbox network setup helpers; env tooling emits sandbox token
  • Bug Fixes

    • Prevented duplicate placeholder rows during streaming artifact creation
  • Tests

    • Added sandbox execution and quota unit tests

Comment thread services/sandbox-egress/Dockerfile Fixed
Comment thread services/sandbox/Dockerfile Fixed
Comment thread services/sandbox-runtime/Dockerfile Fixed
@@ -0,0 +1,226 @@
'use node';
@@ -0,0 +1,523 @@
'use node';
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Walkthrough

Adds runnable artifact types (python/node), a new Convex execution workflow (sandboxExecutions, mutations, queries), an internal action that calls a sandbox spawner, and a Bun-based spawner service that runs per-call Docker sandboxes. Persists run config on artifacts, introduces artifact_run tool and updates artifact_create/edit, adds UI components to show run state/files, integrates sandbox services into compose/CLI/env/templates, provides a doctor command and tests, and implements network/volume/cleanup helpers and runtime images.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-code-run

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 22

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tools/cli/src/lib/actions/deploy.ts (1)

47-48: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include sandbox network in dry-run logging for parity.

ensureInfrastructure now creates two networks, but dry-run only reports one. Add a dry-run line for the sandbox network so operators see the full infra plan.

Also applies to: 59-63

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/cli/src/lib/actions/deploy.ts` around lines 47 - 48, Dry-run logging in
deploy.ts’s ensureInfrastructure path only reports the internal network; add a
matching dry-run log for the sandbox network so the operator sees both networks.
Update the logger.info call(s) that currently print `${prefix}Would ensure
network: ${getProjectId()}_internal` to also emit `${prefix}Would ensure
network: ${getProjectId()}_sandbox` (do this in the earlier dry-run return block
and the later dry-run block around lines 59-63), referencing the same prefix,
getProjectId() and logger.info used in ensureInfrastructure.
services/platform/app/features/chat/components/canvas/canvas-pane.tsx (1)

153-161: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Localize runnable type labels instead of hardcoding English.

Python (sandbox) and Node (sandbox) should be translation keys so the canvas badge remains localized like the rest of the UI.

As per coding guidelines: “No hardcoded user-facing strings; always use the translation hook.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/app/features/chat/components/canvas/canvas-pane.tsx` around
lines 153 - 161, Replace the hardcoded English labels for runnable types in
TYPE_LABELS so they use the translation hook instead of literal strings: update
the entries for the keys "python_runnable" and "node_runnable" in the
TYPE_LABELS Record<CanvasContentType, string> to call the i18n translate
function (use the existing t hook or import useTranslation/t as used in this
component) with distinct keys like "canvas.type.python_runnable" and
"canvas.type.node_runnable"; also add those keys (with appropriate localized
values) to the locale resource files so the badge text is translated at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@knip.config.ts`:
- Around line 56-57: The knip config's "entry" array currently only lists test
globs ('src/**/*.test.ts'), which can cause live runtime files like
src/server.ts to be treated as unused; update the knip.config.ts "entry" to
include the application's runtime entrypoint (e.g., add 'src/server.ts' to the
entry array) so knip analyzes the live service code when computing usage for the
project.

In
`@services/platform/app/features/chat/components/canvas/canvas-runnable-code-renderer.tsx`:
- Around line 156-157: The current cast using "as RunOutputFile[]" on
artifact?.runOutputFiles can hide runtime shape mismatches; replace it by
implementing a type guard (e.g., isRunOutputFile and isRunOutputFileArray) that
validates each object's required discriminators/fields and use that guard when
assigning outputFiles (instead of the cast) so you assign
artifact.runOutputFiles only when isRunOutputFileArray(artifact.runOutputFiles)
is true, otherwise default to an empty array; add the guard functions near the
component (or in a shared validators file) and reference them from
canvas-runnable-code-renderer.tsx when computing outputFiles.
- Around line 116-124: The component currently renders hardcoded user-facing
strings (e.g., the raw runStatus value and labels like "Run", "Files",
"stdout/stderr", "chars") inside CanvasRunnableCodeRenderer (see use of
runStatus and Badge) — replace those with i18n keys by importing and using the
project's translation hook (e.g., t or useTranslations) and wrap each
user-facing string (including the run status output) with t(...) calls using
appropriate keys; update all occurrences noted (around runStatus rendering and
the label locations at the other listed blocks) so the Badge children and all
labels call t('...') instead of literal text, and ensure keys are descriptive
and added to the locale files.

In `@services/platform/convex/agent_tools/artifacts/shared.ts`:
- Around line 24-33: The current isValidArtifactType function duplicates the
ArtifactType values with a manual OR-chain, risking drift; change it to derive
validity from the canonical artifactTypeEnum instead: import or reference
artifactTypeEnum and implement isValidArtifactType by checking whether value is
a member of artifactTypeEnum (e.g., using
Object.values(artifactTypeEnum).includes(value) or artifactTypeEnum.includes if
it’s an array), keeping the function signature isValidArtifactType(value:
string): value is ArtifactType and ensuring the check uses the enum’s runtime
values so future enum additions are automatically covered.

In `@services/platform/convex/node_only/sandbox/helpers/spawner_client.ts`:
- Around line 234-238: The catch block that wraps the fetch call to fetch(url, {
method: 'POST', headers, body }) currently swallows errors; update the catch to
log the error and context (e.g. URL and any identifying execution id) using
console.warn or console.error instead of leaving it empty, while keeping the
existing comment about best-effort cancellation; ensure the log includes the
error object so failures to POST cancellation requests are recorded for
debugging.

In `@services/platform/convex/sandbox/internal_mutations.test.ts`:
- Around line 57-85: The mock builder in makeBuilder only records status
predicates so tests won’t catch missing org/index predicates; update
builder.withIndex to also capture the index name argument and any org eq()
predicate (push { indexName: _name, field, value } into calls), and change
builder[Symbol.asyncIterator] to verify the captured indexName and org id (e.g.,
find a call with field === 'orgId' and expected value) before selecting
runningRows/queuedRows/completedRows so the fake query fails if
reserveSlotAndInsert drops or uses the wrong org/index filter.
- Around line 118-121: The test mock for internalMutation should be made generic
so the imported handlers keep their types and eliminate the repeated "as unknown
as" casts; update the mocked internalMutation signature to accept a generic type
parameter (e.g., internalMutation<TArgs, TReturn>(handler: MutHandler<TArgs,
TReturn>) or similar) and return a properly typed MutHandler<TArgs, TReturn>,
then adjust the mock implementation to forward types so usages like
reserveSlotAndInsert can be assigned directly (remove casts) — refer to the
mocked function name internalMutation and the test symbols reserveSlotAndInsert,
MutHandler, and baseArgs when making this change.

In `@services/sandbox-egress/Dockerfile`:
- Around line 27-28: Remove the inline HEALTHCHECK directive from the
Dockerfile: delete the HEALTHCHECK block ("HEALTHCHECK --interval=10s
--timeout=3s --retries=2 CMD nc -z 127.0.0.1 3128 || exit 1") so the image no
longer embeds orchestration-specific health policy; keep health checks
centralized in your Compose/service definitions instead (adjust
docker-compose.yml service for the sandbox-egress container to define equivalent
healthcheck there).

In `@services/sandbox-egress/tinyproxy.conf.template`:
- Around line 30-31: The configuration currently sets DisableViaHeader No which
contradicts the hardening intent; change the tinyproxy setting DisableViaHeader
from "No" to "Yes" in tinyproxy.conf.template so the Via header is suppressed on
outgoing requests (i.e., ensure the line reads DisableViaHeader Yes).

In `@services/sandbox-runtime/entrypoint.sh`:
- Around line 72-73: The uv command invocation "uv pip list --format=json
--python /workspace/.deps/python" uses an unsupported --python flag; change the
call to use a supported form such as "uv pip list --format=json" (or if you
intend to capture the current environment's installed packages, use "uv pip
freeze --format=json") so the command succeeds and produces
/workspace/install-report.json instead of failing silently.

In `@services/sandbox/Dockerfile`:
- Around line 32-33: The Dockerfile contains an image-level HEALTHCHECK (the
HEALTHCHECK instruction executing "curl -fsS http://127.0.0.1:8003/health ||
exit 1") which conflicts with the repo convention to centralize health checks in
compose; remove this HEALTHCHECK instruction from the Dockerfile and instead add
an equivalent healthcheck entry for this service in the compose/orchestration
configuration so health logic lives only at compose-level (keep the same
endpoint/interval/timeout/retries semantics when migrating).
- Line 12: Update the sandbox Dockerfile's FROM line to use the Bun 1.3 baseline
(e.g., change the FROM value to oven/bun:1.3-debian or newer) and update the
sandbox package.json `@types/bun` version to match the project's 1.3.11 (ensure
packageManager remains bun@1.3.10); locate the Dockerfile's FROM directive and
the services/sandbox/package.json dependency entry for "`@types/bun`" to make
these changes so the container OS and type definitions align with the project
baseline.

In `@services/sandbox/src/config.ts`:
- Around line 17-22: Remove the "as 'runc' | 'runsc'" cast on the runtime
variable and add a type‑guard to narrow the value via control flow: read const
raw = process.env.SANDBOX_RUNTIME ?? 'runc', implement a function like
isSandboxRuntime(value: string): value is 'runc' | 'runsc' and use it in the
existing check (throw if not valid); after the guard the runtime value will be
narrowed to 'runc' | 'runsc' for use elsewhere (refer to the SANDBOX_RUNTIME
env, the runtime/raw variable and add isSandboxRuntime as the narrowing helper).

In `@services/sandbox/src/server.ts`:
- Around line 80-89: The code adds parsed.executionId into inFlightSet before
validating parsed, which may insert undefined; update the flow around JSON.parse
and inFlightSet so you validate that parsed.executionId is present and a valid
identifier (e.g., non-empty string) before calling inFlightSet.add; if
validation fails, return the 400 Response (same structure used on JSON parse
failure) and do not add to inFlightSet. Target the JSON.parse block and the use
of parsed, ExecuteRequest, parsed.executionId and inFlightSet in server.ts to
implement this guard.

In `@services/sandbox/src/spawn_util.ts`:
- Around line 82-116: The timeout is armed after awaiting collectStdout()/stderr
which means a hung docker prevents the timeout from ever starting; move the
timeout race to start before/while collecting output by creating a single
Promise.race that includes the exited promise, the timeout promise (the existing
timer logic that sets timedOut, calls proc.kill('SIGKILL') and spawns the docker
killer using DOCKER_BIN when opts.killOnTimeoutContainer is set), and the
stdout/stderr collection (collectStdout() and new
Response(proc.stderr).arrayBuffer()); in practice modify the code around
collectStdout(), proc.exited, and opts.timeoutMs so you start the timeout
Promise concurrently (using Promise.race) with collectStdout()/stderr collection
instead of awaiting output first, ensuring timer is set/cleared appropriately
and reusing proc, timer, proc.kill, killOnTimeoutContainer, and Bun.spawn as in
the current logic.

In `@services/sandbox/src/spawn.ts`:
- Around line 250-252: The current timeout cleanup timers call dockerKill (and
similarly dockerRm / rm in the other block) and silently swallow any errors;
update the handlers around setTimeout and the cleanup callers (referencing the
killTimer callback and the other cleanup block that calls dockerRm/rm) to catch
rejections and surface them: log the full error via the service logger (include
containerName/sessionDir context), increment a cleanup-failure metric or
counter, and optionally perform a best-effort retry or escalate (e.g., schedule
a single retry after a short delay) rather than ignoring the failure. Ensure
dockerKill, dockerRm and rm callers use .catch(err => { logger.error(..., err);
metrics.increment('cleanup.failure'); /* optional retry */ }) so orphaned
containers/directories are visible in logs/metrics.
- Around line 220-226: Detect and reject duplicate executionId values before
staging by checking inFlight.has(req.executionId) immediately (before computing
containerName/workspaceHostDir and before creating the AbortController); if the
id already exists, return/throw a clear error to abort the request. Use the
existing inFlight Map (symbol: inFlight) as the guard, and only create and set
the entry ({ containerName, abort }) after the check so you don't overwrite an
existing run; ensure any cleanup paths rely on the presence of that exact
inFlight entry to avoid canceling or deleting another run.
- Around line 130-165: The walker currently explores every entry under outputDir
even after byte limits are reached; modify the async function walk(rel: string)
so it short-circuits traversal as soon as caps are exceeded: at the top of walk
check if totalAccepted >= caps.totalMax (or another combined limit you choose)
and return immediately; before recursing into subdirectories check the same and
skip walking that child; before calling stat/readFile also check per-file and
remaining total bytes to avoid unnecessary I/O; keep using truncatedCount to
record skipped files and ensure files, totalAccepted, truncatedCount updates
remain consistent.

In `@services/sandbox/src/volume.ts`:
- Around line 43-57: The current check-then-create flow using runDocker allows a
race where volume create can fail with "already exists" and is treated as fatal;
update the logic around runDocker calls in this module (the inspect and create
calls in volume.ts) to treat a create failure that indicates the volume already
exists as non-fatal: after running the create command (the variable named
create), if create.exitCode !== 0 inspect create.stderr/create.stdout for the
“already exists” message (or Docker’s equivalent) and, if present, treat the
operation as successful (return) instead of throwing; only throw for genuine
failures that are not the already-exists case.

In `@tests/container-smoke-test.sh`:
- Around line 87-92: The docker network create step can fail if the network
already exists or is in use; make the setup idempotent by checking for the
network before creating or by swallowing non-fatal create errors: use `docker
network inspect tale-sandbox-net >/dev/null 2>&1 || docker network create
--internal --ipv6=false --driver=bridge tale-sandbox-net >/dev/null` or append
`|| true` to the `docker network create ... tale-sandbox-net` call so the script
does not hard-fail when the network already exists or creation returns a
non-zero status.

In `@tools/cli/src/commands/doctor.ts`:
- Around line 119-127: The SANDBOX_TOKEN check in function checkSandboxToken
currently returns status: 'fail' when missing/short which conflicts with the PR
making SANDBOX_TOKEN optional; change the returned status to 'warn' (or, if you
want to keep fail behavior for certain modes, gate the failure by checking the
runtime mode/flag and only set status: 'fail' when that explicit mode requires
the token). Update both occurrences of this logic (the checkSandboxToken
function and the similar check at lines ~175-178) so they return a warning
message and appropriate fix text instead of failing the entire doctor run unless
explicitly required by mode.

In `@tools/cli/src/lib/compose/services/create-sandbox-service.ts`:
- Around line 29-33: The ports mapping currently uses ports: ['8003:8003'] which
binds the spawner to all host interfaces; change this to bind to loopback only
by replacing that entry with an explicit loopback host binding (e.g.,
'127.0.0.1:8003:8003') in the create-sandbox-service definition so the container
port 8003 is published only on localhost and reduces the exposed attack surface.

---

Outside diff comments:
In `@services/platform/app/features/chat/components/canvas/canvas-pane.tsx`:
- Around line 153-161: Replace the hardcoded English labels for runnable types
in TYPE_LABELS so they use the translation hook instead of literal strings:
update the entries for the keys "python_runnable" and "node_runnable" in the
TYPE_LABELS Record<CanvasContentType, string> to call the i18n translate
function (use the existing t hook or import useTranslation/t as used in this
component) with distinct keys like "canvas.type.python_runnable" and
"canvas.type.node_runnable"; also add those keys (with appropriate localized
values) to the locale resource files so the badge text is translated at runtime.

In `@tools/cli/src/lib/actions/deploy.ts`:
- Around line 47-48: Dry-run logging in deploy.ts’s ensureInfrastructure path
only reports the internal network; add a matching dry-run log for the sandbox
network so the operator sees both networks. Update the logger.info call(s) that
currently print `${prefix}Would ensure network: ${getProjectId()}_internal` to
also emit `${prefix}Would ensure network: ${getProjectId()}_sandbox` (do this in
the earlier dry-run return block and the later dry-run block around lines
59-63), referencing the same prefix, getProjectId() and logger.info used in
ensureInfrastructure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 563a2a93-503d-42b2-8301-27c5e0bc0db3

📥 Commits

Reviewing files that changed from the base of the PR and between ad6018a and 565f2f2.

⛔ Files ignored due to path filters (2)
  • services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**
  • services/sandbox/bun.lock is excluded by !**/*.lock
📒 Files selected for processing (68)
  • .commitlintrc.json
  • compose.yml
  • examples/agents/chat-agent.json
  • knip.config.ts
  • services/platform/app/features/chat/components/canvas/artifact-bar.tsx
  • services/platform/app/features/chat/components/canvas/canvas-context.tsx
  • services/platform/app/features/chat/components/canvas/canvas-pane.tsx
  • services/platform/app/features/chat/components/canvas/canvas-runnable-code-renderer.tsx
  • services/platform/app/features/chat/components/message-bubble.tsx
  • services/platform/convex/agent_tools/artifacts/artifact_create_tool.ts
  • services/platform/convex/agent_tools/artifacts/artifact_edit_tool.ts
  • services/platform/convex/agent_tools/artifacts/artifact_run_tool.ts
  • services/platform/convex/agent_tools/artifacts/shared.ts
  • services/platform/convex/agent_tools/tool_names.ts
  • services/platform/convex/agent_tools/tool_registry.ts
  • services/platform/convex/artifacts/internal_mutations.ts
  • services/platform/convex/artifacts/internal_queries.ts
  • services/platform/convex/artifacts/schema.ts
  • services/platform/convex/crons.ts
  • services/platform/convex/governance/soft_delete_helpers.ts
  • services/platform/convex/governance/soft_delete_validators.ts
  • services/platform/convex/lib/context_management/build_artifacts_context.ts
  • services/platform/convex/node_only/sandbox/helpers/spawner_client.ts
  • services/platform/convex/node_only/sandbox/internal_actions.ts
  • services/platform/convex/sandbox/internal_mutations.test.ts
  • services/platform/convex/sandbox/internal_mutations.ts
  • services/platform/convex/sandbox/internal_queries.ts
  • services/platform/convex/sandbox/output_mutations.ts
  • services/platform/convex/sandbox/schema.ts
  • services/platform/convex/schema.ts
  • services/platform/env.sh
  • services/platform/messages/de.json
  • services/platform/messages/en.json
  • services/platform/messages/fr.json
  • services/sandbox-egress/Dockerfile
  • services/sandbox-egress/entrypoint.sh
  • services/sandbox-egress/tinyproxy.conf.template
  • services/sandbox-runtime/Dockerfile
  • services/sandbox-runtime/entrypoint.sh
  • services/sandbox/Dockerfile
  • services/sandbox/Dockerfile.dockerignore
  • services/sandbox/package.json
  • services/sandbox/seccomp.json
  • services/sandbox/src/auth.ts
  • services/sandbox/src/cleanup.ts
  • services/sandbox/src/config.ts
  • services/sandbox/src/docker_args.test.ts
  • services/sandbox/src/docker_args.ts
  • services/sandbox/src/server.ts
  • services/sandbox/src/spawn.ts
  • services/sandbox/src/spawn_util.ts
  • services/sandbox/src/types.ts
  • services/sandbox/src/volume.ts
  • services/sandbox/tsconfig.json
  • tests/container-image-test.sh
  • tests/container-smoke-test.sh
  • tools/cli/src/commands/doctor.ts
  • tools/cli/src/index.ts
  • tools/cli/src/lib/actions/deploy.ts
  • tools/cli/src/lib/actions/start.ts
  • tools/cli/src/lib/compose/generators/generate-dev-compose.ts
  • tools/cli/src/lib/compose/generators/generate-stateful-compose.ts
  • tools/cli/src/lib/compose/services/create-convex-service.ts
  • tools/cli/src/lib/compose/services/create-sandbox-egress-service.ts
  • tools/cli/src/lib/compose/services/create-sandbox-service.ts
  • tools/cli/src/lib/compose/types.ts
  • tools/cli/src/lib/config/ensure-env.ts
  • tools/cli/src/lib/docker/ensure-network.ts

Comment thread knip.config.ts
Comment on lines +56 to +57
entry: ['src/**/*.test.ts'],
project: ['src/**/*.ts'],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the runtime entrypoint to knip entries.

Line 56 only includes tests as entrypoints; src/server.ts should be listed too, otherwise knip can misclassify live service code as unused.

Suggested fix
     'services/sandbox': {
       // Standalone Bun HTTP service. Not in root workspaces (own bun.lock);
       // declare here so knip can resolve its entry points and ignore them
       // from the "unused" sweep.
-      entry: ['src/**/*.test.ts'],
+      entry: ['src/server.ts', 'src/**/*.test.ts'],
       project: ['src/**/*.ts'],
     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
entry: ['src/**/*.test.ts'],
project: ['src/**/*.ts'],
entry: ['src/server.ts', 'src/**/*.test.ts'],
project: ['src/**/*.ts'],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@knip.config.ts` around lines 56 - 57, The knip config's "entry" array
currently only lists test globs ('src/**/*.test.ts'), which can cause live
runtime files like src/server.ts to be treated as unused; update the
knip.config.ts "entry" to include the application's runtime entrypoint (e.g.,
add 'src/server.ts' to the entry array) so knip analyzes the live service code
when computing usage for the project.

Comment on lines +116 to +124
if (runStatus === 'failed' || runStatus === 'cancelled') {
return (
<Badge
variant="outline"
icon={AlertTriangle}
className="text-destructive border-destructive/40"
>
{runStatus}
</Badge>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Externalize run-panel labels and status text through i18n.

These strings are user-facing but hardcoded (Run, Files, stdout/stderr, chars, and raw runStatus). They should go through t(...) keys to avoid locale regressions.

As per coding guidelines: “No hardcoded user-facing strings; always use the translation hook.”

Also applies to: 183-185, 200-202, 211-213, 222-224

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@services/platform/app/features/chat/components/canvas/canvas-runnable-code-renderer.tsx`
around lines 116 - 124, The component currently renders hardcoded user-facing
strings (e.g., the raw runStatus value and labels like "Run", "Files",
"stdout/stderr", "chars") inside CanvasRunnableCodeRenderer (see use of
runStatus and Badge) — replace those with i18n keys by importing and using the
project's translation hook (e.g., t or useTranslations) and wrap each
user-facing string (including the run status output) with t(...) calls using
appropriate keys; update all occurrences noted (around runStatus rendering and
the label locations at the other listed blocks) so the Badge children and all
labels call t('...') instead of literal text, and ensure keys are descriptive
and added to the locale files.

Comment on lines +156 to +157
const outputFiles: RunOutputFile[] = (artifact?.runOutputFiles ??
[]) as RunOutputFile[];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove the as RunOutputFile[] cast and narrow with a typed validator/guard.

The cast can mask runtime shape mismatches from backend data; prefer a safe narrowing path before rendering.

As per coding guidelines: “Never use ‘as’, never ‘any’, never ‘unknown’; use type guards/generics/discriminated unions.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@services/platform/app/features/chat/components/canvas/canvas-runnable-code-renderer.tsx`
around lines 156 - 157, The current cast using "as RunOutputFile[]" on
artifact?.runOutputFiles can hide runtime shape mismatches; replace it by
implementing a type guard (e.g., isRunOutputFile and isRunOutputFileArray) that
validates each object's required discriminators/fields and use that guard when
assigning outputFiles (instead of the cast) so you assign
artifact.runOutputFiles only when isRunOutputFileArray(artifact.runOutputFiles)
is true, otherwise default to an empty array; add the guard functions near the
component (or in a shared validators file) and reference them from
canvas-runnable-code-renderer.tsx when computing outputFiles.

Comment on lines 24 to 33
export function isValidArtifactType(value: string): value is ArtifactType {
return (
value === 'html' ||
value === 'svg' ||
value === 'markdown' ||
value === 'mermaid' ||
value === 'code'
value === 'code' ||
value === 'python_runnable' ||
value === 'node_runnable'
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Derive isValidArtifactType from artifactTypeEnum to avoid drift.

The hand-maintained OR-chain duplicates enum values and can silently fall out of sync on future type additions.

Proposed change
 export function isValidArtifactType(value: string): value is ArtifactType {
-  return (
-    value === 'html' ||
-    value === 'svg' ||
-    value === 'markdown' ||
-    value === 'mermaid' ||
-    value === 'code' ||
-    value === 'python_runnable' ||
-    value === 'node_runnable'
-  );
+  return artifactTypeEnum.safeParse(value).success;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/convex/agent_tools/artifacts/shared.ts` around lines 24 -
33, The current isValidArtifactType function duplicates the ArtifactType values
with a manual OR-chain, risking drift; change it to derive validity from the
canonical artifactTypeEnum instead: import or reference artifactTypeEnum and
implement isValidArtifactType by checking whether value is a member of
artifactTypeEnum (e.g., using Object.values(artifactTypeEnum).includes(value) or
artifactTypeEnum.includes if it’s an array), keeping the function signature
isValidArtifactType(value: string): value is ArtifactType and ensuring the check
uses the enum’s runtime values so future enum additions are automatically
covered.

Comment on lines +234 to +238
try {
await fetch(url, { method: 'POST', headers, body });
} catch {
// Cancellation is best-effort; the watchdog cron will reap stuck rows.
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add logging to the catch block per coding guidelines.

The empty catch block silently swallows fetch failures. While the comment explains cancellation is best-effort, logging the error would help debugging stuck executions.

As per coding guidelines: "No empty catch blocks; log with console.warn/console.error or re-throw."

Proposed fix
   try {
     await fetch(url, { method: 'POST', headers, body });
-  } catch {
-    // Cancellation is best-effort; the watchdog cron will reap stuck rows.
+  } catch (err) {
+    // Cancellation is best-effort; the watchdog cron will reap stuck rows.
+    console.warn(`[spawner] cancel ${executionId} failed:`, err);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
await fetch(url, { method: 'POST', headers, body });
} catch {
// Cancellation is best-effort; the watchdog cron will reap stuck rows.
}
try {
await fetch(url, { method: 'POST', headers, body });
} catch (err) {
// Cancellation is best-effort; the watchdog cron will reap stuck rows.
console.warn(`[spawner] cancel ${executionId} failed:`, err);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/convex/node_only/sandbox/helpers/spawner_client.ts` around
lines 234 - 238, The catch block that wraps the fetch call to fetch(url, {
method: 'POST', headers, body }) currently swallows errors; update the catch to
log the error and context (e.g. URL and any identifying execution id) using
console.warn or console.error instead of leaving it empty, while keeping the
existing comment about best-effort cancellation; ensure the log includes the
error object so failures to POST cancellation requests are recorded for
debugging.

Comment on lines +250 to +252
const killTimer = setTimeout(() => {
void dockerKill(containerName).catch(() => {});
}, timeoutMs);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently swallow kill/cleanup failures.

If dockerKill, dockerRm, or rm fail, this leaves orphaned containers or session directories with no signal in logs or metrics. In a long-running spawner, that turns a recoverable cleanup problem into an invisible disk/slot leak.

Also applies to: 361-366

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/spawn.ts` around lines 250 - 252, The current timeout
cleanup timers call dockerKill (and similarly dockerRm / rm in the other block)
and silently swallow any errors; update the handlers around setTimeout and the
cleanup callers (referencing the killTimer callback and the other cleanup block
that calls dockerRm/rm) to catch rejections and surface them: log the full error
via the service logger (include containerName/sessionDir context), increment a
cleanup-failure metric or counter, and optionally perform a best-effort retry or
escalate (e.g., schedule a single retry after a short delay) rather than
ignoring the failure. Ensure dockerKill, dockerRm and rm callers use .catch(err
=> { logger.error(..., err); metrics.increment('cleanup.failure'); /* optional
retry */ }) so orphaned containers/directories are visible in logs/metrics.

Comment on lines +43 to +57
const inspect = await runDocker(['volume', 'inspect', name]);
if (inspect.exitCode === 0) return; // already exists, already chowned

const create = await runDocker([
'volume',
'create',
'--label',
'tale.sandbox-cache=1',
name,
]);
if (create.exitCode !== 0) {
throw new Error(
`volume: failed to create cache volume ${name}: ${create.stderr.trim() || create.stdout.trim()}`,
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle concurrent first-create races for Docker volumes.

Line 43–57 has a check-then-create race. Two callers can both see “not exists”; one volume create will fail with “already exists”, which is currently treated as fatal.

Suggested fix
   const create = await runDocker([
     'volume',
     'create',
     '--label',
     'tale.sandbox-cache=1',
     name,
   ]);
   if (create.exitCode !== 0) {
+    const existsNow = await runDocker(['volume', 'inspect', name]);
+    if (existsNow.exitCode === 0) return;
     throw new Error(
       `volume: failed to create cache volume ${name}: ${create.stderr.trim() || create.stdout.trim()}`,
     );
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/sandbox/src/volume.ts` around lines 43 - 57, The current
check-then-create flow using runDocker allows a race where volume create can
fail with "already exists" and is treated as fatal; update the logic around
runDocker calls in this module (the inspect and create calls in volume.ts) to
treat a create failure that indicates the volume already exists as non-fatal:
after running the create command (the variable named create), if create.exitCode
!== 0 inspect create.stderr/create.stdout for the “already exists” message (or
Docker’s equivalent) and, if present, treat the operation as successful (return)
instead of throwing; only throw for genuine failures that are not the
already-exists case.

Comment on lines +87 to +92
docker network rm tale-sandbox-net >/dev/null 2>&1 || true
docker network create \
--internal \
--ipv6=false \
--driver=bridge \
tale-sandbox-net >/dev/null
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid hard-failing when the sandbox network already exists.

Line 87–92 can fail under concurrent/local runs: if docker network rm is ignored (network still in use), docker network create ... tale-sandbox-net exits non-zero and the script aborts.

Suggested fix
-docker network rm tale-sandbox-net >/dev/null 2>&1 || true
-docker network create \
-    --internal \
-    --ipv6=false \
-    --driver=bridge \
-    tale-sandbox-net >/dev/null
+if ! docker network inspect tale-sandbox-net >/dev/null 2>&1; then
+  docker network create \
+      --internal \
+      --ipv6=false \
+      --driver=bridge \
+      tale-sandbox-net >/dev/null
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
docker network rm tale-sandbox-net >/dev/null 2>&1 || true
docker network create \
--internal \
--ipv6=false \
--driver=bridge \
tale-sandbox-net >/dev/null
if ! docker network inspect tale-sandbox-net >/dev/null 2>&1; then
docker network create \
--internal \
--ipv6=false \
--driver=bridge \
tale-sandbox-net >/dev/null
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/container-smoke-test.sh` around lines 87 - 92, The docker network
create step can fail if the network already exists or is in use; make the setup
idempotent by checking for the network before creating or by swallowing
non-fatal create errors: use `docker network inspect tale-sandbox-net >/dev/null
2>&1 || docker network create --internal --ipv6=false --driver=bridge
tale-sandbox-net >/dev/null` or append `|| true` to the `docker network create
... tale-sandbox-net` call so the script does not hard-fail when the network
already exists or creation returns a non-zero status.

Comment on lines +119 to +127
function checkSandboxToken(env: NodeJS.ProcessEnv): Check {
if (!env.SANDBOX_TOKEN || env.SANDBOX_TOKEN.length < 32) {
return {
name: 'SANDBOX_TOKEN',
status: 'fail',
detail:
'missing or too short — required for HMAC auth between Convex and the sandbox spawner',
fix: 'Re-run `tale init` (or set a 64-char hex value manually)',
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SANDBOX_TOKEN check severity conflicts with optional-token behavior.

This currently fails the whole doctor run when unset/short, but this PR explicitly makes SANDBOX_TOKEN optional. Downgrade to warn (or gate fail by mode) to avoid false hard failures.

Also applies to: 175-178

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/cli/src/commands/doctor.ts` around lines 119 - 127, The SANDBOX_TOKEN
check in function checkSandboxToken currently returns status: 'fail' when
missing/short which conflicts with the PR making SANDBOX_TOKEN optional; change
the returned status to 'warn' (or, if you want to keep fail behavior for certain
modes, gate the failure by checking the runtime mode/flag and only set status:
'fail' when that explicit mode requires the token). Update both occurrences of
this logic (the checkSandboxToken function and the similar check at lines
~175-178) so they return a warning message and appropriate fix text instead of
failing the entire doctor run unless explicitly required by mode.

Comment on lines +29 to +33
// Dev convention: publish 8003 to host loopback so `bun dev`'s local
// convex-local-backend (running on the host) can reach the spawner.
// Matches rag (8001) and crawler (8002). The `tale deploy` generator
// can omit this for hardened prod deployments — same as those services.
ports: ['8003:8003'],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bind sandbox port to loopback explicitly.

ports: ['8003:8003'] exposes the spawner on all host interfaces, but the comment says loopback-only. With docker.sock mounted, this widens attack surface unnecessarily.

Suggested fix
-    ports: ['8003:8003'],
+    ports: ['127.0.0.1:8003:8003'],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Dev convention: publish 8003 to host loopback so `bun dev`'s local
// convex-local-backend (running on the host) can reach the spawner.
// Matches rag (8001) and crawler (8002). The `tale deploy` generator
// can omit this for hardened prod deployments — same as those services.
ports: ['8003:8003'],
// Dev convention: publish 8003 to host loopback so `bun dev`'s local
// convex-local-backend (running on the host) can reach the spawner.
// Matches rag (8001) and crawler (8002). The `tale deploy` generator
// can omit this for hardened prod deployments — same as those services.
ports: ['127.0.0.1:8003:8003'],
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/cli/src/lib/compose/services/create-sandbox-service.ts` around lines 29
- 33, The ports mapping currently uses ports: ['8003:8003'] which binds the
spawner to all host interfaces; change this to bind to loopback only by
replacing that entry with an explicit loopback host binding (e.g.,
'127.0.0.1:8003:8003') in the create-sandbox-service definition so the container
port 8003 is published only on localhost and reduces the exposed attack surface.

larryro added 24 commits May 20, 2026 10:46
… proxy, spawner)

Container-side foundation for the `code_run` agent tool: an ephemeral
Python/Node sandbox the LLM can invoke to run code with arbitrary packages
and surface generated files (e.g. .pptx via python-pptx) as chat attachments.

Components:

- services/sandbox-runtime: lean Python 3.12 + Node 24 + uv image. Entrypoint
  installs requested packages on demand (`--only-binary=:all:` for pip and
  `--ignore-scripts` for npm by default — closes setup.py / postinstall ACE
  vectors per R2.7), emits PHASE markers for the chat UI, then execs user
  code at /workspace/code/main.{py,js}.

- services/sandbox-egress: tinyproxy sidecar on tale-sandbox-net (an
  internal-only Docker bridge). Filters CONNECT host requests against an
  allow-list (pypi.org, files.pythonhosted.org, registry.npmjs.org, github
  package endpoints). Replaces the originally-planned iptables IP allow-list
  which R1.3/R2.1 showed was unsafe due to shared Fastly/Cloudflare CDN IPs.

- services/sandbox: ~250 LOC Bun HTTP service. POST /v1/execute with
  HMAC-signed body spawns one ephemeral container; POST /v1/cancel/:id
  propagates AbortSignal as docker kill. Workspace is a per-call tmpfs
  Docker volume (size=256m, hard ENOSPC cap per R2.2); pip/npm caches are
  per-org named volumes (closes the R2.3 cross-tenant wheel-cache poison
  vector). docker_args.ts is a pure builder with strict regex validation;
  the #1 regression gate per R1.22 has 9 passing unit tests asserting the
  argv shape and that user code never reaches argv.

- compose.yml: registers both services and the internal `sandbox` network
  pinned to `tale-sandbox-net`. IPv6 disabled on the bridge to prevent
  v4-allowlist bypass via v6 routes (R1.3).

- .commitlintrc.json: add `sandbox` scope.

Convex schema, executeCode action, code_run tool, CLI compose generator
work, and tests follow in M2 and M3.

Plan: /home/larry/.claude/plans/presentation-generation-from-prompts-delightful-aho.md
…ine (M2)

Convex-side of the sandbox feature: a new code_run agent tool, the
sandboxExecutions audit table, and the executeCode internal action that
owns the spawner HTTP round-trip + transactional storage uploads.

Schema (services/platform/convex/sandbox/schema.ts):
- sandboxExecutions table — uploadedBy, agentSlug, lifecycleStatus,
  statusChangedAt, heartbeatAt, estimatedSeconds/actualSeconds, full
  output-files validator, structured errorCode taxonomy (R1.12 + R2.8).
- Indexes: by_organizationId_and_status (quota counting), by_org_user
  (GDPR cascade), by_status (watchdog), by_threadId, by_organizationId.

Mutations (services/platform/convex/sandbox/internal_mutations.ts):
- reserveSlotAndInsert — atomic concurrency-cap + daily-CPU-budget +
  audit-row insert in one mutation, closes the TOCTOU race R1.8/R1.10
  flagged. Uses the same withIndex pattern as video_links/mutations.ts.
- setRunning / heartbeat / finalize.
- recoverStuckSandboxes — watchdog cron flips rows older than
  2×max-timeout to failed/SPAWNER_UNAVAILABLE (Convex 30-min hard-kill
  skips action try/finally; mirrors recoverStuckTranscriptions pattern).

Node action (services/platform/convex/node_only/sandbox/internal_actions.ts):
- executeCode — reserves slot → resolves+validates inputFiles via internal
  query (IDOR check) → setRunning + 60s heartbeat loop → POSTs to spawner
  with HMAC-signed body → all-or-nothing storage upload (rolls back blobs
  on partial failure) → batched fileMetadata insert → finalize. Per
  feedback_no_empty_catch: infra failures throw, user-code failures return
  a structured {success:false, errorCode, ...}.

Tool (services/platform/convex/agent_tools/code/code_run_tool.ts):
- code_run — Python 3.12 + Node 24, packages on demand, inputFiles
  ref-by-fileMetadataId, allowSdist/allowInstallScripts opt-in overrides,
  full errorCode-recovery table in description, tool-selection precedence
  vs excel/pdf/docx/image (per R1.15).

Wiring:
- schema.ts registers sandboxExecutions.
- crons.ts adds 'recover stuck sandbox executions (every 5 min)'.
- soft_delete_validators registers 'sandboxExecution'; soft_delete_helpers
  maps it to the table + uploadedBy author field.
- tool_registry / tool_names register codeRunTool.

Tests + CLI integration (M3) follow.

NOTE: services/platform/convex/_generated/api.d.ts will regenerate on the
next `bunx convex deploy` / `tale start`; the typescript errors against
internal.sandbox.* in this commit are the documented stale-codegen state
from feedback_api_dts_autogenerated, not bugs.
…(M3)

CLI work to deploy the sandbox stack via `tale start` / `tale deploy`:

- ComposeService type extended with cap_add, mem_limit, pids_limit,
  ulimits, security_opt, runtime — previously absent from the generator
  which silently dropped these on the convex service.
- BONUS FIX surfaced by sandbox review (R1.17): create-convex-service.ts
  was shipping the production convex container WITHOUT NET_ADMIN, so
  services/convex/docker-entrypoint.sh:79 was silently logging "iptables
  present but no NET_ADMIN capability — SSRF firewall NOT installed" on
  every deploy. Apply the missing cap_add + mem_limit + pids_limit +
  ulimits flags so production deployments finally get the SSRF egress
  firewall the entrypoint was always trying to install.
- New service factories: create-sandbox-service (HTTP spawner, mounts
  docker.sock, two-network membership) and create-sandbox-egress-service
  (tinyproxy sidecar on the internal sandbox bridge).
- STATEFUL_SERVICES includes 'sandbox' + 'sandbox-egress' so the
  deploy.ts auto-include-missing-stateful logic picks them up on the
  next `tale deploy` after upgrade — no migration registry entry needed.
- ensureSandboxNetwork() creates `tale-sandbox-net` (fixed Docker name,
  --internal, --ipv6=false). Called from both start.ts (dev) and
  deploy.ts (prod) infrastructure setup.
- ensure-env: SANDBOX_TOKEN added to requiredVars + secretDefaults
  (auto-generated 32-byte hex). generateEnvContent emits SANDBOX_TOKEN +
  SANDBOX_RUNTIME / SANDBOX_EGRESS_ALLOWLIST comment block for operators
  to override.
- New command: `tale doctor` — preflight checks for sandbox host
  requirements (docker, /var/run/docker.sock, runsc registration with
  dockerd, userns-remap, AppArmor docker-default, SANDBOX_TOKEN
  presence). R1.17 surfaced that there was no `doctor` command at all;
  scope here is intentionally narrow (sandbox-relevant only) to avoid
  scope creep — future Docker version / disk headroom checks belong
  here too but separately.

CLI typecheck passes (`bunx tsc --noEmit`).

Next: tests (M3 test pass) + bun run check until green.
…ient

oxlint flagged 4 issues in M1/M2 code:

- spawner_client.ts:97 — `throw new Error(...)` inside `catch (err)` without
  forwarding the original. Added `{ cause: err }` so debugging keeps the
  network-error chain.
- internal_actions.ts:445 — same pattern; same fix.
- spawner_client.ts:113 — `await res.json() as SpawnerExecuteResponse`. Annotated
  with `oxlint-disable typescript/no-unsafe-type-assertion` because the wire
  contract is validated on the spawner side; trusting it here is by design.
- internal_actions.ts:177 — `err.data as { message?: string }`. Same disable,
  scoped to the line that runs only after a `'message' in err.data` narrowing.

The remaining lint error (lib/seo/integration.test.ts) predates this branch
and is unrelated to the sandbox work.
)

Mocks _generated/server.internalMutation so the real handler is callable
with a fabricated ctx (matches the file_metadata/internal_mutations.test.ts
pattern). Covers:

- Empty in-flight → row inserted with status='queued', lifecycleStatus='active'.
- Cap reached (4 running) → throws ConvexError (atomic concurrency cap,
  closes the TOCTOU race R1.8/R1.10 flagged).
- Daily CPU budget pre-debit overflow (4 × 500s prior + 30s requested >
  1800s cap) → throws — pre-debit semantics verified, closes R1.10's
  post-debit overshoot.
- recoverStuckSandboxes — only the row whose heartbeatAt is older than
  2×max-timeout gets flipped to failed/SPAWNER_UNAVAILABLE.

All 4 tests pass via vitest. Combined with the 9-test argv builder gate
shipped in M1, that's two of R1.22's five critical regression gates.
The remaining three (in-container privilege assertion, fileMetadata IDOR
via inputFiles, cancellation propagation) require either a running
docker daemon (privilege) or a Convex test harness (IDOR / cancellation);
both are integration-test scope and best added when wiring up CI for the
sandbox stack.
Five real bugs surfaced by trying to actually hit /v1/execute end-to-end:

1. **Docker CLI API skew (L3 /health 503)** — Debian's `docker.io` 20.10.5
   speaks API 1.41; modern daemons require ≥1.44. Switched to the official
   static CLI via `COPY --from=docker:27-cli` and changed /health from
   `docker info --format` (which panics in old CLI) to `docker version
   --format`.

2. **tmpfs Docker volumes aren't shared between docker run calls** — my
   original multi-helper staging (busybox containers writing to a shared
   workspace volume) silently failed because each `docker run` of a
   tmpfs-driver-local volume creates a fresh mount. Tried piping tar to
   stdin into the runtime container, but then…

3. **`docker cp` can't read from --tmpfs mounts** — verified directly: file
   present inside container (`docker exec ls`) but `docker cp` returns
   "Could not find the file ... in container". Switched workspace to a
   1:1 host bind mount under /var/lib/tale-sandbox/sessions/<uuid>/.
   Spawner now stages files via Bun fs (no busybox/tar dance) and harvests
   outputs the same way. The compose + CLI factory mount
   /var/lib/tale-sandbox into the spawner container at the same path so
   the docker daemon (resolving against host fs) and the spawner agree on
   it. Trade-off: lose the perfect tmpfs ENOSPC cap; keep --ulimit
   fsize=100m per file + post-run rm -rf.

4. **uv/pip fail under --read-only** — uv writes to $HOME/.cache/uv,
   nobody's $HOME is /nonexistent which is RO. Set HOME=/tmp (we have
   --tmpfs /tmp) + UV_CACHE_DIR=/cache/pip.

5. **Cache volume permission denied** — new Docker volumes are root-owned
   by default; the runtime user is 65534. ensureCacheVolume now detects
   first-creation and runs a transient busybox to chown the mount point
   65534:65534.

6. **Internal-only sandbox network blocks tinyproxy's own outbound** —
   sandbox-egress couldn't resolve pypi.org because `internal: true` cut
   all DNS too. Put sandbox-egress on BOTH `sandbox` (where runtime
   containers reach it) and `internal` (where it has internet for the
   upstream tunnel). Runtime containers stay solely on sandbox.

7. **Timeout didn't kill the container** — killing the docker CLI process
   doesn't stop the sibling container; it just disconnects the wrapper.
   Two-tier timeout now: inner timer issues `docker kill <name>` at
   timeoutMs; outer (CLI process kill) at timeoutMs+30s as belt-and-
   suspenders.

8. **Error classifier patterns stale** — uv's "no matching distribution"
   has become "unsatisfiable"; runtime-time egress denial exits with 1
   not 64. Broadened PACKAGE_NOT_FOUND regex; classify EGRESS_DENIED on
   any exit when stderr matches.

9. **tinyproxy log file root-owned** — entrypoint chowns the log to
   nobody so tinyproxy (which drops privs) can write to it.

L4 verified end-to-end:
- python hello world: 620ms
- python-pptx (warm cache): 1.18s, real .pptx file with 3 slides
- TIMEOUT: exit 137 at 3.27s for a sleep(30) with timeoutMs=3000
- EGRESS_DENIED, PACKAGE_NOT_FOUND, RUNTIME_ERROR: all classified
- HMAC mismatch: 401

api.d.ts regen picked up sandbox/* + agent_tools/code/* on the platform
restart that happened during testing — included.
The L4 spawner regex was UUID-only (hex + hyphens), but executeCode passes
the Convex audit row's _id (lowercase alphanumeric, base36-ish) as the
spawner executionId. Broaden both regexes to [a-zA-Z0-9_-]{1,64} — still
safe for Docker container names and argv positions, now accepts both
UUIDs and Convex ids.

L5 verified end-to-end: executeCode action via convex run produced a
real .pptx in 2.6s with audit row + fileMetadata row both populated.
…nt demo

- compose.yml + create-sandbox-service: publish 8003:8003 (same shape as
  rag/crawler dev convention). `bun dev` runs convex-local-backend on
  the host; without this the executeCode action can't resolve `sandbox`
  (Docker DNS). The tale-deploy generator can omit the port for hardened
  prod deployments — same option as rag/crawler.

- examples/agents/chat-agent.json: add `code_run` to toolNames; flip
  Rule 7's "do NOT produce .pptx — there is no PPTX export" line in
  EN / DE / FR system prompts to "for downloadable .pptx call code_run
  with python-pptx==1.0.2". HTML in-chat slide guidance via
  artifact_create is preserved as the in-chat default.

Verified end-to-end via host loopback: HMAC-signed POST to
127.0.0.1:8003/v1/execute produces a real .pptx in 2.3s.

Recipe for bun dev:
  1. .env has SANDBOX_TOKEN=<hex> and SANDBOX_URL=http://sandbox:8003
     (for the dockerized convex container).
  2. services/platform/.env.local has the same SANDBOX_TOKEN and
     SANDBOX_URL=http://127.0.0.1:8003 (sync-convex-env-from-dotenv
     applies higher priority for .env.local; bun dev's local convex
     backend picks up the loopback URL).
  3. `bun dev` from services/platform/.
When code_run finished it left the file as a fileMetadata row but never
attached it to the assistant message — so the LLM said "your pptx is
ready" and nothing appeared in the chat bubble. Mirror what excel/pdf/
docx tools do:

- insertOutputFiles + executeCode now thread storageId through to the
  tool layer (was only returning fileMetadataId).
- code_run tool calls appendFilePart(ctx, {fileName, mimeType,
  downloadUrl}) per output file, with downloadUrl built via
  buildDownloadUrl(storageId, name) — same helper excel_tool uses.

Verified: re-ran executeCode directly via convex run; the response now
carries storageId for each file. The chat-bubble attachment fires when
the tool is invoked through the agent loop (next chat send should show
the .pptx card alongside the LLM's text response).
bun dev surfaced this on first invocation:

  Uncaught Error: SANDBOX_TOKEN env var is required for sandbox/code_run;
  set it in .env
    at handler (.../convex/node_only/sandbox/internal_actions.ts:438:8)

bun dev's convex-local-backend runs on the host with whatever env it
gets from .env / .env.local. The hard throw in getSpawnerToken() turned
"forgot to set the secret" into "tool is dead." rag (8001) and crawler
(8002) both sit on the same internal Docker network with no auth and
just work; sandbox should match.

Auth is now opt-in. Both sides agree:

  unset on both          -> unsigned requests accepted; one-time boot
                            warning on the spawner
  set on spawner only    -> 401 (catches client/server config drift)
  set on client only     -> harmless (client signs, server ignores)
  set on both            -> HMAC required, mismatch = 401

tale init still auto-generates SANDBOX_TOKEN so prod stays HMAC-on by
default; this only removes the hard-error path when the secret happens
to be missing at runtime.

Files (4):
- services/sandbox/src/types.ts        sandboxToken: string | null
- services/sandbox/src/config.ts       drop requireEnv; treat "" as unset
- services/sandbox/src/server.ts       gate verify() on token !== null;
                                       warn once on boot in unauth mode
- spawner_client.ts                    drop throw; omit signature header
                                       when token is null

Verified:
- spawner with SANDBOX_TOKEN unset boots, logs the warning, accepts an
  unsigned POST and runs python in 482ms.
- spawner with SANDBOX_TOKEN set still returns 401 on bad/missing sig.
- 9 argv unit tests still pass.
…r bun dev)

User's `bun dev` action threw `fetch failed: sandbox:8003` because the
client defaulted to the Docker DNS name. From the host, that doesn't
resolve.

The convention rag and crawler follow is the inverse:
  - Code default = http://localhost:<port>  (works for bun dev with
    published ports)
  - services/platform/env.sh sets the docker DNS as the default for
    in-container processes, e.g. RAG_URL="${RAG_URL:-http://rag:8001}".
    The platform docker-entrypoint sources env.sh and then convex env
    sets the value into the convex backend, so dockerized actions see
    the docker name.

Two changes:

- spawner_client.getSpawnerUrl() defaults to http://localhost:8003.
- services/platform/env.sh adds SANDBOX_URL="${SANDBOX_URL:-http://sandbox:8003}"
  next to RAG_URL / CRAWLER_URL / SEARCH_SERVICE_URL.

Net effect:
  bun dev (host node)          → code default localhost:8003 → works
                                  zero-config via the published port
  dockerized convex (in compose) → env.sh default sandbox:8003 → works
  operator override (either)   → SANDBOX_URL in .env / .env.local /
                                  docker compose environment block
                                  takes precedence

Sandbox argv unit tests (9) still pass.
Foundation for folding code_run into the artifact system (plan
Refinement 2). The sandbox execution layer is unchanged behaviorally;
the wire protocol grows phase events so the canvas can show live
progress, and the artifacts schema grows run-state fields so a runnable
artifact's row holds its current execution state.

Schema (artifacts/schema.ts):
- type union gains `python_runnable` + `node_runnable`
- run-state fields (all optional — non-breaking per
  feedback_deprecate_dont_delete_schema_fields):
  runPackages, runOptions, runStatus (queued|installing|running|
  completed|failed|cancelled), runProgress, runStartedAt, runCompletedAt,
  runExitCode, runErrorCode, runErrorMessage, runStdoutPreview/StorageId,
  runStderrPreview/StorageId, runOutputFiles, runExecutionId

Mutations (artifacts/internal_mutations.ts):
- initArtifactRun: set runStatus='queued' + clear prior-run remnants
- patchArtifactRunProgress: mid-flight status/progress updates fired
  on PHASE events from the spawner
- finalizeArtifactRun: write final exit code + output files + clear
  the live progress string

Spawner (services/sandbox/src/):
- spawn_util.runDocker now accepts onStdoutChunk callback that fires
  per chunk during the docker run (vs after exit).
- spawn.executeRequest accepts ExecuteRequestOptions.onPhase and runs
  a line-buffered parser over the stdout chunks; PHASE: installing /
  PHASE: running lines fire the callback with {phase} events.
- server /v1/execute switches from buffered JSON response to SSE:
  emits 'event: phase data: {...}' per phase + final
  'event: result data: {...full response...}'.

Verification:
- 9 argv unit tests still pass.
- bun run typecheck clean.
- curl smoke against rebuilt container: SSE stream emits two phase
  events ('installing', 'running') then the final result event with
  a real .pptx in outputFiles.

M5b will switch the convex spawner_client to a streaming consumer and
wire artifact_create / artifact_edit to call executeCode for runnable
types.
…r runnable types (M5b)

Builds on M5a's schema + SSE foundation. The agent-facing tool surface
is now the artifact pair (create/edit); the sandbox spawner stays
unchanged and is invoked transparently when an artifact's type is
runnable.

- spawner_client.spawnerExecute now consumes the SSE stream from the
  spawner. Phase events fire an optional onPhase callback; the final
  `event: result` payload is returned as the same SpawnerExecuteResponse
  shape callers had before (drop-in replacement; signature gained an
  optional `callbacks` arg).
- spawner_client also exposes an SSE event parser that tolerates partial
  reads and chunk boundaries.

- executeCode internal_action gains optional `artifactId`. When set:
  - onPhase fires patchArtifactRunProgress with a human-readable
    progress string ("Installing python-pptx" / "Running code") so the
    canvas runnable-code-renderer can subscribe and show live state.
  - On success, finalizeArtifactRun writes runStatus=completed plus
    exit code, stdout/stderr previews, and runOutputFiles. The audit
    sandboxExecutions row still gets its own forensics; the artifact
    row holds the latest result for fast canvas reads.

- artifact_create_tool:
  - shared.ts adds runnable types to the enum + isRunnableArtifactType /
    runnableLanguage helpers.
  - input schema gains optional `packages`, `allowSdist`,
    `allowInstallScripts`, `timeoutMs` (gated semantically on runnable
    types).
  - execute(): after the canonical content settle, runnable types
    call initArtifactRun then executeCode with the new artifactId so
    the spawner streams progress straight to the artifact row.

- artifact_edit_tool:
  - both patch and rewrite success branches call a new local
    maybeRerun() helper. For runnable types this reloads the row to
    pick up runPackages/runOptions captured at create time, fires
    initArtifactRun to clear prior-run remnants, then re-invokes
    executeCode with the new content.
  - LLM can iterate via small patches without re-emitting the full
    script; canvas subscribes to the same artifact row and updates.

The `code_run` standalone tool still exists; M5c will remove it and
update the demo agent / system prompt to point at the unified path.
…/ node_runnable (M5c)

remove the standalone code_run tool in favor of two new artifact types
(python_runnable, node_runnable) so sandbox code execution shares the
artifact tool surface: live source streaming in the canvas pane, patch /
rewrite edit semantics for multi-turn refinement, and LLM context-aware
iteration via build_artifacts_context.

- canvas: new canvas-runnable-code-renderer with source on the left and
  status chip + file chips + collapsible stdout/stderr on the right;
  canvas-pane routes python_runnable / node_runnable to it
- context: build_artifacts_context surfaces runStatus / runErrorCode /
  runOutputFiles attributes for runnable types so follow-up turns pick
  the right next action (patch on failure, leave alone on completion)
- agent: chat-agent.json EN/DE/FR rule 7 points at artifact_create with
  type "python_runnable" instead of code_run
- registry: drop code_run from tool_names / tool_registry and remove the
  tool source file
artifact_create for python_runnable / node_runnable types was crashing
the chat with "Element type is invalid: ... got: undefined" because
ArtifactBar's TYPE_ICONS and message-bubble's ARTIFACT_PILL_ICONS only
had entries for the original five canvas types. tsc didn't catch this
because the artifact row reaches them as `any` through useQuery.

also store storageId alongside fileMetadataId on artifact runOutputFiles
so the canvas right-pane file chip can build a download url without a
second roundtrip (api.file_metadata.queries.getById doesn't exist).
…tool results

before this fix, artifact_create and artifact_edit awaited executeCode for
runnable types but discarded the entire run outcome — they only returned
{success: true, artifactId, revision, message} to the llm. so a python
script that exited with RUNTIME_ERROR still showed up as a successful
tool call and the llm happily told the user "文件已生成" while no file
existed.

now both tools forward the run outcome to the llm for runnable types:
runStatus, runExitCode, runErrorCode, runErrorMessage, runStdoutPreview,
runStderrPreview, durationMs, files[], executionId. tool-level success
is redefined for runnable types as (runStatus === 'completed' && files
.length > 0) so the llm can branch correctly. the tool description gains
an errorCode recovery table and an explicit "never say file generated
unless success === true && files.length > 0" guardrail. chat-agent rule
7 EN/DE/FR gain the same checkpoint.

canvas right pane and build_artifacts_context were already showing the
failure state correctly — this aligns the llm's reply with what the user
sees in canvas.
…nderer

users reported that after a python_runnable artifact finishes, the
download chip and run status were nowhere to be seen on the canvas.
root cause: the canvas-pane keeps `showStreamingSource=true` for 10s
after stream start (MIN_SOURCE_VIEW_MS dwell window), and during that
window the source-only branch runs first — it knew nothing about
runnable types and just rendered raw source via CanvasCodeRenderer,
bypassing the runnable renderer with its file chips / status panel.

route runnable artifacts to CanvasRunnableCodeRenderer regardless of
the streaming-source dwell. the renderer now also handles its own
`isStreaming` so the source view still renders with the streaming caret
during the source phase, while the run panel updates reactively from
the artifact row.

additionally, the runnable renderer stacks at narrow canvas widths.
moved the execution panel ON TOP of the source for narrow canvases
(default 480px, below md breakpoint) so the file chip is visible
immediately. on wide canvases (≥768px) the side-by-side layout is kept.
…p md: layout switch

the responsive md:flex-row split was based on viewport width, not canvas
pane width, so a typical 1536px viewport with a 480px canvas triggered
md: mode and tried to side-by-side the panel — but the panel's md:w-80
(320px) plus a flex-1 source ate the whole canvas and squeezed the run
panel out of layout (height=0, width=0 confirmed via getBoundingClientRect).

simpler: always stack execution panel on top, source below. canvas pane
max width is 900px anyway, so side-by-side was cramped even when it did
fit. tailwind container queries would be the responsive answer but a
simple stack is fine here.

also add the canvas.runDone i18n key in en/de/fr that the renderer was
referencing (was leaking the raw key into the ui).
… the race to execute

artifact_create's onInputDelta inserts a placeholder row (toolCallId
recorded, liveStreamMode='create', empty content) and only sets
state.artifactId AFTER the mutation roundtrip returns. when the model
finishes streaming the input fast enough, the AI SDK fires `execute`
before that mutation roundtrip lands — state.artifactId is still
undefined, so execute falls through to the createArtifact else branch
and inserts a SECOND row with full content. result: two artifacts for
one tool call, one empty + one settled, showing up in the artifact bar
as two duplicate-titled v1 tabs.

defensive fallback: in execute's else branch, before inserting a new row,
look up any in-flight create-placeholder for this toolCallId scoped to
org+thread. if found, finalize THAT row instead of inserting fresh. the
new internal query findStreamingPlaceholderByToolCallId is scoped tight
enough that it can never claim a settled or cross-conversation artifact.

no schema change. lookup runs only on the rare fallback path so the
extra (indexless) toolCallId filter inside the org-and-thread index walk
is fine — the thread's recent artifact set is small.
…n tool

artifact_create and artifact_edit for python_runnable / node_runnable
now behave like static types — they only write / patch the source row.
they no longer await executeCode, no longer surface a run outcome, no
longer auto-run. the new artifact_run tool is the explicit, llm-driven
trigger to execute the script: it reads the row's persisted runPackages
/ runOptions, calls executeCode, and returns the full run outcome
(runStatus, runErrorCode, runStderrPreview, files[], executionId, ...).

why split it out: real chat smoke showed the llm keeping calling
artifact_create on a fresh thread when a run failed — three duplicate
v1 tabs of "大熊猫介绍 PPTX" accumulating, never invoking artifact_edit.
the unified semantics conflated "author source" with "execute it" and
the natural llm recovery from a failed run was to create a new artifact.
making execution its own tool decouples the two decisions so the llm's
next-action decision tree narrows correctly:
- need to make a deck → artifact_create (no auto-run)
- need to run it → artifact_run(artifactId)
- run failed → artifact_edit(artifactId, patch) → artifact_run again

chat-agent.json rule 7 (EN/DE/FR) is rewritten to walk this 4-step
cycle and to explicitly forbid calling artifact_create twice for the
same request.

no schema change, no spawner change, no executeCode action change,
no canvas change. the toolCallId race-recovery from a18b5eb stays.
runPackages / runOptions are still persisted on the row via the
existing initArtifactRun mutation, called once at create time so
artifact_run picks them up automatically.
before this patch:
- during artifact_create streaming, the canvas right panel showed a bare
  "RUN" header with nothing beneath it (no status, no files, no errors)
  because the artifact row has no run state yet — looked broken.
- after artifact_create settled, the panel filled with a stale "queued"
  status that never resolved, because artifact_create called
  initArtifactRun which forces runStatus='queued' as a side-effect of
  persisting the run config — but with the split flow no run actually
  gets queued until artifact_run is invoked.

split initArtifactRun into two:
- new setArtifactRunConfig — only persists runPackages / runOptions
  (called by artifact_create after the source settles).
- existing initArtifactRun — persists config AND resets runStatus to
  'queued' (called by artifact_run immediately before executeCode, same
  as today).

canvas renderer now hides the execution panel entirely while runStatus
is undefined and no prior-run artefact (files / stderr / errorCode) is
present. once artifact_run kicks off, the panel re-appears with live
status. completed-state artifacts still show the panel + file chips as
before.
…t duplicate rows

artifact_create's onInputDelta was running this guard:

  if (!state.rowInitialized && ...) {
    const inserted = await ctx.runMutation(createArtifact, {...});
    state.artifactId = inserted.artifactId;
    state.rowInitialized = true;  // ← set AFTER the await
    ...
  }

the AI SDK dispatches input-deltas back-to-back without waiting for the
prior handler's promise to settle. when two deltas arrive while the
first createArtifact mutation roundtrip is still in flight, BOTH delta
handlers observe `state.rowInitialized === false`, both pass the guard,
and both insert placeholder rows. user ends up with two `v1` tabs of
the same title in the artifact bar — one empty, one finalized — for a
single observed artifact_create tool call.

flip `state.rowInitialized = true` synchronously immediately after the
guard passes, before awaiting the mutation. the second delta now sees
the guard closed and skips the insert. supersedes the toolCallId
race-recovery from a18b5eb for this particular case (that fix was
designed for a different race — state.artifactId being undefined when
execute fires; this is the simpler in-handler check-then-act). the
race-recovery in execute stays for the residual case of execute firing
before the placeholder mutation lands.
- Break a self-referential Convex type in `executeCode` that resolved to
  `any` and cascaded 130+ implicit-any errors across the platform: type
  the handler return explicitly and use `Id<'sandboxExecutions'>` instead
  of `Awaited<ReturnType<typeof ctx.runMutation<...>>>`.
- Replace narrowing `as` casts in new sandbox code with type guards
  (artifact-type set, SSE phase, JSON parse) so oxlint stops flagging
  them.
- Register `services/sandbox` as a knip workspace; drop `export` from
  helpers only used in their own file; delete the unused
  `removeVolume` / `harvestOutput` / `CancelResponse` exports.
- Pre-create `tale-sandbox-net` in the smoke-test script (it's external
  in compose.yml; the CLI owns its lifecycle, the smoke script didn't).
  Tear it down in cleanup.
- Anchor the service-image grep in the smoke + validate-images scripts
  to `/tale-${service}:` so `db` no longer matches `tale-sandbox-egress`
  via the `db` substring in `sandbox`.
- Suppress AVD-DS-0002 on the two new Dockerfiles that legitimately run
  as root (sandbox spawner needs the docker socket; sandbox-egress
  entrypoint chowns the log before tinyproxy drops privs).
The previous attempt used inline `# trivy:ignore:AVD-DS-0002` comments
before `FROM` in services/sandbox{,-egress}/Dockerfile, but the misconfig
scanner reports these whole-file rules at line 1 and does not honor
adjacent-line ignores for them — the alerts kept firing on the PR.

Switch to `.trivyignore.yaml` with per-path scoping so each suppression is
narrowly targeted and documented (statement field), and wire the file into
the security.yml Trivy step via `trivyignores:`. Per-path beats `.trivyignore`
plaintext, which would mask the same finding across every Dockerfile.

Also suppress AVD-DS-0026 (no HEALTHCHECK) on sandbox-runtime: it is an
ephemeral one-shot image — the spawner runs it per code_run call, the
entrypoint executes user code, and the container exits.

Image-level scans in build.yml run `vuln,secret` by default (no misconfig),
so no change is needed there.
@larryro larryro force-pushed the feat/sandbox-code-run branch from d906dd0 to cc5514a Compare May 20, 2026 02:47
Closes ~all P0/P1 findings from the two-round /tmp/sandbox-review pass.

Security / control plane
- create-sandbox-service.ts binds the spawner to 127.0.0.1 (was 0.0.0.0)
  — docker.sock + unsigned-token would have been remote-root.
- server.ts now exits at boot when SANDBOX_TOKEN is unset, unless the
  operator sets SANDBOX_ALLOW_UNAUTH=true. ensure-env auto-mints the
  token; tale doctor's required-token check matches the runtime contract.
- server.ts caps /v1/execute body at 256 KB, validates executionId
  before registering it, and the cancel route regex now matches the
  Convex doc-id alphabet (was UUID-only). req.signal aborts now drive
  cancelExecution so a closed SSE stream tears the container down.
- seccomp.json was a 3-line stub that nothing wired up; deleted.

Egress
- tinyproxy.conf.template gains Allow ACLs + DisableViaHeader Yes +
  AnonymousHeader stripping XFF/Forwarded/Via/From.
- entrypoint.sh installs iptables REJECT rules for 169.254.169.254 and
  RFC1918 ranges (mirrors services/convex/docker-entrypoint.sh).
- Dockerfile + healthcheck switched to a CONNECT probe that fails when
  the allowlist is broken; cap_add: [NET_ADMIN] added.
- Egress runs on sandbox + internal — internal provides the NAT path
  (tale-sandbox-net is --internal) without weakening the hostname-only
  exposure (peers can already reach pypi directly via their own NAT).

Spawner runtime correctness
- cancelExecution wraps dockerKill in a 5s timeout with SIGKILL
  fallback so a wedged daemon never hangs the HTTP cancel response.
- SIGTERM handler stops accepting requests, aborts in-flight executions,
  and drains for up to 20s before exit (was process.exit(0)
  unconditionally — was leaking /var/lib/tale-sandbox/sessions/<id>/).
- Dead volume sweep removed (it filtered tale.sandbox=1 on volumes that
  never exist; per-execution storage is a host bind mount, cache
  volumes are tale.sandbox-cache=1 and must NOT be reaped). Replaced
  with a host-dir mtime sweep over the session root.
- stdout phase parser drains lineBuf on EOF; the final unterminated
  PHASE: line no longer leaks into user-visible stdout.
- Six empty catch() blocks across spawn.ts / spawn-util.ts /
  spawner_client.ts / internal_actions.ts replaced with console.warn
  + cause (AGENTS.md no-empty-catch rule).
- numEnv rejects whitespace-only and negative values; SANDBOX_PORT
  range-checked.
- Filenames in services/sandbox/src/ migrated to dash-case
  (docker-args.ts, spawn-util.ts) per AGENTS.md.

Convex state machine
- sandbox/internal_mutations.ts: finalize early-returns when the row
  is already terminal (closes the watchdog-vs-action race that was
  clobbering SPAWNER_UNAVAILABLE audit data); setRunning is monotonic
  across queued → installing → running; watchdog sweeps queued AND
  installing AND running rows past the cutoff (a throw between
  reserveSlotAndInsert and setRunning no longer leaks a quota slot
  forever).
- node_only/sandbox/internal_actions.ts: one failExecution helper
  consolidates the six prior failure exits. It finalizes the audit row
  AND finalizeArtifactRun (so the canvas spinner can no longer hang
  forever) AND cascade-deletes any _storage blobs uploaded for the
  doomed run. INPUT_REJECTED replaces the misleading
  SPAWNER_UNAVAILABLE code for IDOR-rejected input paths. The
  AbortController is now actually .abort()ed in finally.
- artifacts/internal_mutations.ts: patchArtifactRunProgress and
  finalizeArtifactRun gain terminal-status guards mirroring the audit
  side. initArtifactRun no longer hard-patches runPackages /
  runOptions — the artifact_run tool's per-call overrides stay
  per-call (was contradicting its own documentation).

Wire-shape consolidation
- New convex/sandbox/wire.ts is the single source of truth for
  error-code / run-status / phase / output-file validators + literal
  arrays (pattern matches convex/tts/error_codes.ts). sandbox + artifact
  schemas + internal mutations + action + spawner_client all import
  from it. The 9-member errorCode union previously copied 6 times
  is now defined once; INPUT_REJECTED added.
- services/sandbox/src/wire.ts is the spawner-side mirror (the spawner
  cannot import from Convex). Centralized ID alphabet regex shared by
  server.ts / spawn.ts / docker-args.ts / volume.ts.
- runProgress changes from a server-authored English string to a
  structured { kind, package?, version? } shape so the UI can render
  localized text via ICU. en/de/fr message files gain the full
  chat.canvas.runStatus.* / runErrorCode.* / runProgress.* /
  typeLabel.* key set; keys-dynamic.txt registers the four enum-driven
  prefixes.

Agent tool surface
- artifact_create_tool deletes the ghost timeoutMs field (artifacts
  schema has no runTimeoutMs column; was silently dropped) and the
  unused ArtifactCreateRunOutcome interface.
- artifact_run_tool removes the inlined Chinese description example
  per feedback_no_locale_terms_in_tool_descriptions; the try/catch
  now wraps runQuery (where v.id() throws) instead of toId (which is
  a pure cast).
- Stale code_run literals across services/sandbox/package.json,
  sandbox-runtime/Dockerfile, doctor.ts, ensure-env.ts replaced with
  artifact_run.

Soft-delete unregistration
- sandboxExecution is audit-only; the half-wired soft-delete plumbing
  (no by_org_lifecycleStatus index, not in TRASH_VISIBLE_RESOURCE_TYPES,
  no fetchTrashSubpage case, no retention cleanup, no storage cascade)
  is removed cleanly along with the lifecycleStatus field and the
  unused by_org_user / by_threadId indices.

UI / i18n
- canvas-runnable-code-renderer routes every user-visible string
  through useT('chat'). The renderer reads runProgress.kind/package/
  version and lets ICU format the localized text.
- New canvas/icon-map.ts consolidates the per-artifact-type Icon
  Record that lived in three places (artifact-bar.tsx, canvas-pane.tsx,
  message-bubble.tsx) with a drift between them; TYPE_LABEL_KEYS,
  TYPE_EXTENSIONS, TYPE_MIME_TYPES also moved.

CI / images / smoke
- build.yml + cleanup-pr-images.yml matrices add sandbox, sandbox-egress,
  and sandbox-runtime — none of which were built before this commit
  (first tale deploy would have failed at the first artifact_run call
  with image-not-found).
- spawner calls ensureImage(cfg.runtimeImage) at boot, with retry.
- tests/container-image-test.sh and container-smoke-test.sh include
  the three new images; the smoke now exercises a signed
  POST /v1/execute end-to-end probe.
- sandbox-runtime/entrypoint.sh stops writing the write-only
  install-report.json + install-stdout.log files.

Build / test loop
- services/sandbox added to root workspaces (was outside bun run check
  reach). Renamed to @tale/sandbox; lint/format scripts added.
- Dockerfile multi-stage with literal bun.lock + --frozen-lockfile
  --production; non-root user where docker.sock permits.
- New services/sandbox/src/server.test.ts (21 tests) covers
  cancel-route regex, loadConfig fail-closed defaults, HMAC verify.
- knip.config.ts entry for services/sandbox now anchors on src/server.ts.

Verification
- bun run check: 44/44 tasks, 70 661 platform tests pass, knip clean.
- services/sandbox: 21 tests pass.

Plan: /home/larry/.claude/plans/plan-issues-glistening-peach.md
Review reports: /tmp/sandbox-review/round{1,2}/
larryro added 30 commits May 21, 2026 13:20
The recent multi-file artifact refactor (511e6b3) gave artifacts a
files[] map but `artifact_run` could still only execute `entryFile`.
LLMs following multi-script skills like Anthropic's pptx (generator +
validator) were forced to create separate artifacts per script — exactly
what the multi-file model was meant to prevent.

This change lets one artifact hold many runnable scripts; the LLM picks
which to execute via `artifact_run({path})`, and all sibling files are
staged on disk so the executed script can `import` / `require` /
`subprocess` them.

Wire-shape changes (spawner + platform updated atomically):
- ExecuteRequest gains optional `files: SandboxFile[]` + `entryPath: string`
- `code` still required and carries the executed file's content; old
  spawners ignore the new fields and keep working (cross-deploy compat)
- Spawner-side validate-request.ts enforces path safety (max 50 files,
  ≤800KB aggregate, POSIX + ASCII allowlist, no traversal, NUL, BiDi,
  hidden dotfiles, case-insensitive uniqueness)
- New shared regex/caps in sandbox wire.ts (FILE_PATH_SEGMENT_RE,
  MAX_FILES_PER_REQUEST, MAX_FILE_PATH_LENGTH, MAX_FILES_BYTES)
- stageWorkspace writes every file at /workspace/code/<path> with
  resolve+prefix check (defense in depth), then mirrors the executed
  content to main.{py,js} — runtime entrypoint unchanged

Platform side:
- executeCode action passes through `files`/`entryPath` to the spawner
- spawner_client SpawnerExecuteBody mirrors the new fields
- artifact_run_tool accepts `path: string?` (defaults to entryFile),
  validates via the platform's 16-rule validatePath, refuses on
  file_missing / empty_file with structured messages, forwards every
  file from resolveArtifactFiles to the sandbox
- Tool description: explicit "one artifact, many runnable files" guidance
  with pptx-style example, plus an explicit note that each run gets a
  fresh container (no cross-run workspace persistence) — validators must
  be invoked from the generator via subprocess/import, not as a separate
  artifact_run call

Path validation runs in 3 layers: platform tool boundary, spawner request
validator, spawner staging loop (resolve+startsWith on codeDir).

Verification: `bun run check` clean on @tale/platform (typecheck + lint)
and @tale/sandbox (typecheck + lint, modulo the 6 pre-existing lint
errors on main in validate-request.ts that are unrelated to this change).
cancelVideoLink early-returned for completed/failed rows, leaving them
unbound and re-emitted by listForUserUnboundChat on next page load. The
X button's local hideJobIds set is React state and dies on refresh, so
the chip kept reappearing. Now flips status='skipped' for any non-
skipped row; cleanup's existing messageBoundAt guard keeps bound rows
safe.
…reaming

Three connected canvas-pane fixes driven by the PPT-generator flow
(main.js + verify.js), where running a verifier wiped the generator's
download chip and `artifact_create` waited until execute() to show any
content.

1. `artifact_create` placeholder-row streaming. onInputStart/onInputDelta
   parse the partial tool input; once (type, title) are committed,
   beginCreateStream inserts a revision-0 placeholder row with
   liveStreamMode='create' + toolCallId. The canvas opens immediately
   and the existing tool-input-delta -> extractContentDelta path lights
   up the source view token-by-token. execute() settles via
   finalizeCreateStream (revision 1 + artifactRevisions row + clear
   streaming flags), or hands off to createArtifact on collision /
   type_mismatch. cleanupStaleStreams now deletes revision-0 rows so a
   crashed onInputDelta does not leak empty artifacts.

2. Multi-file sidebar. New CanvasFileSidebar renders the project's
   `files[]` (synthesized for legacy rows via resolveArtifactFiles),
   with an entry-file badge and a pulse dot on the file the LLM is
   currently writing. Active file lives in CanvasContext; canvas-pane
   drives renderer / editor / download from the selected file. Source-
   view mode now triggers only when the streaming path matches the
   active file (with a fallback to entryFile for legacy streams that
   don't set streamingPath).

3. Per-file run results. sandboxExecutions records the executed file
   path. New getLatestRunPerFile query keys run-state UI by activePath
   so verify.js failing no longer clobbers main.js's runOutputFiles —
   the .pptx download chip stays reachable even when the user switches
   to a sibling that produced no output. Output chips are shown for
   stale runs too (the artifact may have been edited since); the run-
   status badge still gates on freshness.
…nerating

cancelGeneration used to overwrite the assistant message with a flat
`{ role, content: <string> }` patch. Because the SDK derives UIMessage
parts from message.content at read time, this wiped every non-text part
(file/image cards, reasoning, tool-call, tool-result) the user had
already seen. It also discarded persisted deltas whenever the freeze
snapshot was empty (e.g. mid-remount), turning the bubble into an
aborted shell.

Change the cancel protocol to send displayedLength (a position in chars)
instead of a content string. The backend now truncates the existing
content in place via truncateAssistantContent — text parts cut by
cumulative length, every non-text part preserved at its original index.
When no length is captured but text was already streamed, mark
status=success without touching content rather than vaporising it; only
truly empty messages still get status=failed.
…pt runs

- Spawner generates a wrapper main.{py,js} that subprocess-invokes each
  requested step path in sequence inside the same container. Per-step
  {path, exitCode, durationMs, status} land in /workspace/output/.tale-steps/
  results.json; spawner reads + filters them from the output harvest.
- ExecuteRequest{code?, steps?} are mutually exclusive at the spawner
  validator. Step paths must reference files[]; main.{py,js} is reserved
  for the wrapper. Cap MAX_STEPS_PER_REQUEST=10.
- sandboxExecutions.steps[] persists per-step results for audit; finalize
  mutation forwards the array when present.
- artifact_run tool description rewritten: workspace lifecycle (fresh
  every run, /workspace/output/ does not carry across), steps[] is now
  the recommended way to chain generator+validator instead of splitting
  into two artifact_run calls. path/steps superRefine mutex.
- Result message attributes failures to the specific step
  ("Step 2/3 (\"validate.py\") exited 1") so the LLM patches the right
  file instead of restarting from scratch.
- 8 new spawner tests cover mutex, missing files[], reserved-name
  collision, and step cap.
…ew cleanup

Closes the cancellation contract gaps surfaced by the multi-agent review:
user-clicked Stop now actually stops sandboxes and discards in-flight
artifact streams, and sandbox runs stream stdout/stderr to the canvas live
instead of dropping the whole 16 KB buffer at terminal time.

Cancellation cascade (C1 + C2)
- New by_threadId index on sandboxExecutions, listNonTerminalByThread
  query, cancelExecutionRecord mutation, and cancelExecutionsForThread
  internalAction. cancelGeneration schedules the action (not awaited;
  the Stop-ack response is owed to the user immediately). spawn.ts
  already returns status:'cancelled' on abort, so the SSE result event
  the in-flight action receives is unchanged.
- discardActiveStreamsForThread deletes revision-0 placeholders
  (artifact_create mid-stream) and clears liveStreamMode on settled
  rows (rewrite/patch mid-stream). Closes the ~6 min ghost-tile window
  that cleanupStaleStreams cron was the only previous recourse for.

Live stdout/stderr (C5)
- New 'stdout'/'stderr' SSE event types in both wire.ts files with a
  compile-time Equal<> parity guard. spawn.ts emits per-line stdout
  (PHASE markers stripped) and per-chunk stderr; spawn-util.ts grew
  the symmetric onStderrChunk hook.
- Platform-side action coalesces deltas (250 ms / 2 KB threshold)
  through the new appendArtifactRunOutput mutation, which is
  terminal-state guarded, freshness-checked against runExecutionId,
  and capped at the existing 16 KB preview budget to match the
  canonical preview finalizeArtifactRun writes.
- canvas-runnable-code-renderer's stdout/stderr <details> auto-opens
  while running/installing and auto-scrolls to bottom unless the user
  has scrolled away. New LiveTailDetails helper sets `open` via ref so
  the user can collapse it manually once a run finishes.

OCC fix (C4)
- expectedRevision dropped .optional() on every artifact_edit mode and
  the `?? artifact.revision` fallback was removed. The schema and tool
  description now require the LLM to pass the revision visible in the
  <artifact revision="N"> system context; OCC is real instead of
  trivially passing.

CI unblock (C3)
- Six sandbox lint errors corrected (no-unsafe-type-assertion +
  no-unnecessary-type-assertion in cleanup.ts / validate-request.ts).
- artifacts/internal_mutations.test.ts rewritten: nine tests against
  the current title-idempotent createArtifact signature plus the new
  discardActiveStreamsForThread cascade. The pre-511e6b361 test suite
  is gone — that gate had been silently red ever since the streaming-
  create split moved.

Production-compose port (C6)
- 127.0.0.1:8003:8003 publish moved out of createSandboxService into
  generate-dev-compose only. Stateful prod no longer exposes the
  spawner on host loopback; Convex reaches it through the internal
  Docker network.

A11y (C7)
- segmented-radio buttons grew focus-visible ring tokens
  (fg-base / bg-elevated) — keyboard nav on /pricing and
  /hardware-pricing is no longer reliant on suppressible UA defaults.

Dead-code cleanups
- D1 deleted unwired updateStreamingContent mutation along with its
  orphan scheduleStreamingFlush/drainFlush helpers and the
  pendingFlush/flushInFlight state in stream_state.ts.
- D2 removed the void mirrorLegacyContent tree-shake-shield in
  artifact_read_tool; the function has 16 real call sites in
  mutations.ts / internal_mutations.ts / snapshot_for_branch.ts.
- D3 installOptions sandbox schema field marked @deprecated post-R2-B4
  (kept per feedback_deprecate_dont_delete_schema_fields).
- D4 dropped export keyword on three sandbox-internal-only symbols
  (NONCE_TTL_MS, releaseSpawnerLock, ValidateResult) so knip stays
  clean.
- D5 deleted the unreachable "Only generate a PDF if the user
  explicitly insists" sentence from chat-agent.json (de/en/fr).
- D6 documented the 'running' status literal asymmetry in
  convex/sandbox/wire.ts (kept for legacy rows; new audit-row writes
  emit 'installing' only).

Bonus
- snapshot_for_branch now branches at the in-scope revision's content
  instead of the source row's current state. create_branch_thread
  walks artifactRevisions, threading revisionFiles /
  revisionEntryFile / revisionContent through to the snapshot
  function; legacy content-only rows synthesize a single-file
  artifact. Closes the pre-existing create_branch_thread_artifacts
  test failure that was keeping bun run check red.

Verified: bun run check green end-to-end — 44/44 turbo tasks, 70 684
tests. Sandbox suite 67 pass. The rewritten internal_mutations.test.ts
adds 9. The branch-artifact suite (5) now passes the "later edits out
of scope" case it was already asserting.
Symptom: chat UI froze while streamDeltas kept growing. Root cause is
the agent SDK rebuilding the full UIMessage from cursor=0 on every
Convex push, which becomes O(N²) when a large artifact_create emits
hundreds of uncompressed tool-input-delta rows.

UX
- Drop v{revision} badges from artifact bar and message bubbles; show
  file count instead. Remove listRevisions query (no UI consumer).
- artifact_create soft-conflicts on a second call in the same
  assistant message (conflict: 'already_created_in_message'), steering
  the model toward artifact_edit. Tool description leads with DEFAULT
  TO ONE ARTIFACT PER REPLY. New compound index
  by_organizationId_thread_createdByMessageId backs the lookup; guard
  gates on non-empty messageId to avoid cross-matching multi-step or
  sub-agent edge cases that fall back to "".
- Canvas file sidebar gains a "+" button that reuses the existing
  userEdit mutation, so users can add files without going through the
  LLM. Sidebar now renders for any file count (not just > 1).

Streaming
- saveStreamDeltas.throttleMs 100 -> 250 (SDK default). Tale has no
  inter-push smoothing layer, so 500ms felt chunky on fast streams;
  250ms cuts stream row volume ~2.5x without visible regression.
- patch-package extends the agent SDK's compressUIMessageChunks with a
  tool-input-delta merge branch (mirrors the text-delta merge). Submit
  upstream and drop the patch on the next SDK bump.

Verification
- 70,688 unit tests pass; new internal_queries.test.ts covers the
  guard query (hit, miss, empty-messageId short-circuit, scoping).
- stream_throttle.test.ts updated to assert the new value plus a
  documented [100, 400] band.
Symptom: during a long artifact_create stream the canvas would visibly
fill with content, then flash back to blank — and on LLM retry /
"create in segments" the model started over from zero. Root cause:
content tokens were only accumulated in client-side state and persisted
to the DB once at execute time. Any interruption between begin and
finalize lost everything that had been streamed.

artifact_create
- onInputDelta is now two-phase: Phase 1 commits the placeholder once
  type+title are known (unchanged); Phase 2 keeps parsing on every
  subsequent delta and flushes the parsed partial `content` into the
  row's `streamingContent` via a new internal mutation, throttled by
  the existing `shouldFlush` / `markFlushed` primitives from
  `stream_state.ts`.
- The execute-time catch no longer unconditionally discards the
  placeholder. If it has already received some `streamingContent`, it
  stays — a later `artifact_create` with the same title takes the
  collision path and surfaces the partial state to the model rather
  than restarting from zero. Truly abandoned placeholders still get
  swept by the existing `by_liveStreamMode` cleanup cron.

artifact_edit (rewrite mode)
- Symmetric Phase 2 after beginEditStream: every parse pass with a
  growing `content` field flushes to `streamingContent`, gated on
  toolCallId + streamingPath matching the row's current session.

The canvas's `hasDeltas ? streamedContent : (streamingContent ??
settled)` precedence already preferred `streamingContent` when the
client-side tool-input-delta hook was empty — it just never had any
bytes to show. With the server now writing during streaming, the
fallback works as designed and the blank window disappears.

Verification: 70,698 unit tests pass; new mutation tests cover the
happy path + every guard (missing row, toolCallId mismatch, wrong
mode, streamingPath mismatch); new wiring test confirms onInputDelta
calls beginCreateStream then updateCreateStreamingContent in the same
parse pass for fully-formed JSON, and that small subsequent growth is
throttled.
…collapsible

Previously the execution panel lived inside CanvasRunnableCodeRenderer
and was keyed by the sidebar's `activePath` via getLatestRunPerFile.
Switching to a sibling file (e.g. validate.py from generate.py) made
the entry file's run output, including the .pptx download chip,
disappear until the user clicked back.

Hoist the panel to canvas-pane as a project-level fixture:

- New `listRunsPerFile(artifactId)` query returns one projection per
  declared file with an execution row, ordered entry-file first, then
  declared `files[]` order. Older runs per path are collapsed; rows
  whose path is no longer in `files[]` (file deleted via canvas) are
  dropped. Legacy single-file artifacts with no per-file rows fall
  back to the artifact row's `run*` fields. Algorithm split into a
  pure `selectRunsPerFile` helper for unit testing.

- New `RunResultPanel` subscribes once at the canvas level and renders
  the entry file's result inline (same chrome the legacy renderer
  used: status badge, error block, output files, stdout / stderr
  LiveTailDetails). Non-entry files' results render inside the existing
  `CollapsibleDetails` primitive — summary shows path + status badge,
  body expands the same chrome. Panel returns null when nothing's
  worth showing, matching legacy "stay quiet" UX.

- `StatusBadge` / `FileChip` / `LiveTailDetails` and the run-detail
  body extracted into `run-result-helpers.tsx` so the new panel and
  any future consumer share one rendering source. The runnable code
  renderer is now a thin source-only wrapper around CanvasCodeRenderer.

- `getLatestRunPerFile` is removed (sole caller was the panel that no
  longer needs it). Convex regenerates `_generated/api.d.ts` via
  `typeof` module reference.

- i18n: three new `canvas.*` keys (en / de / fr): `runResultEntryLabel`,
  `runResultSecondaryLabel`, `runResultSecondaryCount` (ICU plural).

- Sidebar selection now only swaps the source code below; the run
  panel above is stable and independent.

Verification: 70,705 unit tests pass; new `queries.test.ts` covers
the pure helper across 7 cases (entry-first ordering, repeat
collapsing, deleted-file filtering, runProgress/runRevision mirroring
only onto the active row, legacy fallback, empty state, no-path
rows).
…/output/

Reproduces the user-reported failure: a multi-file python_runnable with
generate.py (writes /workspace/output/foo.pptx) and validate.py (reads
it back through markitdown) — when the LLM made two separate
artifact_run calls instead of using steps:[...], validate.py died on
FileNotFoundError because every artifact_run gets a fresh Docker
container with an empty /workspace/output/. The output file lived on
in artifact.runOutputFiles[] (in _storage) but was never re-staged.

Make separate calls forgiving:

- Wire: ExecuteRequest gains optional `priorOutputFiles: Array<{name,
  contentBase64}>` (same envelope shape as harvest returns).
- Spawner: new `stagePriorOutputFiles(outputDir, files)` writes each
  base64 entry into /workspace/output/ before the container starts.
  Path-traversal guard (resolve + outputDir-prefix check) skips unsafe
  names without aborting the run; bad entries are logged. Exported for
  unit testing.
- Convex `executeCode` action: when artifactId is set, looks up the
  artifact, sums runOutputFiles sizes, and forwards them to the spawner
  if total ≤ 10 MiB. Over the cap → skip the pre-stage and inject a
  one-line note via the existing stderr tail channel so users see why.
  Load failures are logged + non-fatal — pre-staging is a best-effort
  backstop, not a contract.
- artifact_run tool description: WORKSPACE LIFECYCLE section now says
  recent outputs ARE pre-staged (with the cap), still nudges toward
  steps:[...] as the canonical idiom for tightly-coupled chains.

steps:[...] remains the right pattern (one container, atomic outcome,
fail-fast). Pre-staging is the safety net for when the LLM forgets.

Verification: 70,705 platform tests pass, 7 new spawner-side bun:test
cases cover the path-traversal guard (../escape, /abs-escape), nested
paths, binary round-trip via base64, and empty-list no-op. The Convex
action's branch is integration-tested via the manual smoke described in
the plan (run generate.py separately, then validate.py — second run
sees the .pptx).
…e error

Symptom: streamDeltas keep growing while the chat UI looks frozen and the
Convex logs flood with `beginEditStream {code: 'streaming_in_progress'}`
on the same artifact, 30+ entries at the same second.

Root cause was in `d3ef05680` ("persist artifact streaming content
incrementally"): on an `artifact_create` execute error with non-empty
`streamingContent`, we skipped `discardCreateStream` to preserve the
partial content. The intent was that a follow-up same-title create would
recover via the collision path. But the row was left with revision=0
AND liveStreamMode='create' AND toolCallId set — any subsequent
`artifact_edit` then hit `beginEditStream`'s streaming_in_progress
refusal until the cleanup cron's stale threshold elapsed (minutes).

`artifact_edit_tool.ts`'s `onInputDelta` caught the rejection and
returned with `state.rowInitialized = false`, so every ~40 ms parse
pass retried the same mutation — hence the log flood and the
appearance of a frozen UI (Phase 2 flush never fired either).

Fix at the root, plus a defensive shield on the edit side.

§1 Settle, don't strand:
- New internal mutation `settleStrandedCreateStream(artifactId, toolCallId)`:
  * revision=0 + create-mode + non-empty streamingContent → promote
    to revision-1 (files=[{entryFile, content: streamingContent}],
    clear streaming flags, insert artifactRevisions row with editKind
    'create').
  * revision=0 + empty buffer → delete the row (mirror of
    discardCreateStream's old behavior).
  * Already-settled or different mode → clear streaming flags only.
  * toolCallId mismatch → no-op (don't touch another stream's row).
- Replace the lookup→if-content→skip-discard dance in artifact_create's
  execute catch with a single call to the new mutation. The decision
  lives server-side now; no client-side race window between read and
  conditional patch.

§2 Break the edit-side retry loop:
- New optional `beginEditStreamFailed` flag on `ArtifactStreamState`.
- `artifact_edit_tool.onInputDelta`: short-circuit at the top of Phase 1
  when the flag is set; stamp it in the `beginEditStream` catch.
- `artifact_edit_tool.execute`: read the flag and return a structured
  `{success: false, code: 'streaming_in_progress', message}` so the LLM
  gets the right recovery hint instead of falling through to the OCC
  path.

Verification: 70,712 platform tests pass. New `settleStrandedCreateStream`
cases cover promote / delete / mismatch / settled-row / missing-row.
New `artifact_edit_tool.test.ts` confirms a permanent beginEditStream
rejection calls the mutation exactly once across four onInputDelta
invocations (and that the happy path still flushes via
`updateRewriteStreamingContent`).

Out of scope (separate follow-ups):
- `cleanupStaleStreams` cron behavior — kept as-is (delete revision-0
  past threshold); §1 prevents stranding in the first place, the cron
  is the long-tail safety net.
- LLM-side system prompt guidance to react to `streaming_in_progress`.
…ked-write 'append' mode

User has been hitting `streaming_in_progress` errors on `beginEditStream`
repeatedly despite multiple incremental fixes. The root pattern: the
`artifact_create` streaming-placeholder window is fundamentally fragile
— whenever the create's execute doesn't run cleanly (action mid-flight,
AI SDK retries, network blip, atypical step transitions), the artifact
stays in `liveStreamMode='create'` and blocks any subsequent
`artifact_edit` until the cleanup cron sweeps it minutes later.

Retire the streaming-create machinery entirely and route all content
delivery through `artifact_edit`. Add a new `append` mode so the LLM
has a proper chunked-write primitive for long files instead of being
forced to emit the whole content in a single huge tool input
(re-creating the fragility on the edit side).

artifact_create
  * Drops the `content` argument entirely.
  * No `onInputStart` / `onInputDelta` hooks. The tool is purely
    synchronous metadata: type, title, language, entryFile, packages.
  * `execute` keeps the same-message guard, then calls `createArtifact`
    once and returns its result. The row always lands directly at
    revision 1 with an empty entry file.
  * Success-response `message` includes a concrete copy-pasteable
    next-step hint pointing the LLM at `artifact_edit({mode:'append'})`.
  * Description rewritten: removes all "content REQUIRED" language;
    surfaces the new create-then-append/rewrite workflow.

artifact_edit — 5 modes total (3 content + 2 file-tree):
  * `append` (new) — concat `content` to the end of a file at `path`;
    creates the file if missing. Each call bumps revision; the LLM
    delivers a long file via N small calls. Preferred over `rewrite`
    for files >~10 KB or multi-turn delivery. OCC via `expectedRevision`
    handles retry de-duplication (second land returns `code:'stale'`).
  * `set_entry` retired. Common case (re-point entry on rename) is
    already handled atomically by `rename`'s `from === entryFile`
    follow-along; the rare swap-between-existing-files corner is doable
    with a two-step rename. Saves the LLM one mode of cognitive load.
  * `rewrite` / `patch` / `delete` / `rename` unchanged.

Backing mutations
  * New `appendToFile` — mirror of `rewriteArtifact` but the file's new
    content is `existing + args.content` instead of replacement. Audit
    row uses `editKind: 'append'` (added to the schema validator).
  * `beginEditStream` now seeds `streamingContent=''` for both
    `rewrite` and `append` so the canvas's `streamingContent ?? settled`
    fallback chain works identically across both content modes.
  * `updateRewriteStreamingContent` accepts both `rewrite` and `append`
    live modes (same wire shape, same per-tool-call live preview path).
  * Deleted: `beginCreateStream`, `finalizeCreateStream`,
    `updateCreateStreamingContent`, `discardCreateStream`,
    `settleStrandedCreateStream`, `setArtifactEntry` (~380 LoC).
  * `set_entry` literal kept in `artifactEditKindValidator` for
    read-validator compatibility with existing rows (per
    feedback_deprecate_dont_delete_schema_fields).

Schema:
  * `artifactEditKindValidator` gains `'append'`.
  * `liveStreamModeValidator` gains `'append'` (same wire shape as
    `rewrite` — content streams in via tool-input).

Tests:
  * Deleted `artifact_create_tool.test.ts` (the streaming hooks it
    covered are gone).
  * Trimmed `internal_mutations.test.ts`: removed
    `updateCreateStreamingContent` and `settleStrandedCreateStream`
    describe blocks; extended the mock's query builder with `.order()`
    so trimRevisionHistory works inside the test.
  * Added 5 `appendToFile` cases: concat happy path, create-if-missing,
    stale OCC rejection, not_found, and a multi-call sequential flow
    asserting the final revision and concatenated content.

Net: ~700 LoC deletion, the recurring streaming-create bug class
disappears, the LLM gets a natural way to handle long files. 70,705
platform tests pass; 0 lint warnings.

Out of scope (separate follow-ups):
- Streaming-flag schema field cleanup (still in use by rewrite/append/patch).
- `cleanupStaleStreams` cron (long-tail safety net; behaviour unchanged).
- LLM-side system prompt nudges to prefer `append` over `rewrite`.
…dit 'append' mode

Follow-up to c5350c8 ("retire streaming-create + add 'append' mode").
The Convex layer wires `liveStreamMode='append'` correctly, but
canvas-pane.tsx still gates its content-streaming logic on
`liveStreamMode === 'create' || === 'rewrite'` only — `'append'` was
missed, so an in-flight append left the canvas main area blank even
though the file sidebar's streaming dot lit up (the sidebar uses a
mode-agnostic `path === streamingPath` check).

Add `liveStreamMode === 'append'` to both content-streaming
disjunctions:

* `isStreamingActiveFile` — drives the 3-tier fallback chain (live
  tool-input-delta → `streamingContent` → `settledContent`) and the
  streaming caret.
* `showStreamingSource` — the source-view gate that keeps the
  pre-render code view on while a content stream is in flight.

`useStreamedArtifactContent`'s tool-input-delta decoder is shape-
identical for `rewrite` and `append` (both ship `content` at top-level
of the parsed tool input), so the existing client-side path just
works once these two gates allow it through. No other frontend file
needs touching — grep confirmed every other reference to
`liveStreamMode` is either mode-agnostic (`!== undefined`), explicitly
`'patch'`, or a pure path comparison.

Verification: 70,705 tests still pass. Manual smoke for the user's
case (WiSeKey-style large pptx via append-mode chunked writes) should
now show content streaming in token-by-token as each tool-input
delta arrives — same UX as the historical rewrite mode.
…artifact runs

A failed or cancelled run was unconditionally writing `runOutputFiles: []`
into the artifact row (both at `initArtifactRun` and inside the empty
`applyFinalizeArtifactRun` patch). Because pre-staging for the next run
reads the artifact row's `runOutputFiles`, any failed intermediate run
wiped the prior successful run's outputs — so a subsequent `artifact_run`
that depended on those files hit a `FileNotFoundError` for a file that
demonstrably existed.

`applyFinalizeArtifactRun` now only writes `runOutputFiles` when the run
completed OR the harvest produced at least one file; `initArtifactRun`
no longer clears the field on start. Successful runs still replace
atomically, so the only behavior change is "failed/cancelled run no
longer destroys history."
…teral closes

`artifact_edit`'s `onInputDelta` was calling `beginEditStream` as soon as
`parsePartialJson` surfaced any `path` value — but a partial parse on
`{"path":"c` would auto-repair the string and surface `path: "c"`. Every
subsequent delta committed a new intermediate name ("cr", "cre", ...),
producing the visible filename flicker in the Canvas FILES panel.

Added `isPathFieldClosed` to scan the raw accumulator for the value's
unescaped closing `"` before allowing `beginEditStream`. Once the path
literal has closed it cannot regress (JSON is written linearly), so the
gate is one-way: the first committed `streamingPath` is the final one.

The Phase-2 content-flush hook is already transitively gated by
`state.rowInitialized`, so it inherits the fix without further change.
…pers + handler modules

internal_mutations.ts had grown to 1307 lines mixing internalMutation
wrappers, args/returns validators, and full handler bodies — hard to scan,
hard to extend, and the per-mutation contract was hidden inside ~150-line
handler bodies.

Split into four concern-scoped modules under handlers/:
  - shared.ts:        helpers, size guards, validateFiles,
                      clearStreamingFlags, trimRevisionHistory
  - content_edits.ts: createArtifact, applyToolPatch, rewriteArtifact,
                      appendToFile, deleteFileFromArtifact,
                      renameFileInArtifact
  - streaming.ts:     beginEditStream, abortStream,
                      updateRewriteStreamingContent,
                      discardActiveStreamsForThread, cleanupStaleStreams
  - run_state.ts:     setArtifactRunConfig, initArtifactRun,
                      appendArtifactRunOutput, patchArtifactRunProgress,
                      finalizeArtifactRun (+ the pure
                      applyFinalizeArtifactRun helper)

Each handler module exports a `xxxArgs` / `xxxReturns` / `xxxHandler`
triple; the wrapper file now just imports and assembles them via
`internalMutation(...)`. Existing call sites are unaffected — the
internalMutation export names + shapes are preserved, and the two
cross-module helpers (`MAX_ARTIFACT_BYTES`, `assertAggregateSize`,
`applyFinalizeArtifactRun`) are re-exported for back-compat. The vitest
suite's `internalMutation: (config) => config` mock continues to work
unchanged because `.handler` still resolves to the registered function.

internal_mutations.ts: 1307 -> 197 lines.
…ables + backfill

Phase 1 of the plan in llm-majestic-hamming.md. Pure additive — the new
tables ship alongside the existing `artifacts.files[]` /
`artifacts.runOutputFiles[]` fields, which remain in the schema as
`@deprecated` per [feedback_deprecate_dont_delete_schema_fields] so
historical rows continue to pass the read validator unchanged. No read or
write paths are switched in this commit; later phases will migrate
callers table-by-table while the deprecated fields keep working as a
fallback.

New tables (all camelCase, registered in convex/schema.ts):
  - artifactFiles      one row per source file, keyed by (artifactId, path);
                       carries optional streamingWriteToolCallId pointer
                       the canvas will use to find streamDeltas for live
                       content rendering once the new write paths land
  - artifactRuns       one row per execution attempt, append-only; carries
                       optional inputsFromRun for cross-run pre-staging
  - artifactRunFiles   append-only output files per run; failed runs that
                       harvested partial files will still land rows here
                       (resolves D5 in the plan once the spawner is updated)

Backfill: migrations/backfill_artifact_files_table.ts (single paginated
internal mutation, follows the existing migrations/* convention). Reads
each artifact's `files[]` into `artifactFiles` and — for artifacts with
terminal `runStatus` — synthesizes one `artifactRuns` row plus a
`artifactRunFiles` row per output. Idempotent (every step probes the
target index first), safe to re-run.

Deprecation pass on artifactsTable: marked `files`, `runOutputFiles`,
`streamingPath`, `streamingContent`, `liveStreamMode`, `toolCallId`,
`liveStreamStartedAt`, `streamingPatches` with @deprecated JSDoc explaining
the replacement and the migration window. Fields stay on the row; new
code reads/writes the new tables once subsequent phases switch over.
…stage migrate to new tables

Phase 2 of the plan in llm-majestic-hamming.md. Three concerns wired
together end-to-end while keeping the deprecated artifacts.runOutputFiles
field as a read fallback (per [feedback_deprecate_dont_delete_schema_fields]).

1. spawner harvest on failure (D5):
   `spawn.ts` now harvests `/workspace/output/` on every exit path —
   completed, cancelled, and failed — wrapped in try/catch so a stat
   error never trumps the underlying failure signal. Partial files the
   user script managed to write before crashing now reach the platform
   and land in artifactRunFiles (status=failed), so the user can see
   what was produced before the crash instead of getting nothing.

2. finalize dual-write:
   `applyFinalizeArtifactRun` still patches the artifact row's run-state
   fields (including artifacts.runOutputFiles when the harvest had
   output, per the Fix-1 guard), AND now also inserts one artifactRuns
   row + one artifactRunFiles row per output. Append-only — failed and
   cancelled runs leave their row in place for future history queries.
   The artifact row's run-state fields remain the canonical source for
   in-flight (queued/installing/running) state.

3. pre-stage read path:
   New getLatestRunOutputs internal query reads the most recent
   completed artifactRuns row + its artifactRunFiles, falling back to
   the legacy artifacts.runOutputFiles field for artifacts whose data
   hasn't been backfilled. executeCode in node_only/sandbox/
   internal_actions.ts now calls it instead of reading runOutputFiles
   directly — the migration window keeps both paths working.

Behaviour change visible to users today: a failed run that was
previously preceded by a successful one no longer destroys the prior
outputs at pre-stage time (Fix-1's guard already prevented destruction
on the artifact row; this change ensures the new artifactRuns table
preserves them across the cutover), and any partial files written
before a crash now show up in the per-run artifactRunFiles set.
…ackages_add

Two small surface additions to existing tools. Originally the plan also
split artifact_edit's rewrite/append modes into separate _open / _write
tools (Q4 of the original plan), but D2's flicker had already been
solved structurally by the isPathFieldClosed gate (commit 94a9249),
so the split would have doubled LLM round-trips without any user-visible
gain. Reverted before commit.

artifact_run:
  - New optional `inputs?: { from_run: "latest" | "<runId>" }` arg.
    Default behaviour unchanged ("latest succeeded run"); explicit
    runId pins pre-staging to that exact prior run regardless of status.
  - Threaded into executeCode → getLatestRunOutputs query, which now
    accepts an optional fromRun; resolves the runId via
    `ctx.db.normalizeId` so a malformed id silently falls through to
    the default path instead of crashing the action.
  - Return shape carries the new `runId` (resolved via
    `getRunByExecutionId` index on artifactRuns.executionId) so the LLM
    can reference history. Best-effort: omitted if finalize never ran.
  - New `by_executionId` index on artifactRuns backs the lookup.

artifact_edit:
  - `rewrite` and `append` modes now accept optional `packages_add`.
    On success, unions the listed names into the artifact's persistent
    `runPackages` (dedup, case-sensitive); no-op if all listed entries
    were already present. Existing entries are never removed.
  - New `addArtifactPackages` internal mutation handler + thin wrapper;
    `artifact_edit`'s execute() calls it after the underlying rewrite /
    append succeeds. Best-effort: a package-update failure does NOT
    flip the edit's success status.

Surface unchanged for non-runnable artifacts; `packages_add` is silently
ignored when the artifact type is not python_runnable / node_runnable.
…ct_edit/artifact_read

LLM-facing tool surface migrates from modal artifact_edit (5 modes:
rewrite/append/patch/delete/rename) to single-responsibility CRUD:

- file_create / file_update / file_delete / file_rename — file-level writes
  with explicit path_exists / file_missing guards and OCC via expectedRevision
- file_read / file_list — explicit paths required; no "smart inline" aggregate
- artifact_packages_add — runtime-dependency mutation separated from file ops

Cuts the "single file too big" anti-pattern by guiding the LLM (via the
shared artifacts system prompt) to split into multiple files instead of
chunked appends.

Schema / handler changes:
- Per-file dual-write into artifactFiles table via new syncArtifactFiles
  helper, called from every settle path
- New createFileInArtifact / updateFileInArtifact handlers replace the
  upsert rewriteArtifact; old applyToolPatch / rewriteArtifact / appendToFile
  handlers + apply_patches.ts deleted
- Canvas getById switched to loadArtifactWithFiles, reading artifactFiles
  as the authoritative source with doc fallback for legacy rows

Streaming fix:
- beginEditStream's row-level single-writer mutex removed — concurrent
  file_create / file_update calls to different paths no longer fail with
  spurious streaming_in_progress. Same-path collisions are still caught by
  OCC at settle time; canvas streaming UI shifts to last-writer-wins.

Cleanup:
- artifact_edit_tool / artifact_read_tool deleted + unregistered
- Schema validators for historical editKind values (patch / rewrite /
  append / set_entry) preserved per feedback_deprecate_dont_delete_schema_fields
- All LLM-facing prose (tool descriptions, error messages, build_artifacts_context)
  rewritten to advertise file_* CRUD
- docs/{en,de,fr}/platform/workspace/canvas.md updated
…te shared system prompt

Two cleanups on top of the strict-CRUD surface from 2088df6:

1. **Namespace tools by artifact scope.** All six file-level CRUD tools
   (file_create / file_update / file_delete / file_rename / file_read /
   file_list) operate on files INSIDE an artifact, not standalone files.
   Renaming to artifact_file_* puts them in the same prefix as their
   sibling artifact_create / artifact_list / artifact_run / artifact_packages_add,
   makes the scope explicit, and frees the bare file_* namespace for any
   future non-artifact file primitives.

2. **Move file-convention guidance out of the shared system prompt.**
   The "MULTI-FILE PROJECTS: split logically separate concerns..." block
   in build_artifacts_context.ts was sent to every agent on every turn,
   regardless of whether the agent had write tools registered. That's
   noise for query/RAG-only agents. The same guidance lives in the
   artifact_file_create / artifact_file_update / artifact_create /
   artifact_run tool descriptions — which only ship when the relevant
   tool is in scope — so the shared block can drop the duplication.

Schema validator entries 'file_create' / 'file_delete' / 'file_rename'
on artifactEditKindValidator are unchanged — they are persisted in
historical artifactRevisions rows and must continue to parse.

docs/{en,de,fr}/platform/workspace/canvas.md updated to use the new
tool names.
After 6a826e2 retired artifact_edit and renamed file_* tools to
artifact_file_*, examples/agents/chat-agent.json was left referencing
the removed artifact_edit tool in both its toolNames array AND in three
locales of systemInstructions. The chat agent would have:

- failed to load artifact_edit (tool-not-found warning, no working
  modification path)
- lacked every artifact_file_* tool entirely
- still steered the LLM toward artifact_edit in the PPTX recovery and
  HTML revision flows baked into its system prompt

Updated:
- toolNames: drop artifact_edit; add artifact_packages_add + 6
  artifact_file_* tools
- systemInstructions (en / de / fr): rewrite the PPTX 3-tool sequence
  as a 4-tool sequence (create → file_update → run → file_update on
  failure); rewrite the HTML path to use file_update; replace
  artifact_edit references in the shared guardrail with
  artifact_file_update / artifact_file_create
… artifactId literal

`parsePartialJson` auto-closes in-flight strings, so streaming deltas were
firing `getById` with truncated artifactIds, tripping `v.id("artifacts")`
and spamming WARN logs. Generalise the existing path-field-closed gate
into `isStringFieldClosed(accumulator, fieldName)` and apply it to both
the `artifactId` preflight and the existing `path` check.
Previously, editing source code after a run cleared the status badge,
stdout, stderr, and error block — only output files survived. The
panel looked empty even though we still had everything to show.

Flip the freshness gate from "hide" to "annotate": `isStale` (replaces
`isRunFresh`) now signals when `runRevision` diverges from
`artifactRevision`, and `StatusBadge` renders a secondary "Source
edited" chip next to terminal-state badges. In-flight runs keep their
spinner unadorned — the user choice is to treat a running execution as
still legitimately progressing, not stale.

Adds `canvas.runStale` to en/de/fr (de-CH inherits from de).
The sandbox runtime image hardcoded `exec python3 /workspace/code/main.py`,
forcing the spawner to write either user content or its multi-step wrapper
to that fixed path. Users couldn't name any artifact file `main.py` even
though that's the most natural Python entry-script name — the LLM's
`steps: [{path:"main.py"}, {path:"test.py"}]` flow was outright rejected.
The constraint protected the spawner's staging convention, not any real
semantic mistake. It's gone now:

- Runtime entrypoint takes the script path as a 4th positional arg and
  exec()s it; user files run at their declared paths so tracebacks /
  `__file__` carry the real filename, no synthetic mirror.
- Multi-step wrapper moves to `/workspace/.tale/runner.{py,js}` — a dir
  whose dotfile segment is already unreachable from user paths, so there
  is no collision surface and no user-facing reservation.
- `code` field removed from the spawner wire (`ExecuteRequest`,
  `executeCode` action, `SpawnerExecuteBody`); `entryPath` is required
  when `steps` is absent. Validator + action mutex updated to match.

Tests pin the user-reported workflow (`steps: ['main.py', 'test.py']`)
as a regression gate and assert the new staging layout (no synthetic
`main.py`, wrapper under `/workspace/.tale/`).
- Use head/tr for byte-cap test payloads instead of a 5 MiB bash brace-expansion loop that intermittently timed out on shared CI.
- Remove unused SandboxSseEvent export from wire.ts.
- Drop stale convex/_generated/** knip ignore (no longer needed).
LLM calls to artifact_file_update for ~22KB content hit the 8192-token
output cap mid-string, producing an unrecoverable JSON parse error
before the handler ran. Two-layer fix: raise the agent fallback cap to
32768 (was 8192) to give realistic headroom for tool-call arguments,
and rewrite the file-write tool descriptions with a HARD size limit
(~12KB / ~400 lines) plus explicit split-the-file workflow (e.g.
slide1.js + slide2.js requiring into main.js) so the model decides to
split BEFORE generating the call instead of getting truncated.
After raising the agent fallback maxOutputTokens to 32768, the ~12KB
guidance was using only ~10% of the actual output budget and would
prompt unnecessary file splitting for clearly safe writes. 40KB / 1000
lines aligns with the common best-practice line ceiling and still
leaves ~65% headroom of the 32K cap for preamble / thinking / wrapping.
The platform forwards prior-run output files (base64-encoded, up to
10 MB raw / ~13.5 MB after base64 inflation) inline in the request
body so the spawner can pre-stage them for the next run. The previous
256 KB cap was sized for "agent-authored code + small input file set"
and never accounted for binary artifact outputs (PPTX/PDF/images)
getting fed back in, so re-running any artifact that had already
produced a non-trivial output file failed with HTTP 413
"sandbox spawner refused payload — request body exceeds spawner cap".

20 MB aligns with the upstream priorOutputFiles cap
(MAX_PRIOR_OUTPUT_BYTES = 10 MB) after base64 overhead plus room
for source files + JSON wrapper. Still sits well below the spawner
container's 512 MB mem_limit so a single oversized POST cannot OOM
the process — the original DoS-guard intent is preserved.
…oads work

`listRunsPerFile` was projecting `sandboxExecutions.outputFiles` straight to
the canvas, but that field is the audit row and intentionally omits
`storageId`. <FileChip>'s `useFileUrl(storageId)` therefore returned null,
leaving `<a href="#">` and a non-functional download button.

Look up `storageId` per file via `fileMetadata` after the pure projection
runs. Keeps `selectRunsPerFile` synchronous (unit tests unchanged) and
fixes existing rows without a backfill.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants