
Add local-vm CLI for building and running E2B infrastructure VMs #1992

Open
tomassrnka wants to merge 9 commits into main from local-vm-build

@tomassrnka tomassrnka commented Feb 25, 2026

Summary

  • QEMU/KVM-based VM running the full E2B stack (API, Orchestrator, Client-Proxy + Docker deps)
  • e2b-build.sh: two-phase image build (packages + HWE kernel, then make targets + template)
  • e2b-local.sh: CLI for managing VM instances (start/stop/ssh/status/network/test)
  • Bridge networking with dynamic subnet detection (192.168.100-119 range)
  • Port-forward mode for single-instance use without bridge setup
  • nightly-build.sh: cron-friendly build + test + compress cycle with image pruning
  • --verbose/--quiet output modes, auto-disabled ANSI when piped
  • Once #1875 (feat: add initial ARM64 (aarch64) architecture support) is merged, a follow-up PR will add ARM64 builds and a macOS runtime

Test plan

  • All 14 shell scripts pass bash -n syntax check
  • test-sandbox.mjs passes Node.js syntax check
  • CLI dispatch: help, unknown command, missing subcommand all behave correctly
  • e2b-local.sh status shows running VM with colored table
  • e2b-local.sh start errors properly when VM already running
  • e2b-local.sh ssh --name nonexistent shows error message
  • -v flag shows debug output, -q flag suppresses info
  • Piped output contains no ANSI escape codes
  • Full sandbox test passes against running VM (create sandbox, run command, verify output)
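
The piped-output check in the test plan depends on detecting whether stdout is a terminal. A minimal sketch of that pattern follows; the names `setup_colors` and `log_info` are hypothetical and are not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are assumptions, not from the PR).
# Enable ANSI colors only when stdout is attached to a terminal.
setup_colors() {
    if [ -t 1 ]; then
        C_GREEN=$(printf '\033[32m')
        C_RESET=$(printf '\033[0m')
    else
        # Piped or redirected output: emit plain text.
        C_GREEN=''
        C_RESET=''
    fi
}

log_info() {
    # Re-evaluate on every call so redirection is detected per invocation.
    setup_colors
    printf '%s[info]%s %s\n' "$C_GREEN" "$C_RESET" "$*"
}
```

Calling `log_info "VM started" | cat` would print plain `[info] VM started`, while the same call in an interactive terminal would color the tag.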

Note

Medium Risk
Introduces new root-run scripts that create/tear down host bridge networking and iptables NAT rules and automate building a full VM image, which could disrupt a developer/CI host if misused. Changes are otherwise isolated to a new local-vm toolchain and do not modify runtime infrastructure code.

Overview
Adds a new local-vm toolchain that builds a self-contained QEMU/KVM Ubuntu VM image, which boots and runs the full E2B stack via cloud-init/systemd. A local CLI starts, stops, opens SSH sessions to, and reports the status of multiple VM instances, using either bridge networking (with dynamic subnet + dnsmasq/NAT setup) or localhost port-forwarding. The PR also includes a nightly build/test/compress workflow and a Node-based sandbox smoke test wired to predictable local test credentials.

Written by Cursor Bugbot for commit 5b2af22. This will update automatically on new commits.

  Single-machine E2B setup using QEMU/KVM with bridge networking.
  Two-phase build (cloud-init + HWE kernel), CLI for start/stop/ssh/status,
  dynamic subnet detection, and nightly build+test automation.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 288cf4b7b1


- Fix KVM fallback: make -machine/-cpu flags conditional (tcg+max when no KVM)
- Fix --skip-build overwriting existing image with bare Ubuntu base
- Persist SSH port in state dir so vm-ssh.sh honors custom --ssh-port
- Fix nightly symlink .xz suffix mismatch breaking --skip-build fallback
- Make bridge init lazy so SSH/status commands don't fail on subnet exhaustion
- Harden cloud-init: disable password auth, remove root login, drop plaintext password
- Remove unofficial Docker registry mirrors (keep only mirror.gcr.io)
- Fix iptables INPUT rule cleanup running outside DEFAULT_IFACE guard
- Enable net.ipv4.ip_forward in network setup for NAT to work on clean hosts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
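
The KVM fallback described in this commit could be sketched as below. The helper name `qemu_accel_flags` and the exact KVM-side flags (`accel=kvm`, `-cpu host`) are assumptions for illustration; the commit message only states that TCG with `-cpu max` is used when KVM is unavailable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print QEMU acceleration flags for this host.
# The device path is a parameter so the logic can be exercised without KVM.
qemu_accel_flags() {
    local kvm_dev="${1:-/dev/kvm}"
    if [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        # Assumed KVM-side flags; the PR's actual values are not shown.
        echo "-machine accel=kvm -cpu host"
    else
        # Software emulation: '-cpu host' needs hardware acceleration,
        # so fall back to TCG with the 'max' CPU model.
        echo "-machine accel=tcg -cpu max"
    fi
}
```

A launcher could then splice the result into its command line, for example `qemu-system-x86_64 $(qemu_accel_flags) ...`, leaving the expansion unquoted so the flags split into separate arguments.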
- Remove __COMMIT_HASH__ sentinel from guard condition in deploy-phase1.sh;
  sed replaces both the value and the guard, making checkout always no-op.
  Now just checks if COMMIT is non-empty (empty when --commit not passed).
- Validate instance name is <= 7 chars to prevent TAP device name
  (e2b-tap-<name>) from exceeding Linux 15-char IFNAMSIZ limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
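
The 7-character limit follows from arithmetic on the interface name: the `e2b-tap-` prefix takes 8 of the 15 usable characters. A minimal sketch of such a check, with the hypothetical function name `validate_instance_name`:

```shell
#!/usr/bin/env bash
# Linux interface names hold at most 15 characters (IFNAMSIZ is 16 including the NUL).
TAP_PREFIX="e2b-tap-"
IFNAME_MAX=15

# Hypothetical validator (name is an assumption, not from the PR).
validate_instance_name() {
    local name="$1"
    local max_len=$(( IFNAME_MAX - ${#TAP_PREFIX} ))   # 15 - 8 = 7
    if [ "${#name}" -gt "$max_len" ]; then
        echo "instance name '${name}' exceeds ${max_len} characters" >&2
        return 1
    fi
    return 0
}
```

With this check, `validate_instance_name dev1` succeeds, while an 8-character name such as `staging1` is rejected before any TAP device is created.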
tomassrnka and others added 3 commits February 26, 2026 12:51
The EXIT trap only removed the lock file, leaving orphaned QEMU
processes and TAP devices if the script was interrupted after starting
the test VM. Now stops the VM before removing the lock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use ExecStopPost instead of ExecStartPost in e2b-phase2.service so
  the VM shuts down regardless of whether deploy-phase2.sh succeeds
  or fails (ExecStartPost only runs on exit code 0)
- Add 3600s timeout to the Phase 2 wait loop in e2b-build.sh so a
  stuck VM is killed instead of blocking the build forever

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
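
A bounded wait loop of the kind this commit describes might look like the following sketch. The function name `wait_for_exit` and the polling interval are assumptions; only the 3600-second limit comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for a process to exit, killing it after a deadline.
# Returns 0 if the process exited on its own, 1 if it had to be killed.
wait_for_exit() {
    local pid="$1"
    local timeout="${2:-3600}"   # seconds; 3600 matches the commit message
    local interval="${3:-5}"     # polling interval is an assumption
    local elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            kill "$pid" 2>/dev/null || true
            return 1
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0
}
```

Because QEMU is started with `-daemonize`, the PID would come from the pidfile rather than from `$!`, for example `wait_for_exit "$(cat "$PHASE2_PID_FILE")"`.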
Poll /var/log/e2b-build-status via SSH while the VM is running.
After shutdown, check for PHASE2_COMPLETE vs PHASE2_FAILED and exit
non-zero on failure or unknown status, instead of unconditionally
printing "Build Complete".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


if (( PHASE2_ELAPSED % 15 == 0 )); then
    PHASE2_STATUS=$(ssh $SSH_OPTS -i "$SSH_KEY_FILE" -p "$SSH_PORT" e2b@localhost \
        "cat /var/log/e2b-build-status 2>/dev/null" 2>/dev/null || true)
fi
Phase 2 status polling overwrites previously captured result

Medium Severity

PHASE2_STATUS is unconditionally reassigned on every 15-second SSH poll. If one poll successfully captures PHASE2_COMPLETE but a subsequent poll occurs while the VM is shutting down (SSH dying), the failed SSH produces empty output via || true, overwriting the good value. The post-loop check then declares "status unknown" and exits with an error, causing a false build failure.
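
One way to address this, sketched under the assumption that an empty poll result should never replace a captured value, is to merge each poll into the stored status. The helper name `merge_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the last non-empty status seen.
# $1 = previously captured status, $2 = output of the latest poll (may be empty).
merge_status() {
    local previous="$1"
    local latest="$2"
    if [ -n "$latest" ]; then
        echo "$latest"
    else
        # A failed SSH poll yields empty output; keep what we already have.
        echo "$previous"
    fi
}

# Usage inside the polling loop (the ssh command is the one quoted above):
#   polled=$(ssh ... "cat /var/log/e2b-build-status 2>/dev/null" 2>/dev/null || true)
#   PHASE2_STATUS=$(merge_status "$PHASE2_STATUS" "$polled")
```

With this in place, a poll that fails while the VM is shutting down leaves a previously captured `PHASE2_COMPLETE` intact.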


-monitor unix:"$PHASE2_MON_SOCK",server,nowait \
-daemonize \
-display none \
-pidfile "$PHASE2_PID_FILE"
Missing cleanup trap for daemonized Phase 2 VM

Medium Severity

The Phase 2 QEMU VM is launched with -daemonize (detached from the script), but e2b-build.sh has no trap to kill it on script exit. If the user interrupts the build (Ctrl+C) during the Phase 2 polling loop, the daemonized VM (16GB RAM, 6 CPUs) is left running indefinitely with no easy cleanup path. nightly-build.sh has an EXIT trap for exactly this purpose, but e2b-build.sh does not. The orphaned VM isn't discoverable via e2b-local.sh stop --all either, since it uses a separate PID path (/tmp/e2b-build-phase2.pid).
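
A cleanup handler along the lines the review suggests could be sketched as follows. The function name `stop_from_pidfile` is hypothetical, and the sketch assumes the pidfile written by QEMU's `-pidfile` option contains only the process ID.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup: stop a daemonized process recorded in a pidfile.
stop_from_pidfile() {
    local pid_file="$1"
    local pid
    # Nothing to do if the VM was never started.
    [ -f "$pid_file" ] || return 0
    pid=$(cat "$pid_file")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill "$pid" 2>/dev/null || true
    fi
    rm -f "$pid_file"
}

# In the build script this would be registered before launching QEMU, e.g.:
#   trap 'stop_from_pidfile "$PHASE2_PID_FILE"' EXIT
```

Registering the trap before the launch means an interrupt during the polling loop still reaches the handler, and the missing-pidfile case is a no-op.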

