xtask: sandbox env, vendored bins, Carnac (v1, stacked on #190) #191

Open
whme wants to merge 9 commits into main from demo-automation-v1

@whme whme commented May 9, 2026

Summary

Stacked on top of #190 (v0). This PR currently contains the v0
commit plus v1 + follow-ups. Once #190 merges, this branch will be
rebased on main so only v1 changes remain. (GitHub does not
support Gerrit-style chains directly, so we approximate them by
stacking PRs and rebasing on merge.)

This is v1 of the record-demo plan. It builds on the v0
record-demo subcommand: the recorder no longer requires ffmpeg,
gifski, or Carnac on PATH, and the demo can be recorded inside an
isolated Windows Sandbox VM instead of the caller's interactive
desktop.

v1 highlights

  • New --env sandbox provider (now the default) renders
    target/demo/csshw-demo.wsb with read-only mounts for the
    workspace, the bin cache, and xtask/demo-assets/, plus a
    writable mount for the captured GIF. A LogonCommand runs
    sandbox-bootstrap.ps1 which sources setup-desktop.ps1,
    optionally launches Carnac, runs xtask record-demo --env local
    inside the sandbox, and writes a done.flag sentinel before
    shutting the VM down. The host polls for the sentinel and
    copies the GIF back. Sandbox cannot run on GitHub-hosted
    runners (no nested virtualisation); v2 will add the
    ci_runner provider for windows-2022.
  • xtask/src/demo/bin.rs SHA-pins ffmpeg 8.1.1, gifski 1.34.0,
    and Carnac 2.3.13. ensure_bins downloads each into
    target/demo/bin/ on cold cache, verifies the SHA-256
    (case-insensitive to tolerate PowerShell's upper-case digests),
    and extracts via the pure-Rust zip and lzma-rs/tar crates (no
    external tar or Expand-Archive dependency).
  • Recorder now polls file_size until ffmpeg has written at
    least 8 KiB of capture data so the first synthesised keystrokes
    are not lost in the gdigrab warm-up window.
  • setup-desktop.ps1 normalises wallpaper, console font, and DPI
    inside the sandbox; the .wsb schema does not expose a stable
    resolution element.
  • Carnac is downloaded unchanged under the MS-PL; LICENSE and
    attribution README live under xtask/demo-assets/carnac/ to
    preserve the notices the license requires.
  • All new I/O surfaces (path_exists, file_size,
    http_download, sha256_file, extract_archive,
    spawn_sandbox, terminate_sandbox) are routed through
    DemoSystem so unit tests cover the cache state machine, the
    .wsb mount layout, the sentinel poll, and the capture-baseline
    gate against mockall fakes with zero real filesystem or network
    effects.
  • README spells out the Windows Sandbox feature prerequisites.
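
For readers unfamiliar with the `.wsb` format, the mount layout described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `Mount`, `xml_escape`, and `render_wsb` are hypothetical names, and the host paths and PowerShell command line in the usage note are assumptions. Only the element names (`Configuration`, `MappedFolders`, `MappedFolder`, `HostFolder`, `SandboxFolder`, `ReadOnly`, `LogonCommand`, `Command`) come from the Windows Sandbox configuration schema.

```rust
/// One host-to-sandbox folder mapping (hypothetical type for this sketch).
pub struct Mount<'a> {
    pub host: &'a str,
    pub sandbox: &'a str,
    pub read_only: bool,
}

/// Escape the characters that are special inside XML text nodes.
fn xml_escape(s: &str) -> String {
    // '&' must be replaced first so later entities are not double-escaped.
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// Render a Windows Sandbox configuration with the given mounts and a
/// single logon command.
pub fn render_wsb(mounts: &[Mount<'_>], logon_command: &str) -> String {
    let mut out = String::from("<Configuration>\n  <MappedFolders>\n");
    for m in mounts {
        out.push_str("    <MappedFolder>\n");
        out.push_str(&format!(
            "      <HostFolder>{}</HostFolder>\n",
            xml_escape(m.host)
        ));
        out.push_str(&format!(
            "      <SandboxFolder>{}</SandboxFolder>\n",
            xml_escape(m.sandbox)
        ));
        out.push_str(&format!("      <ReadOnly>{}</ReadOnly>\n", m.read_only));
        out.push_str("    </MappedFolder>\n");
    }
    out.push_str("  </MappedFolders>\n  <LogonCommand>\n");
    out.push_str(&format!(
        "    <Command>{}</Command>\n",
        xml_escape(logon_command)
    ));
    out.push_str("  </LogonCommand>\n</Configuration>\n");
    out
}
```

Calling `render_wsb` with one read-only `Mount` for the bin cache, one writable `Mount` for the output directory, and a command such as `powershell -ExecutionPolicy Bypass -File C:\demo\assets\sandbox-bootstrap.ps1` (an assumed invocation) yields a string that could be written to target/demo/csshw-demo.wsb. Keeping the renderer a pure function is what lets the mount layout be asserted in unit tests without launching a VM.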

Test plan

  • cargo build -p xtask clean (no warnings)
  • cargo fmt clean
  • cargo lint clean
  • cargo test (all green)
  • cargo doc-tests clean
  • cargo xtask check-typography clean
  • Manual: cargo xtask record-demo --env sandbox produces a
    playable target/demo/csshw.gif on a host with the
    Containers-DisposableClientVM Windows feature enabled.
  • Manual: cargo xtask record-demo --env local still
    produces a playable target/demo/csshw.gif (regression check
    against v0).

The full plan lives at
C:/Users/whme/.claude/plans/tranquil-hopping-karp.md.

whme and others added 7 commits May 9, 2026 10:40
The README's demo/csshw.gif is currently re-recorded by hand each
time it drifts from the product, requiring a developer to manually
configure their workstation (wallpaper, fake SSH hosts, keystroke
overlay, ScreenToGif). This commit lays the foundation for "demo
as code" - a typed Rust DSL describing the demo plus an xtask
subcommand that runs it.

This is v0 of a four-stage plan: a local-only proof that produces
target/demo/csshw.gif on the developer's own desktop. v1 adds
Windows Sandbox isolation + Carnac overlay + visual normalisation;
v2 adds CI workflows + an append-only demo-assets orphan branch
with SHA-pinned filenames; v3 adds the chord/Press DSL primitive
plus the full canonical scene.

Architecture mirrors the existing xtask modules (e.g.
social_preview.rs):

- DemoSystem trait + RealSystem behind which all side effects
  (windows input synthesis, fs, subprocess) sit so unit tests
  exercise the driver via mockall with zero real-system effects.
- Closed Step enum + Script builder validating capture pairing
  and regex compilation at build time, so a typo in the script
  fails cargo check rather than a half-completed recording.
- Driver interprets steps via the trait; the in-flight ffmpeg
  capture is cleaned up even when a step errors mid-script.
- config_override writes target/demo/csshw-config.toml, a
  dispatcher.bat that strips an optional user@ prefix from
  csshw's USERNAME_AT_HOST substitution, and per-host enter.bat
  scripts with curated home directories.
- env/local.rs copies csshw.exe into target/demo/ so csshw's
  startup set_current_dir(exe_dir) lands on our config (csshw
  rebases its cwd at startup, so a plain cwd-based override
  does not work).
- terminate_csshw kills the daemon child and best-effort kills
  any CREATE_NEW_CONSOLE-detached client csshw.exe instances.
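
To make the capture-pairing validation above concrete, here is a minimal sketch of a closed step enum with a validating builder. It is an illustration under assumptions: the variant names, `ScriptError`, and `ScriptBuilder` are hypothetical and not taken from this commit, and the regex compilation check is left out to keep the sketch dependency-free. Validation here happens when `build()` is called, before any step runs.

```rust
use std::time::Duration;

/// Closed set of demo steps (variant names are illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    StartCapture,
    StopCapture,
    Type(String),
    Wait(Duration),
}

/// Ways a script can fail validation in this sketch.
#[derive(Debug, PartialEq)]
pub enum ScriptError {
    NestedCapture { index: usize },
    StopWithoutStart { index: usize },
    UnterminatedCapture,
}

/// A script whose capture steps are known to be correctly paired.
#[derive(Debug)]
pub struct Script {
    steps: Vec<Step>,
}

impl Script {
    pub fn steps(&self) -> &[Step] {
        &self.steps
    }
}

#[derive(Default)]
pub struct ScriptBuilder {
    steps: Vec<Step>,
}

impl ScriptBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn step(mut self, step: Step) -> Self {
        self.steps.push(step);
        self
    }

    /// Reject scripts whose StartCapture / StopCapture steps do not pair up.
    pub fn build(self) -> Result<Script, ScriptError> {
        let mut capturing = false;
        for (index, step) in self.steps.iter().enumerate() {
            match step {
                Step::StartCapture if capturing => {
                    return Err(ScriptError::NestedCapture { index })
                }
                Step::StartCapture => capturing = true,
                Step::StopCapture if !capturing => {
                    return Err(ScriptError::StopWithoutStart { index })
                }
                Step::StopCapture => capturing = false,
                Step::Type(_) | Step::Wait(_) => {}
            }
        }
        if capturing {
            return Err(ScriptError::UnterminatedCapture);
        }
        Ok(Script { steps: self.steps })
    }
}
```

Because `Script` can only be obtained through `build()`, a driver that accepts `&Script` never has to handle an unpaired capture at run time.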

v0 requires ffmpeg and gifski on PATH; v1 will vendor them as
SHA-pinned binaries downloaded into target/demo/bin/. The full
plan lives at C:/Users/whme/.claude/plans/tranquil-hopping-karp.md.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Builds on the v0 record-demo subcommand. With this change the
recorder no longer requires ffmpeg, gifski, or Carnac on PATH and
the demo can be recorded inside an isolated Windows Sandbox VM
instead of the caller's interactive desktop.

- New `--env sandbox` provider renders `target/demo/csshw-demo.wsb`
  with read-only mounts for the workspace, the bin cache, and
  `xtask/demo-assets/`, plus a writable mount for the captured GIF.
  A `LogonCommand` runs `sandbox-bootstrap.ps1` which sources
  `setup-desktop.ps1`, optionally launches Carnac, runs
  `xtask record-demo --env local` inside the sandbox, and writes a
  `done.flag` sentinel before shutting the VM down. The host polls
  for the sentinel and copies the GIF back.
- New `xtask/src/demo/bin.rs` SHA-pins ffmpeg 8.1.1, gifski 1.34.0,
  and Carnac 2.3.13. `ensure_bins` downloads each into
  `target/demo/bin/` on cold cache, verifies the SHA-256
  (case-insensitive to tolerate PowerShell's upper-case digests), and
  extracts. Carnac's release zip wraps a NuGet package so the pin
  records `inner_archive` and the helper extracts both layers.
- Recorder now polls `file_size` until ffmpeg has written at least
  8 KiB of capture data so the first synthesised keystrokes are not
  lost in the gdigrab warm-up window.
- `setup-desktop.ps1` normalises wallpaper, console font, and DPI
  inside the sandbox; the .wsb schema does not expose a stable
  resolution element.
- Carnac is downloaded unchanged under the MS-PL; LICENSE and
  attribution README live under `xtask/demo-assets/carnac/` to
  preserve the notices the license requires.
- All new I/O surfaces (path_exists, file_size, http_download,
  sha256_file, extract_archive, spawn_sandbox, terminate_sandbox)
  are routed through `DemoSystem` so unit tests cover the cache
  state machine, the `.wsb` mount layout, the sentinel poll, and
  the capture-baseline gate against mockall fakes with zero real
  filesystem or network effects.
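
The capture-baseline gate in the list above could look roughly like the sketch below. It is a simplified stand-in, not the code in this commit: `SizeProbe`, `wait_for_capture_baseline`, and `FakeProbe` are hypothetical names, the real `DemoSystem` trait is likely shaped differently, and a hand-rolled fake is used here in place of mockall so the example needs no external crates. The 8 KiB threshold is the only value taken from the commit message.

```rust
use std::io;
use std::path::Path;
use std::time::Duration;

/// Minimal stand-in for the file-size and sleep surfaces of the system
/// trait (hypothetical shape for this sketch).
pub trait SizeProbe {
    fn file_size(&mut self, path: &Path) -> io::Result<u64>;
    fn sleep(&mut self, interval: Duration);
}

/// ffmpeg must have written at least this much before keystrokes start.
pub const CAPTURE_BASELINE_BYTES: u64 = 8 * 1024;

/// Poll until the capture file reaches the baseline size, or give up
/// after `max_polls` attempts with a `TimedOut` error.
pub fn wait_for_capture_baseline<S: SizeProbe>(
    system: &mut S,
    capture: &Path,
    interval: Duration,
    max_polls: u32,
) -> io::Result<u64> {
    for _ in 0..max_polls {
        match system.file_size(capture) {
            Ok(size) if size >= CAPTURE_BASELINE_BYTES => return Ok(size),
            // Too little data so far: keep polling.
            Ok(_) => {}
            // ffmpeg has not created the file yet: keep polling.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            // Any other I/O error is treated as fatal.
            Err(e) => return Err(e),
        }
        system.sleep(interval);
    }
    Err(io::Error::new(
        io::ErrorKind::TimedOut,
        "ffmpeg did not reach the capture baseline",
    ))
}

/// Scripted fake: replays a fixed list of sizes and counts sleeps
/// instead of blocking. An empty list simulates a missing file.
pub struct FakeProbe {
    pub sizes: Vec<u64>,
    pub sleeps: u32,
}

impl SizeProbe for FakeProbe {
    fn file_size(&mut self, _path: &Path) -> io::Result<u64> {
        if self.sizes.is_empty() {
            Err(io::Error::new(io::ErrorKind::NotFound, "no capture file"))
        } else {
            Ok(self.sizes.remove(0))
        }
    }

    fn sleep(&mut self, _interval: Duration) {
        self.sleeps += 1;
    }
}
```

Routing the sleep through the trait as well as the size query is a design choice of this sketch: it lets a test drive the whole poll loop without waiting on a real clock.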

Sandbox cannot run on GitHub-hosted runners (no nested
virtualisation); v2 will add the `ci_runner` provider for
windows-2022 plus the orphan-branch publish flow.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
With this change `cargo xtask record-demo` is hermetic by default:
the Windows Sandbox provider runs against a normalised desktop
without commandeering the developer's own session. CI workflows
must pass `--env local` explicitly because GitHub-hosted runners
lack the nested virtualisation Windows Sandbox needs.

The CLI flag flip is one line; the rest of the diff is doc
churn (README, DemoEnv variant docstrings, RecordDemo command
help text, the default-pinning unit test).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Windows 10/11 ships BSD `tar.exe` on PATH but no `xz` binary, so
`tar -xf gifski-*.tar.xz` shells out to `xz -d -qq` and fails with
"unable to run program". Switch the `.tar.xz` branch in
`extract_archive` to `lzma-rs` + `tar` crates so extraction works
without any vendored decompressor.

Drops the unused `.tar.gz` / `.tar` branches: only the gifski
release uses tar.xz today.
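
As a rough illustration of what the narrowed dispatch might look like after dropping the `.tar.gz` / `.tar` branches, consider the sketch below. `ArchiveKind` and `archive_kind` are hypothetical names, and the real `extract_archive` may classify archives differently; the sketch only shows the classification step, not the `lzma-rs` + `tar` extraction itself.

```rust
/// Archive formats the extractor understands in this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ArchiveKind {
    Zip,
    TarXz,
}

/// Classify an archive by its file name, ignoring ASCII case.
/// Anything that is not `.zip` or `.tar.xz` is reported as unsupported.
pub fn archive_kind(file_name: &str) -> Option<ArchiveKind> {
    let lower = file_name.to_ascii_lowercase();
    if lower.ends_with(".tar.xz") {
        Some(ArchiveKind::TarXz)
    } else if lower.ends_with(".zip") {
        Some(ArchiveKind::Zip)
    } else {
        None
    }
}
```

Returning `None` for unknown extensions forces the caller to fail loudly on a new archive format instead of silently skipping extraction.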

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PowerShell's `Expand-Archive` validates by file extension and
refuses anything but `.zip`, so the inner Carnac NuGet package
(`carnac-*-full.nupkg`) failed to extract:

    Expand-Archive : .nupkg is not a supported archive file
    format. .zip is the only supported archive file format.

Drop down to the underlying `System.IO.Compression.ZipFile`,
which only cares about the format, not the file name. Same
PowerShell shell-out, no new dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PowerShell's archive APIs are not portable enough for the Carnac
download pipeline:

- `Expand-Archive` validates by file extension and rejects `.nupkg`
  even though it is a zip.
- `[System.IO.Compression.ZipFile]::ExtractToDirectory(src, dst, $true)`
  binds to a different 3-arg overload depending on the PowerShell
  edition: on Windows PowerShell 5.1 (.NET Framework) the third arg
  is `Encoding` and `$true` fails to coerce; on PowerShell 7+
  (.NET Core / .NET 5+) the third arg is `bool overwriteFiles`.

Switch to the `zip` crate so extraction works the same on every
PowerShell host and we stop debugging .NET overload resolution.
The default `zip` features pull bzip2/zstd/deflate64 we don't need;
keep just `deflate` since that is what every archive we ship uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`cargo xtask record-demo` now defaults to `--env sandbox`, but
fails with `program not found` when WindowsSandbox.exe is missing.
Replace the one-liner about `Containers-DisposableClientVM` with
the actionable trio (Pro/Enterprise/Education edition, hardware
virtualisation, the elevated PowerShell command + reboot) so a
contributor can act without leaving the README.

Reword the `--env local` bullet to point Windows Home users (and
not just CI) at it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions Bot commented May 9, 2026

🤖 News Fragment Check

❌ This PR appears to be missing a news fragment.

If this change affects users (new features, bug fixes, security fixes):
→ Add a news fragment in the news/ directory
→ Example: news/123.feature.md or news/~fix-issue.bugfix.md

If this change doesn't need a news fragment (docs, CI, internal changes):
→ Add the label to this PR

For more information about news fragments, see: https://github.com/nekitdev/changelogging

The sandbox VM has no Rust toolchain, so the bootstrap script can
only run a prebuilt csshw.exe from the read-only repo mount. Have
record-demo run `cargo build -p csshw` (debug profile) on the host
before spawning the sandbox, so the task is self-sufficient and the
user does not need to remember to build first.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

github-actions Bot commented May 9, 2026

🤖 Diff Coverage Check

Success!

Diff Coverage

Diff: origin/main...HEAD, staged and unstaged changes

No lines with coverage information in this diff.


github-actions Bot commented May 9, 2026

🤖 Coverage Check
Total Coverage: 61.78%

| Filename | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src\utils\config.rs | 80 | 0 | 100.00% | |
| src\utils\windows.rs | 120 | 14 | 88.33% | 293, 516-518, 1004-1005, 1027-1031, 1177-1182 |
| xtask\src\social_preview.rs | 101 | 3 | 97.03% | 259, 262, 265 |
| xtask\src\coverage.rs | 55 | 3 | 94.55% | 185, 199, 213 |
| xtask\src\readme.rs | 72 | 1 | 98.61% | 123 |
| xtask\src\release.rs | 144 | 15 | 89.58% | 27, 492-502, 505, 509, 615 |
| xtask\src\changelog.rs | 29 | 0 | 100.00% | |
| xtask\src\inject_agent_token.rs | 48 | 0 | 100.00% | |
| xtask\src\typography.rs | 91 | 1 | 98.90% | 259 |
| src\daemon\mod.rs | 709 | 583 | 17.77% | 188-190, 216-816, 877-1033, 1064-1065, 1108, 1138, 1182-1183, 1189-1191, 1204-1208, 1240, 1249-1253, 1272-1533 |
| src\daemon\workspace.rs | 21 | 0 | 100.00% | |
| src\client\mod.rs | 223 | 146 | 34.53% | 169-251, 310, 314, 319-323, 341-506 |
| src\lib.rs | 113 | 19 | 83.19% | 257-265, 325-327, 345-354 |
| src\cli.rs | 314 | 57 | 81.85% | 100-276 |
| src\protocol\deserialization.rs | 53 | 0 | 100.00% | |
| src\protocol\serialization.rs | 30 | 0 | 100.00% | |
| TOTAL | 2203 | 842 | 61.78% | |

Consolidates the fixes that turned the v1 sandbox provider from
"produces a GIF" into "produces a correct GIF of the actual csshw
interaction".

- recorder: drop `-video_size 1920x1080` from the gdigrab args so
  the capture covers the entire primary monitor instead of cropping
  to the top-left 1920x1080 region. Windows Sandbox auto-sizes its
  desktop to the host monitor and exposes no stable resolution
  hook, so any pinned size truncated the recording on high-DPI / 4K
  hosts. The downstream `scale=1280:-1` step in
  `stop_ffmpeg_and_encode` already normalises the encoded GIF
  width regardless of source size.
- windows_input: replace `SendInput(KEYEVENTF_UNICODE)` with a
  `VkKeyScanW`-driven virtual-key sequence (with shift / ctrl / alt
  modifiers as needed). Synthetic Unicode events arrive at
  low-level keyboard hooks (`WH_KEYBOARD_LL`) with `vkCode =
  VK_PACKET`, which Carnac renders as the literal text "Packet"
  (so a `whoami` broadcast showed up as six "Packet" rows). Real
  VK events make Carnac display the typed character.
  Surrogate-pair / unmapped chars now error loudly because the
  canonical script restricts itself to ASCII keyboard characters.
- sandbox env: stop mounting the workspace read-only and stop
  building csshw + xtask inside the sandbox. The host now builds
  both with a statically linked MSVC runtime
  (`-C target-feature=+crt-static`) directly into the writable
  mount at `target/demo/out/work/target/`, where they appear
  inside the VM at `C:\demo\out\work\target\debug\` with no in-VM
  copy and no rustup install. xtask is invoked with
  `CSSHW_DEMO_WORKSPACE` set so its local provider locates
  csshw.exe at the same path it does on a developer workstation.
- bin: pin and download the Microsoft VC++ Redistributable x64
  installer alongside the other vendored binaries. Vendored
  gifski.exe is dynamically linked against `vcruntime140.dll`,
  which the Windows Sandbox base image does not ship (it only
  ships UCRT). Without the redist installed, in-VM gifski exits
  with `STATUS_DLL_NOT_FOUND` (0xC0000135). The bootstrap runs
  `vc_redist.x64.exe /install /quiet /norestart` before invoking
  xtask. `ensure_pin` now treats archives whose `archive_name`
  equals `exe_rel` as self-contained executables and skips the
  extract step.
- sandbox-bootstrap.ps1: switch the sentinel-write protection
  from a script-level `trap` to `try/catch/finally`. PowerShell
  treats the existing `try` as the enclosing handler even with no
  `catch`, so the trap never fired and `$status` silently kept
  its placeholder. The new `catch` surfaces the real exception
  message into `done.flag`, and the failure path tails the last
  ~1.5 KB of xtask's redirected stdout/stderr into the sentinel
  so the host's `wait_for_sentinel` diagnostic carries the real
  cause when the in-VM xtask fails.
- setup-desktop.ps1: stop setting a solid wallpaper. The sandbox
  ships a clean stock background and the `--env local` path must
  not modify the developer's wallpaper.
- sandbox env: bail out of the sentinel poll early when the user
  closes the sandbox window manually, so the host does not hang
  for the full 8-minute timeout.
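
The `VkKeyScanW`-driven approach in the windows_input item above depends on unpacking the function's packed return value. The sketch below shows only that decoding step as a pure function, so it can be tested without calling into Win32. `KeyStroke` and `decode_vk_key_scan` are hypothetical names and not the helpers in this commit. The bit layout follows the documented `VkKeyScanW` contract: the low byte is the virtual-key code, the high byte is the shift state, and -1 means no key on the current layout produces the character.

```rust
/// A virtual key plus the modifiers that must be held while pressing it.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct KeyStroke {
    pub vk: u8,
    pub shift: bool,
    pub ctrl: bool,
    pub alt: bool,
}

/// Decode the SHORT returned by `VkKeyScanW`.
///
/// Low byte: virtual-key code.
/// High byte: shift state (bit 0 = Shift, bit 1 = Ctrl, bit 2 = Alt).
/// A result of -1 means the character cannot be typed on this layout.
pub fn decode_vk_key_scan(result: i16) -> Option<KeyStroke> {
    if result == -1 {
        return None;
    }
    // Big-endian byte order puts the high byte (shift state) first.
    let [state, vk] = (result as u16).to_be_bytes();
    Some(KeyStroke {
        vk,
        shift: (state & 1) != 0,
        ctrl: (state & 2) != 0,
        alt: (state & 4) != 0,
    })
}
```

A caller would then press the reported modifiers, send the key-down and key-up for `vk`, and release the modifiers in reverse order. Returning `None` for unmapped characters matches the "error loudly" behaviour described above, since the caller has to handle the missing mapping explicitly.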

GitHub: #191
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Owner Author

Maybe a stupid question, but we only use Carnac when recording the demo GIF that the README links to.
Neither Carnac nor the GIF is part of any release a customer will download.
Do we still need to include the LICENSE in the repo just because we use it? Seems weird. I don't see a difference between using Carnac and using any third-party library or tool. We don't include those licenses either, do we?
Please research what we need to do here; if we need the license, we might not use Carnac at all.

Owner Author

Not needed; drop this?

Comment on lines +4 to +6
# C:\demo\bin ffmpeg / gifski / Carnac / vcredist caches (RO)
# C:\demo\assets this script + setup-desktop.ps1 (read-only)
# C:\demo\out writable: prebuilt binaries, GIF, sentinel,
Owner Author

Why 3 mounts? Just mount all under demo as RW and be done with it

Comment on lines +9 to +11
# The host builds csshw + xtask with a statically linked MSVC
# runtime (RUSTFLAGS=-C target-feature=+crt-static) directly into
# C:\demo\out\work\target\debug\ on the writable mount. The binaries
Owner Author

We no longer need to build csshw and xtask statically linked, since we install the MSVC runtime into the sandbox. Adjust this and every other place that mentions or implements it.

Comment on lines +51 to +56
// Copy csshw.exe into the demo directory. csshw rebases its cwd
// to its own exe_dir on startup (src/cli.rs:548), so the config
// we just wrote is only picked up if csshw runs from there.
let source_exe = locate_csshw_exe(&workspace)?;
let demo_exe = layout.csshw_cwd.join("csshw.exe");
system.copy_file(&source_exe, &demo_exe)?;
Owner Author

Does csshw not have a --config flag? With one, we wouldn't need to copy the executable.

Owner Author

If it doesn't have one, open a GH issue for that feature.

