claude-code-structured-concurrency

Kernel-enforced cleanup of orphaned Claude Code subprocesses. Win32 Job Object on Windows, cgroup.kill (Linux 5.14+) with a process-group fallback for older kernels, and setpgid + a disowned out-of-process watchdog on macOS. cmd.exe delegates to the PowerShell wrapper.

License: MIT Platform Shell Tests

Claude Code spawns 40-60 child processes per session (MCP servers, plugins, LSPs, hooks). They often outlive their parent. After a few days, Task Manager (Windows) or ps -ef (Linux) fills with node entries from sessions that closed hours ago, and reboot becomes the cleanup primitive. This skill wires up the same kernel mechanisms Chrome, Edge, VS Code, and systemd-run --scope already use to bound helper-process lifetime, so the OS reaps the tree instead.

Verified 9 ms reap latency on Windows 11 build 26200. 36 PowerShell unit assertions plus 1 functional test (Windows side) and 41 bats tests (Linux + macOS), all passing in CI.

Guarantee matrix

| Platform | Mechanism | Cleanup survives SIGKILL of wrapper? | Status |
|---|---|---|---|
| Windows 10+ | Win32 Job Object + JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE | Yes (kernel reaps on handle close, including Task Manager End-Task) | STRONG, shipped v1.0.0 |
| Linux 5.14+ | systemd-run --user --scope + cgroup.kill | Yes (cgroup-level kill, kernel-enforced) | STRONG, shipped v1.1.0 |
| Linux <5.14 / containers / WSL1 | bash set -m + trap on EXIT/INT/TERM/HUP + killpg | No (trap doesn't fire on kill -9) | MEDIUM fallback, shipped v1.1.0 |
| macOS | setpgid + trap + disowned out-of-process watchdog | Yes if only the wrapper is killed (the watchdog outlives it and reaps the tree); No if wrapper and watchdog are SIGKILLed simultaneously | MEDIUM, shipped v1.2.0; honest ceiling stated in the install banner and pinned by tests/macos/test-honesty.bats |

Running on Linux <5.14, or on macOS? The installer prints which tier you're getting at install time, so there's no surprise.
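
To predict the tier before installing, the check can be approximated by hand. The sketch below is not taken from install.sh: `kernel_at_least` and `detect_tier` are hypothetical names, and the real installer may test more (for example, whether the user systemd instance is active).

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not the project's install.sh logic.

# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds when RELEASE (default: `uname -r`) is at least MAJOR.MINOR.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    have_major=${release%%.*}        # "5.15.0-91-generic" -> "5"
    rest=${release#*.}               # -> "15.0-91-generic"
    have_minor=${rest%%[!0-9]*}      # -> "15"
    [ "$have_major" -gt "$want_major" ] && return 0
    [ "$have_major" -eq "$want_major" ] && [ "${have_minor:-0}" -ge "$want_minor" ]
}

# detect_tier [OS] [RELEASE]: prints STRONG or MEDIUM for the bash-side
# platforms in the matrix above (Windows is handled by the PowerShell tools).
detect_tier() {
    os=${1:-$(uname -s)}
    release=${2:-$(uname -r)}
    case $os in
        Linux)
            if kernel_at_least 5 14 "$release" && command -v systemd-run >/dev/null 2>&1; then
                echo STRONG
            else
                echo MEDIUM
            fi ;;
        *)  echo MEDIUM ;;           # macOS: no cgroup.kill or Job Object analog
    esac
}
```

Calling `detect_tier` with no arguments inspects the current machine; passing an OS name and release string lets you check another box's `uname` output.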

Three-layer architecture: Visibility (cc-procs.ps1) and Cleanup (cleanup-orphans.ps1) and Prevention (claude-jobbed.ps1) over shared libraries, with the Prevention layer connecting to the Win32 Job Object kernel primitive

Demo

58-second screen capture. The orphan MCP count drops to zero on claude.exe exit, with cleanup driven by the kernel's Job Object close, not by application code. If the inline player doesn't load, download the MP4 directly.

Requirements

Windows side:

  • Windows 10 build 17134 (version 1803, April 2018) or later. Job Object behavior was unreliable for this pattern on older builds.
  • PowerShell 5.1 (default Windows install) or PowerShell 7+. Git Bash also works (via ~/.bashrc).

Linux side:

  • bash 4+. The fallback path uses set -m job control and trap-on-signal cleanup.
  • Optional but recommended: kernel 5.14+ (Aug 2021, in every supported distro) and systemd-run available, for the STRONG cgroup.kill path. Without these the installer drops to the MEDIUM trap-based fallback and tells you so.

macOS side:

  • bash — Apple's stock /bin/bash 3.2.57 is sufficient; the wrapper, finder, and installer are written 3.2-safe by construction. zsh and fish are also wired.
  • macOS has no cgroup.kill / Job-Object analog, so it is MEDIUM tier by construction (setpgid + trap + disowned out-of-process watchdog). The installer states this ceiling explicitly at install time — there is no STRONG path to opt into.

Zero external dependencies on any platform. No PowerShell modules, no Node, no Python, no sudo.

Important

Windows: automatic shadowing is PowerShell-only, but cmd.exe is supported. -ShadowClaude redefines claude via $PROFILE, a PowerShell concept. cmd.exe has no profile script (its only startup hook is the AutoRun registry value, and this tool makes no registry writes), so plain claude typed into a bare cmd.exe window runs unprotected. cmd.exe users are not stuck: tools\claude-jobbed.cmd delegates to the PowerShell wrapper and inherits the exact same STRONG Job Object guarantee. Invoke it directly, or add a per-session macro with doskey claude=C:\path\to\tools\claude-jobbed.cmd $*. PowerShell and Git Bash get automatic shadowing. Per-launcher details and remedies live in docs/FAQ.md.

Install

Windows

```powershell
git clone https://github.com/ron2k1/claude-code-structured-concurrency `
    "$env:USERPROFILE\.claude\skills\structured-concurrency"

& "$env:USERPROFILE\.claude\skills\structured-concurrency\tools\install-reap.ps1" -ShadowClaude
```

-ShadowClaude redefines plain claude as a function that delegates to the wrapper. Without it, you have to type claude-jobbed every time you want protection. PowerShell's command precedence resolves functions before external executables found on PATH, so the function wins over claude.exe whenever the command is looked up.

Open a fresh PowerShell window and confirm:

```powershell
Get-Command claude
# CommandType=Function (Definition: claude-jobbed @args)  ->  wrapped
# CommandType=Application                                  ->  NOT wrapped
```

cmd.exe: there is no AutoRun auto-shadow, but tools\claude-jobbed.cmd delegates to claude-jobbed.ps1 and inherits the same STRONG Job Object guarantee. Run it directly, or add a per-session macro: doskey claude=C:\path\to\tools\claude-jobbed.cmd $*.

Linux / macOS

```bash
git clone https://github.com/ron2k1/claude-code-structured-concurrency \
    "$HOME/.claude/skills/structured-concurrency"

cd "$HOME/.claude/skills/structured-concurrency"
./install.sh
```

The same install.sh covers Linux and macOS. On Linux it detects your kernel version and prints which guarantee tier you're getting (STRONG on 5.14+, MEDIUM below). On macOS it prints the MEDIUM tier and the honest ceiling: the disowned watchdog reaps the tree if the wrapper alone is Force-Quit, but a simultaneous SIGKILL of both wrapper and watchdog is unrecoverable because macOS has no kernel job-object/cgroup primitive. Either way it asks for [y/N] confirmation. It injects an idempotent shell function block into ~/.bashrc, ~/.zshrc, and ~/.config/fish/config.fish (each only if the rc file already exists); on macOS it also writes ~/.bash_profile, because macOS Terminal.app runs bash as a login shell and login shells source ~/.bash_profile, not ~/.bashrc. Plain claude then routes through the wrapper.

```bash
# CI / unattended:
./install.sh --yes

# Re-run after editing the wrapper (rewrites the rc block in place):
./install.sh --force --yes

# Cleanly remove:
./install.sh --uninstall
```
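
For readers auditing what the installer does to their rc files, here is a minimal sketch of marker-delimited, idempotent injection for bash and zsh. The marker text, the function names `inject_block` and `remove_block`, and the wrapper path are assumptions for illustration, not copied from install.sh. fish needs different function syntax and is not covered.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of marker-delimited rc injection; not the real install.sh.
BEGIN_MARK='# >>> claude-jobbed (hypothetical marker) >>>'
END_MARK='# <<< claude-jobbed (hypothetical marker) <<<'

# remove_block FILE: delete everything between the two markers, inclusive.
remove_block() {
    rc=$1
    [ -f "$rc" ] || return 0
    tmp=$(mktemp) || return 1
    awk -v b="$BEGIN_MARK" -v e="$END_MARK" '
        $0 == b { skip = 1; next }
        $0 == e { skip = 0; next }
        !skip   { print }
    ' "$rc" > "$tmp" && cat "$tmp" > "$rc"     # cat keeps the rc file's mode and symlink
    rm -f "$tmp"
}

# inject_block FILE WRAPPER_PATH: add the block once; a re-run is a no-op.
inject_block() {
    rc=$1
    wrapper=$2
    [ -f "$rc" ] || return 0                       # only touch rc files that already exist
    grep -qxF -e "$BEGIN_MARK" "$rc" && return 0   # block already present
    {
        printf '%s\n' "$BEGIN_MARK"
        printf 'claude() { "%s" "$@"; }\n' "$wrapper"
        printf '%s\n' "$END_MARK"
    } >> "$rc"
}
```

Running `inject_block` twice leaves exactly one begin marker and one end marker in the file. A `--force` style rewrite would be `remove_block` followed by `inject_block`.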

Open a fresh shell and confirm the wrapper is in front of the real binary:

```bash
type claude
# claude is a function   ->  wrapped
# claude is /usr/bin/claude  (or similar)  ->  NOT wrapped
```
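
If you want the same check in scriptable form, the sketch below relies on bash's `type -t`. The function name `wrapped_status` is hypothetical. It only sees the rc-defined function when sourced into, or pasted into, the interactive shell you are checking; zsh and fish need their own equivalents.

```bash
#!/usr/bin/env bash
# wrapped_status NAME: classify how NAME resolves in the current bash shell.
wrapped_status() {
    case $(type -t "$1" 2>/dev/null) in
        function) echo "wrapped" ;;        # a shell function sits in front of the binary
        file)     echo "not wrapped" ;;    # resolves straight to an executable on PATH
        "")       echo "missing" ;;        # not found at all
        *)        echo "other" ;;          # alias, builtin, or keyword
    esac
}
```

Typical use is `wrapped_status claude` in a fresh terminal after install.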

Warning

Install does not wrap a session that's already running. Close existing CC windows and start a new shell after install.

Components

Windows (tools/):

| Tool | Layer | What it does |
|---|---|---|
| tools/cc-procs.ps1 | Visibility | Read-only inventory: PID, parent, age, memory, classification, orphan flag. No kill capability. |
| tools/cleanup-orphans.ps1 | Cleanup | Terminates strict-orphan subtrees per ~/.reap/config.json. Dry-run by default. |
| tools/claude-jobbed.ps1 | Prevention | Win32 Job Object wrapper. Kernel terminates the entire CC tree on wrapper exit. |
| tools/claude-jobbed.cmd | Prevention | cmd.exe shim. Re-execs claude-jobbed.ps1 via powershell.exe -NoProfile -File and propagates its exit code, so cmd.exe users inherit the same STRONG Job Object guarantee. |

Linux (tools/linux/):

| Tool | Layer | What it does |
|---|---|---|
| tools/linux/find-claude.sh | Discovery | 9-probe path resolver: command -v → npm prefix → /opt/homebrew/bin → /usr/local/bin → nvm (highest version) → fnm → asdf → volta → yarn global. Returns 127 if nothing matches. |
| tools/linux/claude-jobbed.sh | Prevention | Two-tier wrapper. STRONG: spawns claude inside systemd-run --user --scope, so cgroup.kill reaps the tree even on kill -9 of the wrapper. FALLBACK: bash set -m + trap-on-EXIT/INT/TERM/HUP that issues killpg -TERM then -KILL. CLAUDE_JOBBED_FORCE_FALLBACK=1 exercises the fallback path on systemd-equipped boxes (used in CI). |
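
The first-match-wins probe order can be pictured as a chain of small functions. The sketch below shows only four of the nine probes. The helper names and the `CLAUDE_BIN_NAME` override are hypothetical and not taken from find-claude.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a first-match-wins probe chain; not the real find-claude.sh.
BIN=${CLAUDE_BIN_NAME:-claude}     # hypothetical override, handy for testing

# Each probe prints a path and returns 0 on a hit, or returns non-zero on a miss.
# Note: command -v prints just the name if BIN is a shell function or alias.
probe_path() { command -v "$BIN" 2>/dev/null; }

probe_dir() {
    [ -f "$1/$BIN" ] && [ -x "$1/$BIN" ] && printf '%s\n' "$1/$BIN"
}

probe_npm() {
    prefix=$(npm prefix -g 2>/dev/null) || return 1   # fails cleanly when npm is absent
    probe_dir "$prefix/bin"
}

# First hit wins; 127 ("command not found") when every probe misses.
find_claude() {
    probe_path \
        || probe_npm \
        || probe_dir /opt/homebrew/bin \
        || probe_dir /usr/local/bin \
        || return 127
}
```

A caller would use it as `claude_path=$(find_claude) || exit 127`, which keeps the not-found contract visible at the call site.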

macOS (tools/macos/):

| Tool | Layer | What it does |
|---|---|---|
| tools/macos/find-claude.sh | Discovery | Same 9-probe order as Linux, written bash-3.2-safe (Apple ships frozen bash 3.2.57 at /bin/bash). The fnm probe additionally checks ~/Library/Application Support/fnm, fnm's default FNM_DIR on macOS. |
| tools/macos/claude-jobbed.sh | Prevention | MEDIUM-tier wrapper. set -m gives the child its own process group; a trap reaps it on graceful exit or catchable signal; a disowned out-of-process watchdog (its own pgid, parent-identity check via ps -p $pid -o lstart= since macOS has no /proc) reaps the tree even when the wrapper alone is Force-Quit. Simultaneous SIGKILL of wrapper and watchdog is the honest ceiling: macOS has no kernel job-object/cgroup primitive. |

```powershell
# Windows
.\tools\cc-procs.ps1                # see what's running
.\tools\cleanup-orphans.ps1         # dry-run a reap
.\tools\cleanup-orphans.ps1 -Force  # actually reap
```

```bash
# Linux
claude --version       # already wrapped if you ran install.sh
type claude            # confirm: should print "claude is a function"
```

Inside a Claude Code session on Windows, the same flow is /structured-concurrency [kill|install|verify].

A SessionStart hook (hooks/reap-on-start.ps1) runs the cleanup in strict-orphan-only mode on every CC start (Windows), so leftovers from un-wrapped or crashed sessions are reaped automatically. The Linux and macOS wrappers do not need a periodic reaper — cleanup runs at wrapper exit (Linux: cgroup.kill; macOS: the watchdog), not on a schedule.

Auditable

About 642 lines of PowerShell across tools/ and hooks/, plus 345 lines of tests. The runtime reads top to bottom in 20 minutes.

| Surface | Access | Note |
|---|---|---|
| Network | None | No Invoke-WebRequest, Invoke-RestMethod, sockets, or telemetry. |
| File reads | ~/.reap/config.json, ~/.reap/predicate.ps1, the wrapper's own scripts | Nothing in Documents/, OneDrive/, source repos, or anywhere else under $env:USERPROFILE. |
| File writes | ~/.claude/hooks/reap.log only | Append-only log of kept, killed, and skipped decisions. No other writes. |
| Registry | None | No HKLM:\ or HKCU:\ access. |
| Process kills | -Force plus a user-authored ~/.reap/config.json that opts in | Default install kills nothing. See Configuration. |
| Shell profile | install-reap.ps1 -ShadowClaude appends one PowerShell function to $PROFILE | The function is human-readable. Removing the block reverts the install. |
| claude.exe | Spawned as a child of the Job Object, never patched or hooked | The binary on disk is untouched. |
| Background services | None | No Windows services, no scheduled tasks, no SessionStart hook unless you opt in. |

Verify the scope yourself:

```powershell
# No network calls anywhere in the runtime:
Select-String -Path .\tools\*.ps1,.\hooks\*.ps1 -Pattern 'Invoke-WebRequest|Invoke-RestMethod|System\.Net|curl|wget'

# Every file the tool can write to:
Select-String -Path .\tools\*.ps1,.\hooks\*.ps1 -Pattern 'Out-File|Set-Content|Add-Content|Tee-Object'

# Run the full test suite without installing anything:
.\tests\test-orphan-detect.ps1
.\tests\test-config-loader.ps1
.\tests\test-job-object.ps1
```

Uninstall is three commands:

```powershell
notepad $PROFILE                                                    # delete the `function claude { ... }` block
Remove-Item -Recurse "$env:USERPROFILE\.reap"                       # remove your config (optional)
Remove-Item -Recurse "$env:USERPROFILE\.claude\skills\structured-concurrency"
```

No registry cleanup, no service removal, no leftover state.

Configuration

The cleanup engine is safe by default: without ~/.reap/config.json, cleanup-orphans.ps1 -Force is a guaranteed no-op. Aggression is opt-in.

Pick a starter profile at install time:

| Profile | Behavior |
|---|---|
| conservative | Spare almost everything. |
| moderate | Default. Kill standard MCP chains. |
| aggressive | Also kill node.exe and cmd.exe orphans. |
| paranoid | Observe-only. Never kills. |

```powershell
.\tools\install-reap.ps1 -ConfigProfile moderate
```

The decision flow always runs spare layers before kill layers (spare_classifications then spare_cmdline_patterns then kill_names then kill_classifications). claude.exe is classified as claude and claude is in the default spare_classifications, so it cannot be killed even if the user adds node.exe to kill_names. This invariant is exercised explicitly in tests/test-config-loader.ps1.
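
The spare-before-kill ordering is easier to see in a few lines of code. The real engine is PowerShell and reads ~/.reap/config.json; the bash sketch below only mirrors the order of the four layers. The list contents (other than claude in the spare list), the `mcp-server` classification name, and the spare-when-nothing-matches default are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the decision order; values are made up for illustration.
SPARE_CLASSIFICATIONS="claude"
SPARE_CMDLINE_PATTERN='*my-inhouse-mcp*'     # a single glob, to keep the sketch short
KILL_NAMES="node.exe"
KILL_CLASSIFICATIONS="mcp-server"

# in_list WORD LIST: succeeds when WORD is one of the space-separated items.
in_list() {
    case " $2 " in *" $1 "*) return 0 ;; esac
    return 1
}

# decide NAME CLASSIFICATION CMDLINE: prints "spare" or "kill".
decide() {
    name=$1
    class=$2
    cmdline=$3
    # Spare layers run first, so a spared process never reaches a kill rule.
    in_list "$class" "$SPARE_CLASSIFICATIONS" && { echo spare; return; }
    case $cmdline in $SPARE_CMDLINE_PATTERN) echo spare; return ;; esac
    # Kill layers run only for processes no spare rule claimed.
    in_list "$name"  "$KILL_NAMES"           && { echo kill; return; }
    in_list "$class" "$KILL_CLASSIFICATIONS" && { echo kill; return; }
    echo spare                               # assumption: unmatched processes are left alone
}
```

With these values, `decide node.exe claude 'node claude/cli.js'` prints `spare` even though node.exe is in the kill list, which is the invariant described above.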

Full schema, the predicate.ps1 escape hatch for procedural rules, and worked configs ("I run in-house MCPs", "I want aggressive cleanup with a safety net") live in docs/CONFIGURATION.md.

How it works

The same idea — let the OS, not the application, enforce parent-death cleanup — has different kernel primitives on each platform.

Windows

JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE is a flag on Win32 Job Objects: when the last handle to the job is closed, the kernel terminates every member process. Browser sandboxes use this to bound renderer and tab lifetime.

claude-jobbed.ps1 does the wiring:

  1. CreateJobObjectW via P/Invoke.
  2. SetInformationJobObject with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE.
  3. Spawn claude.exe and assign it to the job. Descendants inherit membership.
  4. Wait for claude.exe to exit, then exit.
  5. On wrapper exit (graceful, crash, BSOD, X-button close, Task Manager End-Task), the OS closes the job handle. The kernel walks the job and calls TerminateProcess on every member.

Linux

The strong path uses systemd's transient scope: every scope is its own cgroup, and writing 1 to cgroup.kill (kernel ≥ 5.14) atomically delivers SIGKILL to every member. systemd does that for you when the scope's main process exits.

tools/linux/claude-jobbed.sh STRONG path:

  1. find_claude resolves the real binary via the 9-probe order.
  2. systemd-run --user --scope --quiet --slice=claude-code.slice --unit="claude-jobbed-$$.scope" -- "$claude_path" "$@" launches claude inside a fresh transient scope.
  3. When the wrapper exits (any reason, including kill -9), systemd notices the scope's main PID is gone, writes 1 to the scope's cgroup.kill, and the kernel reaps every descendant cgroup-wide.
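
To confirm by hand that the kernel interface behind the STRONG path is present, you can look for the cgroup.kill file in your own cgroup. This sketch assumes cgroup v2 mounted at /sys/fs/cgroup, and the function names are hypothetical. It inspects the current shell's cgroup, not the transient scope the wrapper creates.

```bash
#!/usr/bin/env bash
# Hypothetical helper; assumes a unified (v2) cgroup hierarchy.

# cgroup_v2_path [FILE]: print this process's cgroup v2 path (the "0::" line).
cgroup_v2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# has_cgroup_kill [ROOT] [FILE]: true when the current cgroup exposes cgroup.kill.
# The root cgroup never has the file, so a path of "/" reports false.
has_cgroup_kill() {
    root=${1:-/sys/fs/cgroup}
    path=$(cgroup_v2_path "${2:-/proc/self/cgroup}") || return 1
    [ -n "$path" ] && [ -e "$root$path/cgroup.kill" ]
}
```

`has_cgroup_kill && echo "cgroup.kill available"` is a quick spot check. The optional arguments exist so the logic can be exercised against fixture directories.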

Fallback path (kernels <5.14, containers without systemd, WSL1):

  1. set -m so the spawned child gets its own process group (pgid == pid).
  2. trap 'kill -TERM "-$child_pgid"; sleep 0.5; kill -KILL "-$child_pgid"' EXIT INT TERM HUP.
  3. On graceful wrapper exit or signal, the trap fires killpg. SIGKILL of the wrapper itself escapes — the trap doesn't fire for -9. That's the documented MEDIUM-tier gap.

There is no application code path that can leak on the strong paths. This is structured concurrency enforced by the operating system, the way Nathaniel J. Smith originally framed the problem class. Application discipline is what produced the leaks in the first place.

macOS

macOS has no cgroup.kill and no Win32 Job Object. prctl(PR_SET_PDEATHSIG) is Linux-only; there is no kernel primitive that atomically reaps a process subtree when an ancestor dies. The MEDIUM tier closes as much of that gap as the OS permits:

  1. set -m so the spawned claude gets its own process group.
  2. trap 'cleanup' EXIT INT TERM HUP — graceful exit or any catchable signal killpgs the child group.
  3. A disowned watchdog subshell with its own process group records the wrapper's PID and start time (ps -p "$pid" -o lstart= — there is no /proc on macOS to read), then polls. When the wrapper disappears — including a Force-Quit / kill -9 that the wrapper's own trap can never catch — the watchdog runs the same cleanup() and reaps the tree. On graceful exit the wrapper kill -KILLs the watchdog, since the watchdog traps catchable signals and would otherwise outlive its purpose.

The honest ceiling: a simultaneous kill -9 of both the wrapper and the watchdog leaves the child group unreaped, because nothing is left alive to do it and macOS offers no kernel fallback. That exact scenario is asserted — and proven still-failing-by-design — in tests/macos/test-honesty.bats, so the limit is documented in executable form, not just prose.
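
The watchdog idea can be sketched in a few functions. Names and the one-second polling interval are assumptions; the real tools/macos/claude-jobbed.sh may differ. The start-time comparison is what keeps the watchdog from mistaking a recycled PID for the still-living wrapper.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a start-time-checked watchdog; not the project's wrapper.

# start_time PID: process start time as reported by ps, empty if PID is gone.
start_time() { ps -p "$1" -o lstart= 2>/dev/null; }

# same_process PID RECORDED_START: true while PID exists and still has the
# recorded start time, which guards against PID reuse.
same_process() {
    now=$(start_time "$1")
    [ -n "$now" ] && [ "$now" = "$2" ]
}

# watch_and_reap WRAPPER_PID CHILD_PGID: poll until the wrapper is gone, then
# signal the child's process group (TERM, grace period, KILL).
watch_and_reap() {
    wrapper_pid=$1
    child_pgid=$2
    recorded=$(start_time "$wrapper_pid")
    while same_process "$wrapper_pid" "$recorded"; do
        sleep 1
    done
    kill -TERM -- "-$child_pgid" 2>/dev/null
    sleep 0.5
    kill -KILL -- "-$child_pgid" 2>/dev/null
    return 0
}
```

In a wrapper this would be started in the background under `set -m` (so it gets its own process group) and then disowned, roughly `watch_and_reap "$$" "$child_pgid" & disown`. If both the wrapper and this loop are killed at once, nothing remains to run the final `kill`, which is the ceiling described above.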

Full architecture: DESIGN.md.

Tests

36 PowerShell unit assertions plus 1 functional test (Windows side), 19 bats tests (Linux), and 22 bats tests (macOS), all passing in the GitHub Actions matrix.

Windows (tests/test-*.ps1):

| Suite | Coverage |
|---|---|
| tests/test-job-object.ps1 | Functional. Spawns a sleeping child, closes the job handle, asserts the child died within 2 seconds. 9 ms measured on Windows 11 build 26200. |
| tests/test-orphan-detect.ps1 | Synthetic snapshots. Orphan detection, PID-reuse guard via StartTime comparison, classification, descendant tree walk. |
| tests/test-config-loader.ps1 | Config schema. Defaults, malformed-JSON fallback, partial-config merge, and the spare-wins-over-kill safety invariant. |
| tests/test-spawn-plan.ps1 | Wrapper host-routing for .cmd / .bat / .ps1 shims (npm-installed Claude ships a shim, not an .exe). Extension-priority resolution. 14 assertions. |

Linux (tests/linux/test-*.bats):

| Suite | Coverage |
|---|---|
| tests/linux/test-find-claude.bats | Probe priority: PATH > npm prefix > nvm (highest version) > fnm > yarn global. Sandboxed PATH+HOME so probes only hit fixtures. The 127-when-not-found contract and source-mode contract. 8 tests. (Probes 3-4, the Homebrew paths, honestly skip on Linux runners; they are exercised by the macOS suite on the macos-14 leg.) |
| tests/linux/test-pgid-cleanup.bats | Forces fallback path via CLAUDE_JOBBED_FORCE_FALLBACK=1. Spawns a fake claude that backgrounds a grandchild, kills the wrapper with SIGTERM, polls (3s budget) for grandchild death. Plus exit-code propagation and verbatim arg forwarding. 3 tests. |
| tests/linux/test-cgroup-kill.bats | Load-bearing parity test against Win32 KILL_ON_JOB_CLOSE. Lets the wrapper take the strong (systemd-run --scope) path, then SIGKILLs the wrapper. Bash traps don't fire on -9, so only kernel-enforced cleanup via cgroup.kill can satisfy this. Skips with a printed reason if systemd-run is missing, --user systemd is inactive, or kernel < 5.14. 1 test. |
| tests/linux/test-installer.bats | Sandboxes HOME; covers --yes inject, idempotent re-run preserving marker count, --force overwrite (count stays at 2 not 4), --uninstall clean removal, --uninstall no-op, and unknown-flag exit code 2. 6 tests. |

macOS (tests/macos/test-*.bats):

| Suite | Coverage |
|---|---|
| tests/macos/test-find-claude.bats | Same probe-priority contract as Linux, plus the macOS-specific fnm ~/Library/Application Support/fnm probe. Sandboxed PATH+HOME. 10 tests. |
| tests/macos/test-pgid-cleanup.bats | Spawns a fake claude that backgrounds a grandchild, kills the wrapper with SIGTERM, polls for grandchild death. Plus exit-code propagation and verbatim arg forwarding. Single macOS path (no FORCE_FALLBACK split). 3 tests. |
| tests/macos/test-honesty.bats | The load-bearing negative test that pins the honest MEDIUM ceiling. CASE 1: SIGKILL the wrapper alone; the disowned watchdog must outlive it and reap the grandchild (proves MEDIUM). CASE 2: SIGKILL wrapper and watchdog simultaneously; the grandchild survives, the documented un-closeable ceiling on a kernel with no job-object primitive. 2 tests. |
| tests/macos/test-installer.bats | Sandboxes HOME; covers --yes inject into ~/.zshrc + ~/.bashrc + ~/.bash_profile, the honest-ceiling install banner text, idempotent re-run, --force no-dup, --uninstall clean removal across all three rc files, --uninstall no-op, and unknown-flag exit code 2. 7 tests. |

```powershell
# Windows
.\tests\test-job-object.ps1
.\tests\test-orphan-detect.ps1
.\tests\test-config-loader.ps1
.\tests\test-spawn-plan.ps1
```

```bash
# Linux (requires bats-core: apt install bats)
bats --print-output-on-failure tests/linux/

# macOS (requires bats-core: brew install bats-core)
bats --print-output-on-failure tests/macos/
```

CI runs all three platforms on every push (.github/workflows/test.yml): ubuntu-latest for the Linux bats suite (with loginctl enable-linger so systemctl --user is active and the cgroup-kill test exercises the strong path instead of skipping), macos-14 (Apple Silicon) for the macOS bats suite, and windows-latest for the PowerShell suite.

Hosted Intel macOS (macos-13) is intentionally not a CI leg. GitHub is retiring hosted Intel macOS, so that leg never received a runner and ran to GitHub's hard 24h "awaiting a runner" ceiling on every push. The runtime scripts are architecture-neutral and bash-3.2-safe by construction, so only x86_64 execution is uncovered, tracked as an explicit open gap in #3.

The macOS leg includes a probe step that records sw_vers / uname -m and which bash actually runs the suite. GitHub's macOS runners put a modern Homebrew bash ahead of Apple's frozen /bin/bash 3.2.57 on PATH, so a dedicated /bin/bash -n static-parse step proves the runtime scripts parse under real Apple stock bash even though bats itself runs under the newer bash. The gap is documented, not hidden.

If a suite fails on your Windows build, file an issue with the output of winver. If it fails on a Linux distro, include uname -r and systemctl --user is-active default.target; on macOS include sw_vers and uname -m.

Safety guarantees

  • cc-procs.ps1 never kills. No Stop-Process, no taskkill, no TerminateProcess. Run it any time.
  • cleanup-orphans.ps1 defaults to dry-run. Live kills require both -Force and a config that opts in. With no ~/.reap/config.json, the engine is a guaranteed no-op even with -Force.
  • The engine never blanket-kills node.exe by name. spare_classifications always runs first.
  • claude-jobbed.ps1 is opt-in. Plain claude.exe still works without the wrapper, just unprotected.

What this does not do

  • Replace Claude Code's own subprocess discipline. Anthropic can ship Job Objects + cgroups natively. This is the user-side workaround until they do.
  • Fully match the Windows/Linux STRONG tier on macOS. macOS has neither cgroup.kill nor a Job-Object equivalent — prctl(PR_SET_PDEATHSIG) is Linux-only. v1.2.0 ships the MEDIUM tier: setpgid + trap + a disowned out-of-process watchdog, which does survive Force-Quit of the wrapper alone (the watchdog outlives it and reaps the tree). The honest ceiling — a simultaneous kill -9 of both wrapper and watchdog — is unrecoverable, and that exact ceiling is pinned by tests/macos/test-honesty.bats so it cannot silently regress. A future Swift kqueue/launchd helper that could close the gap is conditional on telemetry.
  • Wrap a claude that's already running. Restart your shell after install (Windows, Linux, or macOS).
  • Auto-shadow launchers that bypass shell rc files: Win+R, desktop shortcuts to claude.exe, Task Scheduler entries, VS Code's terminal until reloaded; on Linux/macOS that's anything launched with env -i or by a service manager that strips the rc files. Bare cmd.exe has no AutoRun auto-shadow either — though tools\claude-jobbed.cmd gives cmd.exe users the full STRONG guarantee when invoked or aliased explicitly. See docs/FAQ.md for per-path remedies.

License

MIT. See LICENSE.

Author: Ronil Basu (@ron2k1).
