fix(tool-server): prevent duplicate tool-servers after an nvm node switch#385
Open
filip131311 wants to merge 1 commit into
Open
fix(tool-server): prevent duplicate tool-servers after an nvm node switch#385filip131311 wants to merge 1 commit into
filip131311 wants to merge 1 commit into
Conversation
…itch Switching node versions via nvm could leave two or more tool-server processes alive at once while ~/.argent/tool-server.json tracked only one. Root cause in @argent/tools-client `ensureToolsServer()`: the read-state -> check-alive+healthy -> clearState -> spawn -> writeState sequence had no cross-process serialization (two launchers both spawn) and never terminated the previously-tracked detached server on respawn (orphan). An nvm switch relaunches `argent mcp` under the new node while the previous MCP's health monitor reconnects, so several launchers race and each spawns its own detached server on a fresh port; last-writer-wins state leaves the rest orphaned until their 30-min idle timeout. - Add an O_EXCL spawn lock (~/.argent/tool-server.lock) with a per-acquisition nonce, stale-steal (dead holder / aged lock), a bounded wait, and a hot-spin-proof reclaim loop. ensureToolsServer double-checks for a peer-spawned healthy server inside the lock before spawning. - Kill-before-respawn: terminate a wedged/unhealthy server we own before spawning its replacement, gated by (a) managed === "autospawn", (b) a structural `node <bundlePath> start` command-line identity match, and (c) a re-check immediately before each signal -- so a recycled pid or a user's `argent server start` (now tagged managed: "cli") is never killed. - spawnToolsServer reaps its detached child on every reject path so a readiness timeout can't leak an orphan that binds its port late. Adds launcher-duplicate-spawn.test.ts (burst-collapse, unhealthy-orphan, recycled-pid safety, CLI-server preserved, spawn-timeout reap); each case was verified to fail against the pre-fix code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Switching Node versions with nvm can leave two (or more)
tool-serverprocesses alive at once while~/.argent/tool-server.jsontracks only one. The extra servers are orphaned (untracked, on leaked ports) until their 30‑minute idle timeout.This was reproduced live: a machine had three
node …/dist/tool-server.cjs startprocesses running, all spawned within the same minute, with state tracking only the last one.Root cause
@argent/tools-clientensureToolsServer()ran a non-atomicread state → check alive+healthy → clearState → spawn → writeStatesequence with:argent mcpunder the new Node while the previous MCP's 30s health monitor reconnects — several launchers runensureToolsServer()at nearly the same time, all observe "no healthy server", and each spawns its own detached server on a fresh free port. Last‑writer‑wins on the state file orphans the rest.clearState()d and spawned a new one, leaving the old detached process running untracked.Fix (
packages/argent-tools-client/src/launcher.ts)~/.argent/tool-server.lock) serialises the spawn decision across processes: per-acquisition nonce, stale‑steal (dead holder / aged lock), bounded wait, and a hot‑spin‑proof reclaim loop. Inside the lock,ensureToolsServerdouble‑checks for a peer‑spawned healthy server before spawning, so a burst of concurrent launches collapses to a single server.managed === "autospawn"— a user'sargent server startserver is now taggedmanaged: "cli"and is never killed by the auto‑spawn path;node <bundlePath> startcommand-line identity match (not a bare substring);spawnToolsServerreaps its detached child on every reject path, so a readiness timeout can no longer leak a child that binds its port moments later as an untracked second server.The lock-free fast path (reuse a healthy tracked server) is unchanged, so steady‑state behaviour and latency are unaffected; the lock is only taken when a spawn is actually needed.
Tests
New
packages/argent-tools-client/test/launcher-duplicate-spawn.test.ts:autospawnserver is terminated (not orphaned) on respawn;managed: "cli"server is left alive instead of being killed;Each case was verified to fail against the pre-fix code and pass with the fix. Existing
launcher-spawn/launcher-statesuites are unchanged and green.🤖 Generated with Claude Code