fix(tool-server): prevent duplicate tool-servers after an nvm node switch by filip131311 · Pull Request #385 · software-mansion/argent

filip131311 · 2026-06-19T21:58:29Z

Problem

Switching Node versions with nvm can leave two (or more) tool-server processes alive at once while ~/.argent/tool-server.json tracks only one. The extra servers are orphaned (untracked, on leaked ports) until their 30‑minute idle timeout.

This was reproduced live: a machine had three node …/dist/tool-server.cjs start processes running, all spawned within the same minute, with state tracking only the last one.

Root cause

@argent/tools-client ensureToolsServer() ran a non-atomic
read state → check alive+healthy → clearState → spawn → writeState sequence with:

No cross-process serialization. An nvm switch relaunches argent mcp under the new Node while the previous MCP's 30s health monitor reconnects — several launchers run ensureToolsServer() at nearly the same time, all observe "no healthy server", and each spawns its own detached server on a fresh free port. Last‑writer‑wins on the state file orphans the rest.
No kill-before-respawn. When the tracked server was alive but unhealthy, the old code just clearState()d and spawned a new one, leaving the old detached process running untracked.

Fix (`packages/argent-tools-client/src/launcher.ts`)

O_EXCL spawn lock (~/.argent/tool-server.lock) serialises the spawn decision across processes: per-acquisition nonce, stale‑steal (dead holder / aged lock), bounded wait, and a hot‑spin‑proof reclaim loop. Inside the lock, ensureToolsServer double‑checks for a peer‑spawned healthy server before spawning, so a burst of concurrent launches collapses to a single server.
Kill-before-respawn terminates a wedged server we own before spawning its replacement, with three guards so it can never signal the wrong process:
- managed === "autospawn" — a user's argent server start server is now tagged managed: "cli" and is never killed by the auto‑spawn path;
- a structural node <bundlePath> start command-line identity match (not a bare substring);
- a re-check of that identity immediately before each signal (covers PID reuse across the SIGTERM grace window).
spawnToolsServer reaps its detached child on every reject path, so a readiness timeout can no longer leak a child that binds its port moments later as an untracked second server.

The lock-free fast path (reuse a healthy tracked server) is unchanged, so steady‑state behaviour and latency are unaffected; the lock is only taken when a spawn is actually needed.

Tests

New packages/argent-tools-client/test/launcher-duplicate-spawn.test.ts:

a 5× concurrent launch burst collapses to one server;
an alive‑but‑unhealthy autospawn server is terminated (not orphaned) on respawn;
a recycled pid whose command line doesn't match is never signalled;
a managed: "cli" server is left alive instead of being killed;
a readiness timeout reaps the spawned child (no orphan).

Each case was verified to fail against the pre-fix code and pass with the fix. Existing launcher-spawn / launcher-state suites are unchanged and green.

Note: the @swmansion/argent dispatcher › forwards SIGTERM to the child binary unit test is a pre-existing flake (alternates pass/fail independent of this change, which doesn't touch that package).

🤖 Generated with Claude Code

…itch Switching node versions via nvm could leave two or more tool-server processes alive at once while ~/.argent/tool-server.json tracked only one. Root cause in @argent/tools-client `ensureToolsServer()`: the read-state -> check-alive+healthy -> clearState -> spawn -> writeState sequence had no cross-process serialization (two launchers both spawn) and never terminated the previously-tracked detached server on respawn (orphan). An nvm switch relaunches `argent mcp` under the new node while the previous MCP's health monitor reconnects, so several launchers race and each spawns its own detached server on a fresh port; last-writer-wins state leaves the rest orphaned until their 30-min idle timeout. - Add an O_EXCL spawn lock (~/.argent/tool-server.lock) with a per-acquisition nonce, stale-steal (dead holder / aged lock), a bounded wait, and a hot-spin-proof reclaim loop. ensureToolsServer double-checks for a peer-spawned healthy server inside the lock before spawning. - Kill-before-respawn: terminate a wedged/unhealthy server we own before spawning its replacement, gated by (a) managed === "autospawn", (b) a structural `node <bundlePath> start` command-line identity match, and (c) a re-check immediately before each signal -- so a recycled pid or a user's `argent server start` (now tagged managed: "cli") is never killed. - spawnToolsServer reaps its detached child on every reject path so a readiness timeout can't leak an orphan that binds its port late. Adds launcher-duplicate-spawn.test.ts (burst-collapse, unhealthy-orphan, recycled-pid safety, CLI-server preserved, spawn-timeout reap); each case was verified to fail against the pre-fix code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tool-server): prevent duplicate tool-servers after an nvm node switch#385

fix(tool-server): prevent duplicate tool-servers after an nvm node switch#385
filip131311 wants to merge 1 commit into
mainfrom
fix/tool-server-duplicate-on-node-switch

filip131311 commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

filip131311 commented Jun 19, 2026

Problem

Root cause

Fix (packages/argent-tools-client/src/launcher.ts)

Tests

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fix (`packages/argent-tools-client/src/launcher.ts`)