fix(ios-profiler): make xctrace export async so native-profiler-stop can't trip the tool-server health check #384
Merged
`native-profiler-stop` shells out to `xctrace export` four times (TOC
discovery, CPU, hangs, leaks) via `execSync`, which blocks the Node event
loop for the full export duration. Under the host-wide `--all-processes`
capture fallback (the Xcode 26.5 `--device` deadlock workaround) the CPU
export is ~44 MB and the whole stop takes ~34s.
While the loop is frozen the tool-server's `/tools` health endpoint stops
answering. The MCP client's `ensureToolsServer()` gates every call on
`isToolsServerHealthy()` with a hard 2s timeout; a busy-but-alive server
fails that check, so the client declares it dead, spawns a replacement
tool-server and rotates the auth token — which 401s the very stop request
that was about to succeed. The export still lands on disk, but the call
errors and the session is orphaned.
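
The starvation described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: a short-lived child Node process stands in for `xctrace export`, and the helper names (`timerLagMs`, `lagDuringSyncChild`, `lagDuringAsyncChild`) are hypothetical. It measures how late a 0 ms timer fires while the child runs synchronously versus asynchronously; a health-check handler would be delayed in the same way as the timer.

```typescript
import { execFile, execFileSync } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Stand-in for a slow `xctrace export`: a child Node process that idles ~300 ms.
const SLOW_CHILD_ARGS = ["-e", "setTimeout(() => {}, 300)"];

/** Resolves with how many milliseconds late a 0 ms timer actually fired. */
function timerLagMs(): Promise<number> {
  const scheduledAt = Date.now();
  return new Promise((resolve) => {
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
  });
}

/** Timer lag while the child runs synchronously (event loop blocked). */
async function lagDuringSyncChild(): Promise<number> {
  const lag = timerLagMs();
  // Nothing else on the loop runs until this call returns.
  execFileSync(process.execPath, SLOW_CHILD_ARGS);
  return lag;
}

/** Timer lag while the child runs asynchronously (event loop free). */
async function lagDuringAsyncChild(): Promise<number> {
  const lag = timerLagMs();
  // Timers and I/O callbacks keep running while the child is alive.
  await execFileAsync(process.execPath, SLOW_CHILD_ARGS);
  return lag;
}
```

In the synchronous case the timer cannot fire until the child exits, so its lag is at least the child's lifetime; in the asynchronous case it fires almost immediately.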
Convert the four export passes to async (`exec` + `await`) so the event
loop stays free to answer health checks while xctrace runs. The 60s
timeout and 256 MiB maxBuffer (the ENOBUFS fix) carry over to the async
wrapper unchanged. `promisify(exec)` is resolved per-call so test suites
that `vi.doMock("child_process")` without an `exec` export still import.
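
A rough sketch of what such an async wrapper could look like, under stated assumptions: the function name `execAsyncWithTimeout` and the constant names are hypothetical and the actual helper in this PR may be shaped differently. Only the 60 s timeout, the 256 MiB `maxBuffer`, and the per-call `promisify(exec)` come from the description above.

```typescript
import * as childProcess from "node:child_process";
import { promisify } from "node:util";

const EXPORT_TIMEOUT_MS = 60_000; // still caps a stuck export
const EXPORT_MAX_BUFFER = 256 * 1024 * 1024; // 256 MiB, avoids ENOBUFS on large output

/**
 * Hypothetical async replacement for a synchronous exec helper.
 * Runs `command` in a shell and resolves with its stdout.
 */
async function execAsyncWithTimeout(command: string): Promise<string> {
  // Look up `exec` at call time rather than at module load, so a mocked
  // "child_process" without an `exec` export does not break the import.
  const execAsync = promisify(childProcess.exec);
  const { stdout } = await execAsync(command, {
    timeout: EXPORT_TIMEOUT_MS,
    maxBuffer: EXPORT_MAX_BUFFER,
  });
  return stdout;
}
```

A caller would then `await execAsyncWithTimeout(...)` once per export pass; the loop stays free between the spawn and the child's exit.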
Does not change stop latency (the export is genuinely slow) — only makes
it non-blocking. The respawn-and-rotate-token behaviour against a live PID
remains a separate latent issue worth hardening on the MCP side.
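
One possible direction for that hardening, offered only as a sketch and not something this PR implements: before declaring the server dead, the client could check whether the recorded PID is still alive. The helper name `isPidAlive` is hypothetical; `process.kill(pid, 0)` is the standard Node way to probe a process without signalling it.

```typescript
/**
 * Hypothetical helper: reports whether a process with this PID exists.
 * Signal 0 performs the existence and permission checks without
 * delivering a signal.
 */
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    // ESRCH (or anything else): treat as not running.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}
```

A missed health check against a PID that is still alive could then be treated as "busy" and retried, rather than triggering a respawn and token rotation.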
Problem
On the newest main (0.12.1), `native-profiler-stop` on an iOS 26.5 sim returns:
…even though the trace export actually completes and the XML lands on disk. Reproduced while profiling Settings on a 26.5 sim.
Root cause: event-loop starvation, not a profiler bug
- `native-profiler-stop` runs four `xctrace export` passes (TOC discovery, CPU, hangs, leaks) through `execSyncWithTimeout` → `child_process.execSync`, which blocks the Node event loop for the entire export.
- The CPU export is large under the host-wide `--all-processes` capture fallback (the `xctrace --device` deadlock workaround, #380, "feat(ios-profiler): pluggable capture strategy with --all-processes fallback for the xctrace --device deadlock"). That export is ~44 MB and the whole stop takes ~34 s (`durationMs: 34083` in `mcp-calls.log`).
- While the loop is blocked, the tool-server's `GET /tools` health endpoint can't answer. The MCP client's `ensureToolsServer()` gates every call on `isToolsServerHealthy()` with a hard 2 s timeout.
- The busy-but-alive server fails that check, so the client declares it dead, spawns a replacement tool-server and rotates the auth token in `~/.argent/tool-server.json` → the in-flight stop is 401'd. Confirmed live: one MCP client, two `tool-server.cjs` processes.
The recording start path already uses async `spawn` + `await`, which is why it never trips this; only the export path was synchronous.
Fix
Convert the four export passes to async (`exec` + `await`) so the event loop stays free to answer health checks while `xctrace` runs. Carried over verbatim onto the async wrapper:
- `timeout: 60_000` (still caps a genuinely stuck `xctrace`).
- `maxBuffer: 256 MiB` (the ENOBUFS fix, #382, "fix(ios-profiler): set maxBuffer on xctrace export to stop ENOBUFS killing every export"); `exec` honors it.
`promisify(exec)` is resolved per-call (not at module load) so test suites that `vi.doMock("child_process", …)` without an `exec` export still import the module, matching the original lazy `execSync` usage.
This does not change stop latency (the export is genuinely slow); it makes the export non-blocking, which fixes the actual bug. Concurrent calls (health checks, screenshots) now work during a long stop.
Not addressed here (follow-ups)
- `ensureToolsServer()` still rotates the token / abandons in-flight requests against a still-alive PID the moment a health check misses. Async export removes this trigger; the brittle respawn behaviour is a separate fix.
- If `native-profiler-analyze` parses the 44 MB XML synchronously, it can block the loop similarly; worth checking next.
Testing
- `tsc --noEmit` clean
- `eslint` clean on changed files
- Test suite passes (`test/ios-instruments/`, 34 tests). Updated the `exportIosTraceData` mock from `mockReturnValue` → `mockResolvedValue`.