fix(ios-profiler): make xctrace export async so native-profiler-stop can't trip the tool-server health check#384

Merged
filip131311 merged 1 commit into main from fix/ios-profiler-async-export
Jun 19, 2026

@filip131311

Problem

On the newest main (0.12.1), native-profiler-stop on an iOS 26.5 sim returns:

Missing or invalid Authorization header. Tool-server requires `Authorization: Bearer <token>`…

…even though the trace export actually completes and the XML lands on disk. Reproduced profiling Settings on a 26.5 sim.

Root cause — event-loop starvation, not a profiler bug

  1. native-profiler-stop runs four xctrace export passes (TOC discovery, CPU, hangs, leaks) through execSyncWithTimeout → child_process.execSync, which blocks the Node event loop for the entire export.
  2. iOS 26.5 forces the host-wide --all-processes capture fallback (the xctrace --device deadlock workaround introduced in #380, "feat(ios-profiler): pluggable capture strategy with --all-processes fallback for the xctrace --device deadlock"). The CPU export is ~44 MB and the whole stop takes ~34 s (durationMs: 34083 in mcp-calls.log).
  3. While the loop is frozen, the tool-server's GET /tools health endpoint can't answer. The MCP client's ensureToolsServer() gates every call on isToolsServerHealthy() with a hard 2 s timeout:
    const healthy = await isToolsServerHealthy(state.port, host, 2e3, state.token); // 2s
    if (healthy) return {…reuse…};
    await clearState();
    const token = generateToken();   // ← respawn + NEW token
  4. The busy-but-alive server fails the 2 s check → the client declares it dead, spawns a second tool-server and rotates the auth token in ~/.argent/tool-server.json → the in-flight stop is 401'd. Confirmed live: one MCP client, two tool-server.cjs processes.

The recording start path already uses async spawn + await, which is why it never trips this; only the export path was synchronous.
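The starvation described above can be reproduced in isolation. The sketch below is not the tool-server code: it uses a child Node process that idles for ~300 ms as a stand-in for a slow `xctrace export`, a 10 ms timer as a proxy for a pending health-check request, and `execFile`/`execFileSync` instead of `exec`/`execSync` purely to avoid shell quoting. All names (`timerLagDuring`, `blockingExport`, `nonBlockingExport`) are hypothetical.

```typescript
import { execFile, execFileSync } from "node:child_process";
import { promisify } from "node:util";

// Stand-in for a slow `xctrace export`: a child Node process that idles ~300 ms.
const SLOW_CHILD_ARGS = ["-e", "setTimeout(() => {}, 300)"];
const TIMER_MS = 10;

// Schedules a short timer (a proxy for a pending health-check request),
// runs `work`, and resolves with how many ms late the timer fired.
async function timerLagDuring(work: () => unknown): Promise<number> {
  const scheduledAt = Date.now();
  const lag = new Promise<number>((resolve) => {
    setTimeout(() => resolve(Date.now() - scheduledAt - TIMER_MS), TIMER_MS);
  });
  await work();
  return lag;
}

// Synchronous variant: the event loop is frozen until the child exits,
// so the timer cannot fire before then.
function blockingExport(): void {
  execFileSync(process.execPath, SLOW_CHILD_ARGS);
}

// Asynchronous variant: the loop keeps servicing timers and sockets
// while the child runs.
function nonBlockingExport(): Promise<unknown> {
  return promisify(execFile)(process.execPath, SLOW_CHILD_ARGS);
}
```

Calling `timerLagDuring(blockingExport)` should report a lag of roughly the child's full runtime, while `timerLagDuring(nonBlockingExport)` should report a lag close to zero. Scaled up to a ~34 s export and a 2 s health-check timeout, the first case is the failure mode this PR removes.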

Fix

Convert the four export passes to async (exec + await) so the event loop stays free to answer health checks while xctrace runs. Carried over unchanged onto the async wrapper: the 60 s timeout and the 256 MiB maxBuffer (the ENOBUFS fix).

promisify(exec) is resolved per-call (not at module load) so test suites that vi.doMock("child_process", …) without an exec export still import the module — matching the original lazy execSync usage.

This does not change stop latency (the export is genuinely slow) — it makes it non-blocking, which is the actual bug. Concurrent calls (health checks, screenshots) now work during a long stop.
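A minimal sketch of what such an async wrapper could look like, assuming the 60 s timeout and 256 MiB maxBuffer mentioned in the commit message. The names `execAsyncWithTimeout` and `runExportPasses` and the constants are hypothetical, not the identifiers used in the actual diff. The `exec` lookup happens inside the function body, so a test that mocks `child_process` without an `exec` export can still import the module.

```typescript
import * as childProcess from "node:child_process";
import { promisify } from "node:util";

// Assumed values, taken from the commit message: 60 s timeout, 256 MiB buffer.
const EXPORT_TIMEOUT_MS = 60_000;
const EXPORT_MAX_BUFFER = 256 * 1024 * 1024;

// Runs a shell command without blocking the event loop and resolves with stdout.
export async function execAsyncWithTimeout(
  command: string,
  timeoutMs: number = EXPORT_TIMEOUT_MS,
): Promise<string> {
  // Resolved per call, not at module load: `childProcess.exec` is only
  // touched when an export actually runs.
  const execAsync = promisify(childProcess.exec);
  const { stdout } = await execAsync(command, {
    encoding: "utf8",
    timeout: timeoutMs,
    maxBuffer: EXPORT_MAX_BUFFER,
  });
  return stdout;
}

// Runs several export commands one after another. Each `await` yields to
// the event loop, so other requests can be served between and during passes.
export async function runExportPasses(commands: string[]): Promise<string[]> {
  const outputs: string[] = [];
  for (const command of commands) {
    outputs.push(await execAsyncWithTimeout(command));
  }
  return outputs;
}
```

The passes stay sequential here, matching the "does not change stop latency" claim: total time is the same, but the process can answer GET /tools while each child runs.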

Not addressed here (follow-ups)

  • MCP respawn hardening: ensureToolsServer() still rotates the token / abandons in-flight requests against a still-alive PID the moment a health check misses. Async export removes this trigger; the brittle respawn behaviour is a separate fix.
  • Analyze path: if native-profiler-analyze parses the 44 MB XML synchronously it can block the loop similarly — worth checking next.
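For the respawn follow-up, one possible shape of the hardening is to probe the recorded PID before declaring the server dead. This is only a sketch under assumptions: it presumes the client knows the tool-server's PID, and `isPidAlive`, `decideOnHealthResult`, and the `"wait"` outcome are hypothetical names, not part of the existing `ensureToolsServer()` code.

```typescript
// Returns true when a process with this PID exists.
// Signal 0 performs the existence/permission check without sending a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

type HealthDecision = "reuse" | "wait" | "respawn";

// Hypothetical policy: a missed health check against a live PID means
// "busy", so keep the token and retry instead of respawning.
function decideOnHealthResult(healthy: boolean, pid: number | undefined): HealthDecision {
  if (healthy) return "reuse";
  if (pid !== undefined && isPidAlive(pid)) return "wait";
  return "respawn";
}
```

With a policy like this, a server that is alive but slow to answer would not have its token rotated underneath an in-flight request. A real implementation would still need an upper bound on how long to keep waiting.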

Testing

  • tsc --noEmit clean
  • eslint clean on changed files
  • Full tool-server suite: 1314 tests pass (incl. test/ios-instruments/, 34 tests). Updated the exportIosTraceData mock from mockReturnValue → mockResolvedValue.

…can't trip the tool-server health check

`native-profiler-stop` shells out to `xctrace export` four times (TOC
discovery, CPU, hangs, leaks) via `execSync`, which blocks the Node event
loop for the full export duration. Under the host-wide `--all-processes`
capture fallback (the Xcode 26.5 `--device` deadlock workaround) the CPU
export is ~44 MB and the whole stop takes ~34s.

While the loop is frozen the tool-server's `/tools` health endpoint stops
answering. The MCP client's `ensureToolsServer()` gates every call on
`isToolsServerHealthy()` with a hard 2s timeout; a busy-but-alive server
fails that check, so the client declares it dead, spawns a replacement
tool-server and rotates the auth token — which 401s the very stop request
that was about to succeed. The export still lands on disk, but the call
errors and the session is orphaned.

Convert the four export passes to async (`exec` + `await`) so the event
loop stays free to answer health checks while xctrace runs. The 60s
timeout and 256 MiB maxBuffer (the ENOBUFS fix) carry over to the async
wrapper unchanged. `promisify(exec)` is resolved per-call so test suites
that `vi.doMock("child_process")` without an `exec` export still import.

Does not change stop latency (the export is genuinely slow) — only makes
it non-blocking. The respawn-and-rotate-token behaviour against a live PID
remains a separate latent issue worth hardening on the MCP side.
@filip131311 filip131311 merged commit 732fb2c into main Jun 19, 2026
6 checks passed
@filip131311 filip131311 deleted the fix/ios-profiler-async-export branch June 19, 2026 21:06