fix: make peek after kill deterministically report no session#46
Merged
Conversation
The daemon acked quit while its socket file and listener still existed; teardown unlinked the socket only afterwards. A control command issued right after kill returned could connect in that window, read EOF, and die with error.ConnectionLost (exit 1) instead of the documented no-session exit 3. This flaked the agent-loop integration test on macOS CI. Retire the listener (close it and unlink the socket file) before acking quit, so by the time kill returns no new client can find the socket. Also map ConnectionLost in mustControl to the no-session failure: an EOF on a control exchange means the daemon died before replying, which callers should see as the session being gone. This covers control connections racing a natural command exit as well.
Restores the daemon half of the race fix: close the listening socket and unlink its file before replying to quit, so the socket is gone by the time kill returns.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the flaky
agent loop: new, send, wait, peek, killintegration test that failed on macOS CI (run 27382774714) withboo peek: wanted exit 3, got .{ .Exited = 1 }: error: ConnectionLost.Problem
The daemon acked
quitwhile its socket file and listener still existed; the socket was unlinked only later during teardown, afterWindow.destroyreaped the child. Apeekissued right afterkillreturned could connect in that window, read EOF, and exit 1 witherror.ConnectionLostinstead of the documented no-session exit 3.Fix
quit. By the timekillreturns, no new client can find the socket, making the sequential kill-then-peek case deterministic.error.ConnectionLostinmustControlto the no-session failure (exit 3). An EOF on a control exchange means the daemon died before replying, which also covers control connections racing a natural command exit.Verification
A/B with a temporary 100ms sleep at the top of daemon teardown to widen the race window, driving a 30-iteration new/kill/peek loop:
error: ConnectionLost, exit 1 (exact CI failure)Also
zig build test-all: 132/132 passed, and a 300-iteration stress loop on the release build: 0 failures.Investigation notes
sessionInfoalready treats any control error as "daemon gone", solsandwait --idlewere robust;mustControlcallers (send,peek,wait --text,kill,rename) were not.deinitrunsWindow.destroy(child reaping) before unlinking the socket.Daemon.deinitnow reusesretireListener, which guards against double-close after an early retire.This PR was generated by Coder Agents on behalf of @kylecarbs.