Summary
The Go harness only writes trajectory files during its shutdown sequence (agent.Close() triggered by WebSocket close). There is no API to flush or checkpoint the trajectory mid-session. If the host process is killed (e.g., SIGKILL), all conversation history since agent start is lost.
Context
When using the SDK inside an ACP agent hosted by an IDE (Zed, IntelliJ), the IDE controls the agent process lifetime. Some IDEs (notably Zed — zed#59323) send SIGKILL on disconnect with no SIGTERM grace period. Even IDEs that send SIGTERM may not wait long enough for the Go harness to serialize.
The current shutdown sequence (from LocalConnection.disconnect()):
- Close WebSocket → triggers Go
defer agent.Close() → trajectory serialized
- Close stdin →
cleanupAllAgents → os.Exit(0)
- Wait 5s, then SIGTERM, then SIGKILL
This only works when the Python side initiates an orderly shutdown via agent.__aexit__(). If the process is killed externally, the Go harness dies with it and the trajectory is never written.
Requested feature
An API to save/checkpoint the trajectory to disk without tearing down the agent. For example:
# After each prompt completes, persist the trajectory
response = await agent.chat(prompt)
async for chunk in response.chunks:
...
await agent.save_trajectory() # writes to save_dir without disconnecting
Or the Go harness could auto-flush after each turn completes (when it transitions to STATE_IDLE).
Workaround
Currently we guard against missing trajectories by checking if the file exists before passing conversation_id to a new agent:
traj_path = Path(save_dir) / f"traj-{conversation_id}"
if not traj_path.exists():
log.warning("trajectory not found, starting fresh")
conversation_id = None
This prevents crashes but loses all conversation history.
Impact
- Session resume is unreliable — depends on whether the process got a clean shutdown
- IDE integrations that cannot guarantee graceful exit lose all conversation state
- Users see "starting fresh" instead of resuming where they left off
Summary
The Go harness only writes trajectory files during its shutdown sequence (
agent.Close()triggered by WebSocket close). There is no API to flush or checkpoint the trajectory mid-session. If the host process is killed (e.g.,SIGKILL), all conversation history since agent start is lost.Context
When using the SDK inside an ACP agent hosted by an IDE (Zed, IntelliJ), the IDE controls the agent process lifetime. Some IDEs (notably Zed — zed#59323) send
SIGKILLon disconnect with noSIGTERMgrace period. Even IDEs that sendSIGTERMmay not wait long enough for the Go harness to serialize.The current shutdown sequence (from
LocalConnection.disconnect()):defer agent.Close()→ trajectory serializedcleanupAllAgents→os.Exit(0)This only works when the Python side initiates an orderly shutdown via
agent.__aexit__(). If the process is killed externally, the Go harness dies with it and the trajectory is never written.Requested feature
An API to save/checkpoint the trajectory to disk without tearing down the agent. For example:
Or the Go harness could auto-flush after each turn completes (when it transitions to
STATE_IDLE).Workaround
Currently we guard against missing trajectories by checking if the file exists before passing
conversation_idto a new agent:This prevents crashes but loses all conversation history.
Impact