
feat(trace): opt-in per-repo tracing via daemonless session-end mode #97

Open
titothedeveloper wants to merge 11 commits into main from feat/daemonless-session-end

@titothedeveloper titothedeveloper commented Jun 15, 2026

Summary

Makes Claude Code tracing opt-in and per-repo, with no machine-wide always-on daemon. Today the plugin runs a single global daemon that traces every session on the machine; this PR lets a user opt into a daemonless session-end mode (trace_mode: "daemon" | "session-end", default daemon) and scope tracing to chosen repos (trace_roots). In session-end mode the daemon is bypassed entirely: at SessionEnd a pure function reconstructs the full GenAI-convention span tree from the completed transcript in one pass, uploads, and exits — no daemon, no socket.

Why these belong together: honest per-repo scoping is only possible because it's daemonless. A single global daemon is config-blind, so "scope it to my repo" is a false boundary (it keeps tracing everything while you believe it's scoped). Removing the shared process lets each session be evaluated independently from its own cwd — so daemonless is the mechanism, opt-in + per-repo is the point.

Default behavior is byte-identical, so this is non-disruptive, and it removes the daemon-fragility failure class (stale sockets, Unknown session, restarts) for users who opt in.

Scope (trace_roots) is currently set via the WEAVE_TRACE_ROOTS env var or settings.json; a friendly config trace add/list/remove CLI is a fast-follow.
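
As a rough illustration of the precedence described above (env > settings > default `daemon`), here is a minimal sketch. It is not the PR's code: `WEAVE_TRACE_MODE` is a hypothetical variable name (the PR only names `WEAVE_TRACE_ROOTS`), and both the `:` separator and the array shape of `trace_roots` are assumptions.

```typescript
type TraceMode = "daemon" | "session-end";

// Assumed shape of the relevant settings.json fields.
interface Settings {
  trace_mode?: TraceMode; // absent in legacy files, which read as "daemon"
  trace_roots?: string[]; // assumption: a list of directories
}

interface ResolvedConfig {
  traceMode: TraceMode;
  traceModeSource: "env" | "settings" | "default";
  traceRoots: string[]; // empty means "trace everywhere"
}

function isTraceMode(value: unknown): value is TraceMode {
  return value === "daemon" || value === "session-end";
}

function resolveConfig(
  env: Record<string, string | undefined>,
  settings: Settings,
): ResolvedConfig {
  let traceMode: TraceMode = "daemon";
  let traceModeSource: ResolvedConfig["traceModeSource"] = "default";

  const envMode = env["WEAVE_TRACE_MODE"]; // hypothetical env var name
  if (isTraceMode(envMode)) {
    traceMode = envMode;
    traceModeSource = "env";
  } else if (isTraceMode(settings.trace_mode)) {
    traceMode = settings.trace_mode;
    traceModeSource = "settings";
  }

  // Assumption: WEAVE_TRACE_ROOTS is a ':'-separated list of directories.
  const envRoots = env["WEAVE_TRACE_ROOTS"];
  const traceRoots =
    envRoots !== undefined
      ? envRoots.split(":").map((r) => r.trim()).filter((r) => r.length > 0)
      : settings.trace_roots ?? [];

  return { traceMode, traceModeSource, traceRoots };
}
```

An invalid or missing value at any level falls through to the next one, so a legacy settings file with no `trace_mode` still resolves to `daemon`.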

What changes

| Setting | Value | Effect |
| --- | --- | --- |
| trace_mode (new) | daemon (default) | Unchanged: persistent daemon builds spans incrementally across hook events. |
| trace_mode | session-end | Daemonless: SessionEnd hook rebuilds the whole span tree from the transcript and uploads. No daemon process, no socket. |

The emitted span tree mirrors the daemon's shape: per turn → invoke_agent <agentName> root → chat <model> + execute_tool <name> (results paired) + invoke_agent <subagent_type> for Agent calls, recursing subagent transcripts (all turns). Note: the session-end builder additionally stamps each child span with its owning gen_ai.agent.name/agent.id and uses real transcript timestamps on tool and invoke_agent spans (see "Agent-teams correctness fixes" below). The daemon does not yet do this, so session-end's per-span attribution/timing is now richer than the daemon's — the tree structure is the same, the child-span metadata is not byte-identical. (The default daemon mode itself is untouched and byte-identical to today.)

Architecture

trace_mode=daemon (default, unchanged):
  hooks ──events──▶ persistent daemon (holds spans open) ──OTLP──▶ Weave Agents

trace_mode=session-end (new):
  SessionEnd hook ──▶ weave-claude-code session-end
       │  (reads completed transcript)
       ▼
  buildTrace(transcript) ── one pass ──▶ OTLP exporter ──flush──▶ Weave Agents
       └─ coordinator transcript + subagents/agent-*.jsonl (recursed)

Changes

New

| File | Lines | Change |
| --- | --- | --- |
| src/buildTrace.ts | +461 | Pure buildTrace(tracer, transcriptPath, opts). Turn/chat/tool/subagent emission; meta-pool subagent correlation + leftover/re-spawn pass with subset dedup; per-span agent attribution; real transcript-time envelopes; scope gate. |
| src/sessionEnd.ts | +154 | SessionEnd handler: parse payload, resolve config (env > settings, daemon parity), scope-gate (cwd from transcript), skip subagent transcripts, build, flush. Never throws. |
| src/tracerProvider.ts | +57 | OTLP exporter factory (daemon's wiring factored out; daemon keeps its own copy). |
| tests/build-trace.test.ts | +310 | Single-turn, multi-turn, subagent subtree (all turns), agent-teams teammate (no toolUseId + re-spawn), scope gate, missing transcript. |
| tests/nest-test-tree.test.ts | +194 | Full agent-teams contract (A1–A11) against a real two-transcript re-spawn fixture: merged sub-agent, deduped tool counts, distinct agent_id, per-span attribution, real ordered timestamps, parent-envelopes-child, no orphans. |
| tests/session-end.test.ts | +145 | build+flush, config-skip, bad-payload, env-override, traceRoots. |
| tests/parser-tool-calls.test.ts | +86 | tool_use↔tool_result pairing, error flag, unpaired. |
| tests/install-preserves-config.test.ts | +61 | reinstall keeps weave_project/api_key/trace_mode. |
| tests/config-show-trace-mode.test.ts | +32 | config show surfaces trace_mode + live daemon state. |

Modified

| File | Lines | Change |
| --- | --- | --- |
| src/parser.ts | +103/-7 | Additive toolCalls() (pairs tool_use with tool_result, now with per-call start/result timestamps) and prompt(). Daemon never calls them. |
| src/genaiSpans.ts | +70 | Optional agentName/agentId (owning agent) + startedAt on tool/chat/invoke_agent spans. All optional → daemon output unchanged. |
| src/cli.ts | +71 | session-end command (stdin, hook-safe, never throws) + config set trace_mode + live daemon-state line in config show. |
| src/setup.ts | +34 | Settings.trace_mode (optional; legacy files read as daemon); install preserves existing config instead of wiping it. |
| src/utils.ts | +23 | isCwdInScope scope gate (opt-in trace_roots; empty = global). |
| hooks/hook-handler.sh | +22 | session-end branch: bypasses daemon; only SessionEnd runs the builder. |
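
To make the `toolCalls()` pairing concrete, here is a minimal sketch of matching each tool_use to its tool_result and taking start/end times from the enclosing transcript lines. The transcript shape is a simplified assumption and `pairToolCalls` is a hypothetical name, not the parser's actual API.

```typescript
// Simplified, assumed transcript shape (real lines carry more fields).
interface ContentBlock {
  type: string;
  id?: string; // on tool_use blocks
  name?: string; // on tool_use blocks
  tool_use_id?: string; // on tool_result blocks
  is_error?: boolean; // on tool_result blocks
}

interface TranscriptLine {
  timestamp: string; // ISO-8601
  message: { role: "assistant" | "user"; content: ContentBlock[] };
}

interface ToolCall {
  id: string;
  name: string;
  startedAt: string; // timestamp of the assistant message that issued the call
  endedAt: string; // timestamp of the tool_result, or startedAt if unpaired
  isError: boolean;
  paired: boolean;
}

function pairToolCalls(lines: TranscriptLine[]): ToolCall[] {
  const calls = new Map<string, ToolCall>(); // Map keeps insertion order
  for (const line of lines) {
    for (const block of line.message.content) {
      if (block.type === "tool_use" && block.id && block.name) {
        calls.set(block.id, {
          id: block.id,
          name: block.name,
          startedAt: line.timestamp,
          endedAt: line.timestamp, // fallback: near-zero duration, never now()
          isError: false,
          paired: false,
        });
      } else if (block.type === "tool_result" && block.tool_use_id) {
        const call = calls.get(block.tool_use_id);
        if (call) {
          call.endedAt = line.timestamp;
          call.isError = block.is_error === true;
          call.paired = true;
        }
      }
    }
  }
  return Array.from(calls.values());
}
```

An unpaired call keeps `endedAt` equal to `startedAt`, which matches the fallback described in the timestamp commit further down this thread.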

Agent-teams correctness fixes

Dogfooding the builder against real agent-teams runs (/nest-test — coordinator + a teammate across two turns; /triage — coordinator + 6 specialists) surfaced three defects in the Agents view, each fixed + regression-tested in tests/nest-test-tree.test.ts and verified against live uploads in the W&B Agents store. All three live in the session-end path / shared genaiSpans; default daemon mode is unaffected (it passes none of the new optional args).

| Fix | Symptom | Cause → resolution | Commit |
| --- | --- | --- | --- |
| Re-spawn dedup | Teammate's tools double-counted (e.g. 8 Bash instead of 5) | The harness leaves two transcripts for one teammate (partial + complete), same agentType, no toolUseId. The leftover pass merged both. → listSubagents drops any teammate transcript whose tool_use-id set is a subset of a same-type sibling's, keeping the most complete. | 3ce0104 |
| Per-span agent attribution | Sub-agent's tools rendered under the top-level agent; coordinator and sub-agent indistinguishable | execute_tool/chat spans had no gen_ai.agent.name/agent.id; the Agents view groups by agent identity, not span tree. → thread an owning {agentName, agentId} and stamp each child span. | 3ce0104 |
| Real timestamps | turn-2 tools appeared before turn-1; sub-agent invoke_agent wrapper stamped ~minutes after its own children | execute_tool and invoke_agent spans had no startTime → build-time now() at SessionEnd. → parse real per-call times (assistant-msg start, tool_result end) and per-sub-agent transcript envelopes; stamp them. | 7747b5d, 79b86ea |
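
The re-spawn dedup rule can be sketched as a subset filter over same-type transcripts. This is an illustration under assumptions, not the PR's `listSubagents`: the `SubagentTranscript` shape and the tie-break for identical id sets (keep the first) are hypothetical.

```typescript
// Assumed summary of one on-disk teammate transcript.
interface SubagentTranscript {
  path: string;
  agentType: string;
  toolUseIds: Set<string>; // ids of every tool_use block in the transcript
}

function isSubset(a: Set<string>, b: Set<string>): boolean {
  for (const id of Array.from(a)) {
    if (!b.has(id)) return false;
  }
  return true;
}

// Drop any transcript whose tool_use ids are all contained in a same-type
// sibling, keeping the most complete one.
function dropSubsetTranscripts(entries: SubagentTranscript[]): SubagentTranscript[] {
  return entries.filter((candidate, i) =>
    !entries.some((other, j) => {
      if (i === j || other.agentType !== candidate.agentType) return false;
      if (!isSubset(candidate.toolUseIds, other.toolUseIds)) return false;
      // Strict subset: drop. Identical sets: keep only the earlier entry.
      return other.toolUseIds.size > candidate.toolUseIds.size || j < i;
    }),
  );
}
```

With a partial snapshot holding three of the complete transcript's five tool_use ids, only the complete transcript survives, so the merged count is 5 rather than 8. One consequence of this sketch worth noting: a transcript with no tool calls at all is a subset of any same-type sibling and would be dropped.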

Known follow-up (see Known limitations): async teammates can outlive their spawning coordinator turn; we keep real times rather than fabricate a stretched coordinator end. The daemon shares genaiSpans but passes no owner/startedAt, so it still emits unattributed child spans — a separate daemon-side change.

How we got here

| # | Approach | Why it failed / evolved |
| --- | --- | --- |
| 1 | Keep hardening the persistent daemon | Root cause of multi-day pain (socket races, Unknown session, global-always-on). Hardening doesn't remove the fragility class for our use. |
| 2 | Point native Claude Code OTel straight at Weave | Native spans use claude_code.* names → land in Traces, not the Agents view. A GenAI-convention transform is mandatory; can't drop the translation. |
| 3 | Delete the daemon's socket server (~1000 LOC) for this mode | Rejected: perturbs shared code and breaks default mode. Achieved the split by routing instead: session-end never reaches daemon.ts. |
| 4 | Subagent correlation by meta.json toolUseId only | Caught by oracle-diff: agent-teams teammates (the primary workload) have no toolUseId in their meta, only agentType, which is the team-member name, not subagent_type. Produced empty subtrees for every team session. |
| 5 | Meta-pool correlation: claim by toolUseId → team name → subagent_type (FIFO) + leftover pass (this PR) | Captures plain subagents and agent-teams teammates incl. re-spawns; re-spawns nest under the existing invoke_agent so coordinator child-count stays = #Agent calls (matches daemon). |

Design decisions

Why opt-in (default stays daemon) instead of replacing the daemon?
Upstream is actively hardening the daemon; default must stay byte-identical so existing users are unaffected and the change is mergeable. Opt-in flippers (e.g. heavy agent-teams users) get the daemonless path and the fragility ends for them.

Why reconstruct at SessionEnd instead of incrementally?
The transcript at session end is the complete, authoritative record. A single pass over it is a pure transcript → spans function — deterministically testable against fixtures and the entire local transcript corpus, with no race against the writer and no cross-event state machine.

Why toolUseId → name → subagent_type claim order?
Plain Agent-tool subagent metas carry toolUseId (exact link). Agent-teams teammate metas don't — their agentType equals the team-member name from the Agent call, which differs from subagent_type. Falling through all three correlates both kinds; FIFO + a leftover pass handle re-spawns.
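
The claim order can be sketched as a three-step fall-through over a FIFO pool of metas. This is an illustration only: the `AgentCall` and `SubagentMeta` shapes are assumed, and restricting the name/type fallbacks to metas that lack a toolUseId is a design choice of this sketch, not something the PR states.

```typescript
// Assumed shape of an Agent tool call found in the coordinator transcript.
interface AgentCall {
  toolUseId: string;
  name?: string; // team-member name (agent-teams)
  subagentType?: string;
}

// Assumed shape of an on-disk subagent meta entry.
interface SubagentMeta {
  path: string;
  toolUseId?: string; // present for plain Agent-tool subagents only
  agentType: string;
}

function correlate(
  calls: AgentCall[],
  metas: SubagentMeta[],
): { claimed: Map<string, SubagentMeta>; leftovers: SubagentMeta[] } {
  const pool = metas.slice(); // FIFO: earliest enumerated meta is claimed first
  const claimed = new Map<string, SubagentMeta>();

  const take = (pred: (m: SubagentMeta) => boolean): SubagentMeta | undefined => {
    const idx = pool.findIndex(pred);
    return idx === -1 ? undefined : pool.splice(idx, 1)[0];
  };

  for (const call of calls) {
    const meta =
      // 1. Exact link via toolUseId.
      take((m) => m.toolUseId === call.toolUseId) ??
      // 2. Team-member name equals the meta's agentType.
      (call.name !== undefined
        ? take((m) => m.toolUseId === undefined && m.agentType === call.name)
        : undefined) ??
      // 3. Fall back to subagent_type.
      (call.subagentType !== undefined
        ? take((m) => m.toolUseId === undefined && m.agentType === call.subagentType)
        : undefined);
    if (meta) claimed.set(call.toolUseId, meta);
  }

  // Whatever remains is input for the leftover (re-spawn) pass.
  return { claimed, leftovers: pool };
}
```

Each meta is removed from the pool once claimed, so a second transcript for the same teammate ends up in `leftovers` instead of being attached to an unrelated call.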

Why route in the hook instead of deleting daemon code?
Smaller, lower-risk diff that keeps the default path provably untouched (daemon.ts unchanged). hooks.json stays static; non-SessionEnd events no-op cheaply in session-end mode.

Constraints

  • Default unchanged: trace_mode defaults to daemon; legacy settings files (field absent) read as daemon. No behavior change for existing users.
  • Fail-open: the session-end command and hook branch always exit 0 — a tracing failure never disrupts Claude Code.
  • Daemon path unperturbed: parser.ts additions are read only by new code; tracerProvider.ts is separate from daemon.ts's own tracer init.
  • No new dependencies.
  • Never drops work: oracle-diff over 189 real subagent sessions found 0 under-counts.
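
The fail-open constraint can be illustrated with a small wrapper that swallows every failure and always yields exit code 0. This is a sketch under assumptions; `runFailOpen` and its logger parameter are hypothetical names, not the PR's handler.

```typescript
type Logger = (message: string) => void;

// Runs the tracing task and always resolves to exit code 0, so a tracing
// failure can never disrupt the host session.
async function runFailOpen(
  task: () => Promise<void>,
  log: Logger = () => {},
): Promise<number> {
  try {
    await task();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    try {
      log(`tracing skipped: ${reason}`);
    } catch {
      // Even a failing logger must not surface an error to the hook.
    }
  }
  return 0;
}
```

A CLI entry point would pass its build-and-flush routine as `task` and exit with the returned code.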

Test results

Command: npm run build && npm test
Output:

tsc strict — clean
tests 80 / pass 80 / fail 0

Additional validation (not in CI):

  • buildTrace run over all 1,336 local transcripts → 0 errors.
  • Oracle-diff vs an independent Python reconstruction → subagent-tool totals 189/189 exact, coordinator 172/189 exact + 17 over-counts (orphan teammates surfaced) + 0 under-counts.
  • Live e2e: a real claude -p session with trace_mode=session-end → its own SessionEnd hook → built + uploaded via live OTLP → verified in the W&B Agents store (turn root with its chat + Bash + Read spans). A second agent-teams /triage transcript verified all 6 specialists nested as turn-children with full tool subtrees (incl. re-spawns).
  • The live e2e surfaced one bug, fixed here: re-spawn leftovers set RESPONSE_MODEL on an already-ended invoke_agent span (dropped attr); fixed by deferring span-end until after the leftover pass, with a regression test.

What this PR does NOT include

| Item | Why deferred |
| --- | --- |
| Permission events (weave.permission_request) | Hook-only, non-structural enrichment; not recoverable from the transcript in v1. |
| Compaction stats (weave.compaction.*) | Hook-only, non-structural; deferred to a stateless sidecar in a follow-up. |
| Resume/forked-session conversation stitching | v1 uses sessionId as conversationId; multi-session stitching can be layered on later. |
| trace_roots CLI subcommand | Scope gate (isCwdInScope) ships here, but per-repo scoping is settable only via WEAVE_TRACE_ROOTS env / settings.json on this branch; the config trace subcommand lives on the scope branch. |

Known limitations (v1)

These limitations are real and not yet addressed. Take them into account before relying on this mode in real use:

  • Upload only on clean exit. The trace is built and sent when SessionEnd fires. If the process is hard-killed, the terminal/laptop is closed, or it crashes, SessionEnd may not fire and the whole session's trace is lost (the daemon, by contrast, streams incrementally). Verify Claude Code's SessionEnd-on-close behavior before depending on it.
  • No retry / no buffering. It's a single upload attempt. If the network is down at exit, the OTLP export fails and nothing persists it for a later flush. Durability fix would be a local disk spool/outbox.
  • No idempotency. Re-running the builder, or SessionEnd firing more than once, produces duplicate span trees under the same conversation_id (each build mints fresh span ids). A dedup key / spool would be needed for safe retries.
  • Async teammates outlive their spawning turn. A teammate's invoke_agent span now carries its real time envelope, but because agent-teams teammates run asynchronously they can end after the coordinator turn that spawned them — so a child span may extend past its parent turn's end time. This is faithful to reality (we deliberately do not fabricate a stretched coordinator end), but consumers assuming strict parent-envelopes-child timing across the coordinator→teammate boundary should be aware.
  • Scope is symlink-sensitive. isCwdInScope uses path.resolve, not realpath, so a trace_roots of /tmp/x won't match a cwd reported as /private/tmp/x (macOS). Use realpaths in trace_roots.
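
To illustrate the symlink limitation in the last bullet, here is a sketch of a `path.resolve`-based scope gate next to a realpath-based variant. Both functions are hypothetical reconstructions for illustration; the PR's `isCwdInScope` may differ in detail, and the realpath variant is not part of this PR.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// True when `target` is `root` itself or a descendant of it.
function isWithin(root: string, target: string): boolean {
  const rel = path.relative(root, target);
  if (rel === "") return true;
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

// Lexical check only: symlinks are NOT followed, so /tmp/x and
// /private/tmp/x are treated as different directories on macOS.
function isCwdInScopeSketch(cwd: string, traceRoots: string[]): boolean {
  if (traceRoots.length === 0) return true; // empty = global
  const target = path.resolve(cwd);
  return traceRoots.some((root) => isWithin(path.resolve(root), target));
}

// Canonicalize via realpath, falling back to the lexical path when the
// directory does not exist.
function canonical(p: string): string {
  try {
    return fs.realpathSync(p);
  } catch {
    return path.resolve(p);
  }
}

// Variant that compares canonical paths on both sides.
function isCwdInScopeRealpath(cwd: string, traceRoots: string[]): boolean {
  if (traceRoots.length === 0) return true;
  const target = canonical(cwd);
  return traceRoots.some((root) => isWithin(canonical(root), target));
}
```

Comparing via `path.relative` instead of a string prefix avoids treating `/work/repo-other` as inside `/work/repo`.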

Planned follow-ups: per-turn Stop-hook builds (incremental, survives unclean exit) and a disk spool (survives network outage + crash, with dedup) — which together would make daemonless more durable than the daemon.

Try it yourself (clean-install cookbook)

End-to-end, the way a user would adopt it. Run in a normal terminal.

# 1. Stop any existing daemon (nothing stale lingers)
pkill -f 'weave-claude-code daemon' 2>/dev/null; rm -f ~/.weave-claude-code/daemon.sock

# 2. Install this build as the global plugin (CLI + marketplace source)
cd <path-to-this-checkout> && npm run build && npm install -g .

# 3. Clear the cache and reinstall from the local build
#    (needed when the version is unchanged, so the new hooks actually load)
rm -rf ~/.claude/plugins/cache/weave-claude-code
weave-claude-code install --source=local --force

# 4. Opt in and confirm
weave-claude-code config set trace_mode session-end
weave-claude-code config show     # → trace_mode: session-end [settings]; socket "(unused in session-end mode)"

# 5. Verify the daemonless hook is what will load (must be >= 1)
grep -c 'trace_mode' ~/.claude/plugins/cache/weave-claude-code/weave/*/hooks/hook-handler.sh

# 6. Open Claude, do real work, then EXIT (exit is what fires the SessionEnd upload)
cd <a-repo> && claude
#   …read a file, run a bash command… then /exit

# 7. The trace appears in your Weave project — no daemon involved.

Revert to the released daemon build when done:

npm install -g weave-claude-code
rm -rf ~/.claude/plugins/cache/weave-claude-code && weave-claude-code install --force
weave-claude-code config set trace_mode daemon

🤖 Generated with Claude Code

Add `trace_mode: "daemon" | "session-end"` (default `daemon`, unchanged).
In `session-end` mode the persistent daemon is bypassed entirely: a pure
function reconstructs the full GenAI-convention span tree from the completed
transcript in one pass at SessionEnd, uploads, and exits — no daemon, no
socket. This removes the daemon-fragility class of failures (stale sockets,
"Unknown session", restarts) for users who opt in, while keeping default
behavior byte-identical so the change is non-disruptive and upstream-mergeable.

- src/buildTrace.ts: pure `buildTrace(tracer, transcriptPath, opts)`. Per turn
  emits invoke_agent root → chat spans → execute_tool spans (results paired) →
  invoke_agent for Agent calls, recursing subagent transcripts (ALL turns).
  Subagent correlation enumerates the on-disk meta pool and claims by
  toolUseId → team-member name → subagent_type (FIFO), with a leftover pass
  nesting re-spawns under the existing invoke_agent (coordinator child count
  stays = #Agent calls, matching the daemon). Reuses genaiSpans emit helpers.
- src/sessionEnd.ts: SessionEnd handler — parse payload, resolve config (env >
  settings, daemon parity), scope-gate, build, flush. Never throws.
- src/tracerProvider.ts: OTLP exporter factory (daemon's wiring, factored out;
  daemon retains its own copy — additive, daemon path unperturbed).
- src/parser.ts: additive toolCalls() (tool_use↔tool_result pairing) + prompt().
- src/utils.ts: isCwdInScope scope gate (opt-in trace_roots; empty = global).
- src/cli.ts: `session-end` command (stdin, hook-safe) + `config set trace_mode`.
- hooks/hook-handler.sh: session-end branch bypasses the daemon; only SessionEnd
  runs the builder.

v1 intentionally drops permission events and compaction stats (non-structural).

Validated: 59/59 unit tests; tsc strict clean; daemon path unperturbed;
buildTrace over all 1336 local transcripts (0 errors); oracle-diff vs an
independent reconstruction — subagent-tool totals 189/189 exact, 0 under-counts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

titothedeveloper and others added 2 commits June 15, 2026 09:04
…pass

Live e2e against a real agent-teams /triage transcript surfaced "operation on
ended Span" warnings: the re-spawn leftover pass attaches a teammate transcript
(and sets RESPONSE_MODEL) onto the invoke_agent span created by the spawning
Agent call — but that span was already .end()ed in emitToolCall, so the
setAttribute was silently dropped.

Defer all invoke_agent .end() calls into an openInvokes list flushed after the
leftover pass (top level and recursively in emitAgentSubtree). Child spans were
already exported correctly (parent context only needs a valid span id), so span
counts/nesting are unchanged — this just lets re-spawn attribute writes land.

Regression test: a re-spawn supplies the model the claimed transcript lacks, so
RESPONSE_MODEL can only appear if the span is still open.

Verified end-to-end: session-end → live OTLP → W&B Agents store shows all 6
specialists nested as turn-children with full tool subtrees, no warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a "Tracing Modes" README section: how each mode works, why you'd switch
to session-end (removes the daemon's stale-socket/Unknown-session failure
class), the trade-off (drops permission + compaction enrichments in v1), and
the `config set trace_mode` opt-in command. Makes explicit that the default is
`daemon` and upgrading changes nothing unless you opt in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@titothedeveloper changed the title from "feat(trace): daemonless session-end trace mode (opt-out)" to "feat(trace): daemonless session-end trace mode (opt-in)" on Jun 15, 2026
@titothedeveloper marked this pull request as ready for review June 15, 2026 13:40
@titothedeveloper requested a review from a team as a code owner June 15, 2026 13:40
titothedeveloper and others added 5 commits June 15, 2026 10:04
…session-end

`config show` listed the daemon_socket unconditionally but never showed
trace_mode, so a user who set session-end couldn't confirm it and the socket
line read as "a daemon is configured/running" when it's just a path string.

Show `trace_mode` (env override > settings > default daemon) with its source,
and annotate daemon_socket as "(unused in session-end mode)" when daemonless.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The socket path alone doesn't tell you whether a daemon is actually running.
Probe it and annotate: "(daemon running)", "(stale socket — daemon not
responding)", or "(no daemon running)" in daemon mode; "(unused in session-end
mode)" when daemonless.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`createConfig` unconditionally wrote a fresh settings.json with null
weave_project / wandb_api_key and trace_mode=daemon, so re-running (or a failed)
`weave-claude-code install` on an already-configured machine destroyed the
user's credentials and mode. Make it convergent: read the existing file and
preserve every user-controllable field (project, api key, agent_name, debug,
trace_mode, custom paths, installed_at); only the version is refreshed. Fresh or
unreadable configs still get safe defaults.

Found live while dogfooding the daemonless install.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…anscripts

Two bugs surfaced dogfooding the daemonless hook on a real agent-teams session:

1. The scope gate failed closed and dropped EVERYTHING when trace_roots was set,
   because the SessionEnd payload omits cwd (interactive sessions). Derive cwd
   from the transcript (Claude Code stamps it on transcript lines) when the
   payload lacks it, so scope works regardless of payload shape.

2. Each teammate session fires its own SessionEnd, which would build a standalone
   duplicate trace on top of being nested in the coordinator's build. Skip
   transcripts under a /subagents/ dir — the parent coordinator captures them.
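
Both fixes above can be sketched with two small helpers. The names are hypothetical, and the assumption that transcript lines carry a top-level `cwd` string follows the commit message's description rather than a verified schema.

```typescript
// True when the transcript sits under a "subagents" directory, in which
// case the parent coordinator's build is expected to capture it.
function isSubagentTranscript(transcriptPath: string): boolean {
  const segments = transcriptPath.split(/[\\/]/);
  return segments.slice(0, -1).some((segment) => segment === "subagents");
}

// Returns the first `cwd` string found on any JSONL line, if present.
function cwdFromTranscript(jsonl: string): string | undefined {
  for (const raw of jsonl.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null) {
        const cwd = (parsed as { cwd?: unknown }).cwd;
        if (typeof cwd === "string" && cwd.length > 0) return cwd;
      }
    } catch {
      // Skip malformed lines instead of failing the whole build.
    }
  }
  return undefined;
}

// Prefer the payload's cwd; fall back to the transcript when it is absent.
function resolveCwd(payloadCwd: string | undefined, jsonl: string): string | undefined {
  return payloadCwd !== undefined && payloadCwd.length > 0
    ? payloadCwd
    : cwdFromTranscript(jsonl);
}
```

If neither source yields a cwd, the caller still has to decide whether to fail open or closed; this sketch leaves that to the scope gate.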

Regression tests for both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…agent.id

The Agents view reads token totals and session time off the turn-root /
invoke_agent spans, not by summing chat children — the builder left those
spans with no usage and ~0 duration, so the UI showed "0 tokens" and "4ms".
Also left gen_ai.agent.id empty on sub-agents.

- setUsageAttrs() stamps aggregate gen_ai.usage.* (OTel input total + output +
  cache + reasoning) on the turn root (turn.totalUsage) and each sub-agent
  invoke_agent span (summed across its turns).
- Turn root gets real startTime/endTime from transcript timestamps.
- Sub-agent invoke_agent gets gen_ai.agent.id, read from its agent-<id>.jsonl
  transcript filename.

Verified live: turn roots now ~1.5s (was ~0), tokens populate, agent_id set.
This is strictly better than the daemon, which also showed tokens=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@titothedeveloper changed the title from "feat(trace): daemonless session-end trace mode (opt-in)" to "feat(trace): opt-in per-repo tracing via daemonless session-end mode" on Jun 15, 2026
titothedeveloper and others added 2 commits June 15, 2026 13:59
…-end mode

Two defects surfaced via a real /nest-test trace (coordinator + one teammate
driven across two turns):

1. Re-spawn double-counting. The harness leaves multiple transcripts for a
   single teammate (a partial snapshot + the complete one) sharing the same
   agentType and no toolUseId; the leftover-merge pass appended both, counting
   the overlapping turns' tool calls twice (e.g. 8 Bash instead of 5).
   listSubagents now drops any teammate transcript whose tool_use-id set is a
   subset of another same-type entry's, keeping the most complete.

2. Unattributed child spans. execute_tool and chat spans carried no
   gen_ai.agent.name / gen_ai.agent.id, so the Weave Agents view (which groups
   by agent identity, not span tree) rendered every tool under the top-level
   agent. Thread an owning {agentName, agentId} through emitToolCall /
   emitAgentSubtree / emitChatSpansFromAssistantCalls and stamp it on each
   child span: coordinator children -> coordinator, teammate children ->
   teammate + id.

New tests/nest-test-tree.test.ts asserts the full tree against a real
two-transcript re-spawn fixture: one merged invoke_agent, 5 Bash + 2
SendMessage (not double-counted), distinct non-empty agent_id, and every
child span carrying its owning agent. 78/78 green, tsc clean.

The daemon shares genaiSpans but does not pass owner, so it still emits
unattributed child spans — tracked separately for upstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
execute_tool spans were created with no startTime, so buildTrace stamped every
tool span with build-time now() at SessionEnd — collapsing all tool calls to a
single instant (~minutes after the real session) and destroying execution
order. In the UI this showed e.g. turn-2-done before turn-1-done, and tools
appeared after all chat spans instead of interleaved.

Parse each tool call's real times — assistant-message timestamp (tool_use
issued = span start) and tool_result message timestamp (result returned = span
end) — and stamp them on the span, mirroring chat spans. End falls back to the
start when no result is present, so a missing result gives a near-zero-duration
span rather than stretching to now().

New A10 asserts nest-probe's Bash spans have distinct, ordered real timestamps
(turn-1 before turn-2), not one build-time instant. 79/79 green.

The daemon shares these helpers but passes no startedAt (its spans are created
live at the real event time), so its behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@titothedeveloper


Follow-up: agent-teams attribution fixes from live /nest-test dogfooding

Two commits added on top of the original daemonless feature, after dogfooding the session-end builder against a real agent-teams run (/nest-test: coordinator + one teammate driven across two turns). Three defects surfaced in the Agents view that the original build-trace tests didn't cover, all now fixed + regression-tested.

Bugs found & fixed

| # | Symptom (Agents view) | Root cause | Fix | Commit |
| --- | --- | --- | --- | --- |
| 1 | Teammate's tools double-counted (8 Bash instead of 5) | The harness leaves two transcripts for one teammate (a partial snapshot + the complete one), same agentType, no toolUseId. The leftover-merge pass appended both, re-counting the overlapping turn. | listSubagents drops any teammate transcript whose tool_use-id set is a subset of another same-type entry's (keeps the most complete). | 3ce0104 |
| 2 | Clicking the coordinator and the sub-agent showed the same view; all tools rendered under the top-level agent | execute_tool / chat spans carried no gen_ai.agent.name / agent.id; the Agents view groups by agent identity, not span tree, so unattributed spans rolled up to the root. | Thread an owning {agentName, agentId} through the emit path; stamp each child span (coordinator children → coordinator, teammate children → teammate + id). | 3ce0104 |
| 3 | turn-2-done appeared before turn-1-done; tools sat after all chats | execute_tool spans had no startTime → every tool stamped with build-time now() at SessionEnd, collapsing all tools to one instant. | Parse each tool call's real times (assistant-msg = start, tool_result = end) and stamp them, mirroring chat spans. | 7747b5d |

Tests

  • New tests/nest-test-tree.test.ts (+ real two-transcript re-spawn fixture under tests/fixtures/nest-test/) asserts the full contract A1–A10: one merged invoke_agent, nests under coordinator, distinct non-empty agent_id, 5 Bash + 2 SendMessage (deduped), every child carries its owning agent, distinct ordered real timestamps (turn-1 before turn-2), coordinator-only roots, no orphans.
  • 79/79 tests green, tsc --noEmit clean. Code review: no Critical/Important.

Verified live (not a re-upload)

A real /nest-test run uploaded through the session-end hook — conversation 3ac3fd99-f20a-4759-8674-51dc6b5f0b14 — passes 10/10 checks in the Agents store: two distinct agents, correct per-tool attribution, deduped counts, ordered timestamps.

Known limitations (not addressed here)

  • Daemon shares genaiSpans but passes no owner, so the daemon still emits child spans without agent.name/agent.id → same "tools under the top-level agent" issue in daemon mode. (Daemon tool timestamps are fine — its spans are created live.) Tracked for a separate daemon-side fix.
  • Weave Conversations turn view lists a turn's tool calls flat and does not surface per-tool agent attribution, even though the span data now carries it (the Spans and Agents views render it correctly). Appears to be a UI-side rendering gap.

…me now())

Sub-agent invoke_agent spans were created via startInvokeAgentSpan with no
startTime and ended with bare .end(), so buildTrace stamped them with
build-time now() at SessionEnd. A teammate that ran for ~52s at 17:55 showed
its invoke_agent wrapper at 18:09 (the build moment) — a 1ms span ~14 minutes
AFTER its own children, which all carry real transcript times. The wrapper
fell entirely outside its children's time range.

Compute each subagent's real [start, end] envelope from its transcript line
timestamps (transcriptBounds) and stamp it: startTime at creation, real end
when the span is finally closed (tracked per-span in an invokeEnd map; extended
across re-spawn merges). This makes every invoke_agent span wrap its children
in real time. Per the "use the real timestamp, never fabricate" principle we
deliberately do NOT stretch the coordinator turn root to cover async teammates
that outlive it — its real end stays its real end.
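
The envelope computation described above can be sketched as a min/max scan over transcript line timestamps, with a merge step for re-spawns. The function names and the millisecond-based `TimeEnvelope` shape are assumptions for illustration, not the PR's `transcriptBounds`.

```typescript
interface TimeEnvelope {
  startMs: number; // earliest transcript timestamp, epoch milliseconds
  endMs: number; // latest transcript timestamp, epoch milliseconds
}

// Scans JSONL lines for a top-level ISO `timestamp` and returns the real
// [start, end] window, or undefined when no line carries a usable time.
function transcriptBoundsSketch(jsonl: string): TimeEnvelope | undefined {
  let startMs = Infinity;
  let endMs = -Infinity;
  for (const raw of jsonl.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ts = (parsed as { timestamp?: unknown }).timestamp;
    if (typeof ts !== "string") continue;
    const ms = Date.parse(ts);
    if (!Number.isFinite(ms)) continue;
    startMs = Math.min(startMs, ms);
    endMs = Math.max(endMs, ms);
  }
  return startMs === Infinity ? undefined : { startMs, endMs };
}

// Extends an envelope across a re-spawn merge so the wrapper span still
// covers every child from both transcripts.
function mergeEnvelopes(a: TimeEnvelope, b: TimeEnvelope): TimeEnvelope {
  return {
    startMs: Math.min(a.startMs, b.startMs),
    endMs: Math.max(a.endMs, b.endMs),
  };
}
```

Returning `undefined` for a transcript without timestamps leaves the caller free to choose a fallback, rather than silently substituting the build-time clock.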

New A11 asserts every invoke_agent span starts at/before its first child and
ends at/after its last (real-time envelope, no build-time now()). 80/80 green.
Verified live: re-running the fixed build on a real /nest-test transcript, the
nest-probe wrapper now spans its true 52s window.

Daemon unaffected (its spans are created live at real event time).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>