feat(trace): opt-in per-repo tracing via daemonless session-end mode #97
titothedeveloper wants to merge 11 commits into
Add `trace_mode: "daemon" | "session-end"` (default `daemon`, unchanged). In `session-end` mode the persistent daemon is bypassed entirely: a pure function reconstructs the full GenAI-convention span tree from the completed transcript in one pass at SessionEnd, uploads, and exits, with no daemon and no socket. This removes the daemon-fragility class of failures (stale sockets, "Unknown session", restarts) for users who opt in, while keeping default behavior byte-identical so the change is non-disruptive and upstream-mergeable.
- src/buildTrace.ts: pure `buildTrace(tracer, transcriptPath, opts)`. Per turn emits invoke_agent root → chat spans → execute_tool spans (results paired) → invoke_agent for Agent calls, recursing subagent transcripts (ALL turns). Subagent correlation enumerates the on-disk meta pool and claims by toolUseId → team-member name → subagent_type (FIFO), with a leftover pass nesting re-spawns under the existing invoke_agent (coordinator child count stays = #Agent calls, matching the daemon). Reuses genaiSpans emit helpers.
- src/sessionEnd.ts: SessionEnd handler: parse payload, resolve config (env > settings, daemon parity), scope-gate, build, flush. Never throws.
- src/tracerProvider.ts: OTLP exporter factory (daemon's wiring, factored out; daemon retains its own copy, so the change is additive and the daemon path is unperturbed).
- src/parser.ts: additive toolCalls() (tool_use↔tool_result pairing) + prompt().
- src/utils.ts: isCwdInScope scope gate (opt-in trace_roots; empty = global).
- src/cli.ts: `session-end` command (stdin, hook-safe) + `config set trace_mode`.
- hooks/hook-handler.sh: session-end branch bypasses the daemon; only SessionEnd runs the builder.
v1 intentionally drops permission events and compaction stats (non-structural).
Validated: 59/59 unit tests; tsc strict clean; daemon path unperturbed; buildTrace over all 1336 local transcripts (0 errors); oracle-diff vs an independent reconstruction: subagent-tool totals 189/189 exact, 0 under-counts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
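For readers following the subagent correlation described in this commit, here is a minimal sketch of the claim order (toolUseId → team-member name → subagent_type, FIFO). It is not the code in src/buildTrace.ts: the `SubagentMeta` and `AgentCall` shapes, and the helper names `takeFirst` and `claimSubagent`, are hypothetical and only mirror the field names mentioned above. It also assumes a plain Agent-tool meta stores its subagent_type in `agentType`.

```typescript
// Hypothetical shapes, named after the fields the commit message mentions.
interface SubagentMeta {
  transcriptPath: string;
  toolUseId?: string; // assumed present only for plain Agent-tool subagents
  agentType?: string; // team-member name for teammates, otherwise the subagent type
}

interface AgentCall {
  toolUseId: string;
  name?: string; // team-member name passed to the Agent call
  subagentType?: string;
}

// Removes and returns the first pool entry that matches (FIFO claim).
function takeFirst(
  pool: SubagentMeta[],
  matches: (meta: SubagentMeta) => boolean,
): SubagentMeta | undefined {
  const index = pool.findIndex(matches);
  return index === -1 ? undefined : pool.splice(index, 1)[0];
}

// Tries the exact link first, then the team-member name, then the subagent type.
function claimSubagent(pool: SubagentMeta[], call: AgentCall): SubagentMeta | undefined {
  const byId = takeFirst(pool, (m) => m.toolUseId === call.toolUseId);
  if (byId !== undefined) return byId;

  if (call.name !== undefined) {
    const byName = takeFirst(pool, (m) => m.agentType === call.name);
    if (byName !== undefined) return byName;
  }

  if (call.subagentType !== undefined) {
    return takeFirst(pool, (m) => m.agentType === call.subagentType);
  }
  return undefined;
}
```

Whatever is still in the pool after every Agent call has claimed is what a leftover pass would then have to place.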
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
…pass
Live e2e against a real agent-teams /triage transcript surfaced "operation on ended Span" warnings: the re-spawn leftover pass attaches a teammate transcript (and sets RESPONSE_MODEL) onto the invoke_agent span created by the spawning Agent call, but that span was already .end()ed in emitToolCall, so the setAttribute was silently dropped.
Defer all invoke_agent .end() calls into an openInvokes list flushed after the leftover pass (top level and recursively in emitAgentSubtree). Child spans were already exported correctly (parent context only needs a valid span id), so span counts/nesting are unchanged; this just lets re-spawn attribute writes land.
Regression test: a re-spawn supplies the model the claimed transcript lacks, so RESPONSE_MODEL can only appear if the span is still open.
Verified end-to-end: session-end → live OTLP → W&B Agents store shows all 6 specialists nested as turn-children with full tool subtrees, no warnings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
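The failure mode and the fix in this commit can be illustrated with a small stand-in. `FakeSpan` below is a hypothetical class that only imitates the one behavior that matters here (attribute writes after `end()` are dropped); it is not the OpenTelemetry span type, and `buildEager` / `buildDeferred` are illustrative names, not functions from the PR.

```typescript
// Hypothetical stand-in for a span: writes after end() are silently lost.
class FakeSpan {
  readonly attributes: Record<string, string> = {};
  ended = false;

  setAttribute(key: string, value: string): void {
    if (this.ended) return; // the "operation on ended Span" case
    this.attributes[key] = value;
  }

  end(): void {
    this.ended = true;
  }
}

// Old behavior: the span is ended as soon as the tool call is emitted,
// so the later leftover pass cannot stamp the model on it.
function buildEager(respawnModel: string): FakeSpan {
  const invoke = new FakeSpan();
  invoke.end();
  invoke.setAttribute("gen_ai.response.model", respawnModel); // dropped
  return invoke;
}

// Fixed behavior: spans are parked in an open list and ended only after
// the leftover pass has had its chance to write attributes.
function buildDeferred(respawnModel: string): FakeSpan {
  const openInvokes: FakeSpan[] = [];
  const invoke = new FakeSpan();
  openInvokes.push(invoke); // defer instead of calling invoke.end()

  invoke.setAttribute("gen_ai.response.model", respawnModel); // leftover pass

  for (const span of openInvokes) span.end(); // flush once everything is written
  return invoke;
}
```

The regression test described above relies on the same observation: the attribute can only be present if the span was still open when the leftover pass ran.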
Add a "Tracing Modes" README section: how each mode works, why you'd switch to session-end (removes the daemon's stale-socket/Unknown-session failure class), the trade-off (drops permission + compaction enrichments in v1), and the `config set trace_mode` opt-in command. Makes explicit that the default is `daemon` and upgrading changes nothing unless you opt in. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…session-end
`config show` listed the daemon_socket unconditionally but never showed trace_mode, so a user who set session-end couldn't confirm it and the socket line read as "a daemon is configured/running" when it's just a path string. Show `trace_mode` (env override > settings > default daemon) with its source, and annotate daemon_socket as "(unused in session-end mode)" when daemonless.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The socket path alone doesn't tell you whether a daemon is actually running. Probe it and annotate: "(daemon running)", "(stale socket — daemon not responding)", or "(no daemon running)" in daemon mode; "(unused in session-end mode)" when daemonless. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
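The precedence that `config show` reports (env override > settings > default daemon, with its source) could be sketched as below. This is an illustration under assumptions: the function name `resolveTraceMode` is hypothetical, the env value is passed in as a parameter because the variable name is not stated here, and treating an unrecognized value as "not set" is a guess about the intended behavior.

```typescript
type TraceMode = "daemon" | "session-end";
type TraceModeSource = "env" | "settings" | "default";

interface ResolvedTraceMode {
  mode: TraceMode;
  source: TraceModeSource;
}

// Narrows arbitrary input (env string, parsed JSON field) to a valid mode.
function isTraceMode(value: unknown): value is TraceMode {
  return value === "daemon" || value === "session-end";
}

// Precedence: env override, then settings file, then the default `daemon`.
// Assumption: invalid values fall through to the next source.
function resolveTraceMode(
  envValue: string | undefined,
  settingsValue: unknown,
): ResolvedTraceMode {
  if (isTraceMode(envValue)) return { mode: envValue, source: "env" };
  if (isTraceMode(settingsValue)) return { mode: settingsValue, source: "settings" };
  return { mode: "daemon", source: "default" };
}
```

A legacy settings file with no `trace_mode` field resolves to `{ mode: "daemon", source: "default" }`, which matches the "upgrading changes nothing" promise.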
`createConfig` unconditionally wrote a fresh settings.json with null weave_project / wandb_api_key and trace_mode=daemon, so re-running (or a failed) `weave-claude-code install` on an already-configured machine destroyed the user's credentials and mode. Make it convergent: read the existing file and preserve every user-controllable field (project, api key, agent_name, debug, trace_mode, custom paths, installed_at); only the version is refreshed. Fresh or unreadable configs still get safe defaults. Found live while dogfooding the daemonless install. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
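A convergent install of the kind described here can be sketched as a merge in which defaults are applied first, existing values override them, and only the version is forced. The function name `convergeSettings` and the `version` key are assumptions for illustration; the other keys are the ones named in this commit message.

```typescript
type SettingsFile = Record<string, unknown>;

// Defaults first, then whatever the user already had, then the fresh version.
// A null `existing` models a missing or unreadable settings.json.
function convergeSettings(
  existing: SettingsFile | null,
  version: string,
  nowIso: string,
): SettingsFile {
  const defaults: SettingsFile = {
    weave_project: null,
    wandb_api_key: null,
    trace_mode: "daemon",
    installed_at: nowIso,
  };
  return { ...defaults, ...(existing ?? {}), version };
}
```

Because the existing file is spread after the defaults, fields this sketch does not list (agent_name, debug, custom paths) survive as well; running it twice with the same inputs yields the same result.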
…anscripts
Two bugs surfaced dogfooding the daemonless hook on a real agent-teams session:
1. Scope gate fail-closed and dropped EVERYTHING when trace_roots was set, because the SessionEnd payload omits cwd (interactive sessions). Derive cwd from the transcript (Claude Code stamps it on transcript lines) when the payload lacks it, so scope works regardless of payload shape.
2. Each teammate session fires its own SessionEnd, which would build a standalone duplicate trace on top of being nested in the coordinator's build. Skip transcripts under a /subagents/ dir; the parent coordinator captures them.
Regression tests for both.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
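The two gates in this commit, plus the scope check they feed, might look roughly like the sketch below. It is a simplified illustration, not the PR's code: it assumes POSIX-style absolute paths that are already normalized (the real `isCwdInScope` is described elsewhere as using `path.resolve`), it assumes transcript lines are JSON objects with a string `cwd` field, and `cwdFromTranscript` and `isSubagentTranscript` are hypothetical helper names.

```typescript
// Returns the first cwd stamped on a transcript line, if any.
function cwdFromTranscript(jsonl: string): string | undefined {
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const entry: unknown = JSON.parse(line);
      if (typeof entry === "object" && entry !== null) {
        const cwd = (entry as { cwd?: unknown }).cwd;
        if (typeof cwd === "string" && cwd !== "") return cwd;
      }
    } catch {
      // Malformed line: skip it rather than fail the whole build.
    }
  }
  return undefined;
}

// Teammate transcripts are captured by the coordinator's build, so skip them.
function isSubagentTranscript(transcriptPath: string): boolean {
  return transcriptPath.includes("/subagents/");
}

// Empty trace_roots means global; otherwise cwd must be a root or lie under one.
function isCwdInScope(cwd: string | undefined, traceRoots: string[]): boolean {
  if (traceRoots.length === 0) return true;
  if (cwd === undefined) return false; // nothing to match against
  return traceRoots.some((root) => {
    const prefix = root.endsWith("/") ? root : root + "/";
    return cwd === root || cwd + "/" === prefix || cwd.startsWith(prefix);
  });
}

// Payload cwd wins; the transcript is the fallback when the payload omits it.
function shouldTrace(
  payloadCwd: string | undefined,
  transcriptPath: string,
  transcriptText: string,
  traceRoots: string[],
): boolean {
  if (isSubagentTranscript(transcriptPath)) return false;
  const cwd = payloadCwd ?? cwdFromTranscript(transcriptText);
  return isCwdInScope(cwd, traceRoots);
}
```

Matching on `root + "/"` rather than a bare prefix keeps `/work/repo` from accidentally admitting `/work/repo-a`.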
…agent.id
The Agents view reads token totals and session time off the turn-root / invoke_agent spans, not by summing chat children. The builder left those spans with no usage and ~0 duration, so the UI showed "0 tokens" and "4ms". It also left gen_ai.agent.id empty on sub-agents.
- setUsageAttrs() stamps aggregate gen_ai.usage.* (OTel input total + output + cache + reasoning) on the turn root (turn.totalUsage) and each sub-agent invoke_agent span (summed across its turns).
- Turn root gets real startTime/endTime from transcript timestamps.
- Sub-agent invoke_agent gets gen_ai.agent.id, read from its agent-<id>.jsonl transcript filename.
Verified live: turn roots now ~1.5s (was ~0), tokens populate, agent_id set. This is strictly better than the daemon, which also showed tokens=0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-end mode
Two defects surfaced via a real /nest-test trace (coordinator + one teammate
driven across two turns):
1. Re-spawn double-counting. The harness leaves multiple transcripts for a
single teammate (a partial snapshot + the complete one) sharing the same
agentType and no toolUseId; the leftover-merge pass appended both, counting
the overlapping turns' tool calls twice (e.g. 8 Bash instead of 5).
listSubagents now drops any teammate transcript whose tool_use-id set is a
subset of another same-type entry's, keeping the most complete.
2. Unattributed child spans. execute_tool and chat spans carried no
gen_ai.agent.name / gen_ai.agent.id, so the Weave Agents view (which groups
by agent identity, not span tree) rendered every tool under the top-level
agent. Thread an owning {agentName, agentId} through emitToolCall /
emitAgentSubtree / emitChatSpansFromAssistantCalls and stamp it on each
child span: coordinator children -> coordinator, teammate children ->
teammate + id.
New tests/nest-test-tree.test.ts asserts the full tree against a real
two-transcript re-spawn fixture: one merged invoke_agent, 5 Bash + 2
SendMessage (not double-counted), distinct non-empty agent_id, and every
child span carrying its owning agent. 78/78 green, tsc clean.
The daemon shares genaiSpans but does not pass owner, so it still emits
unattributed child spans — tracked separately for upstream.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
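The subset rule from defect 1 can be sketched as a filter over same-type transcripts. The `TeammateTranscript` shape and the name `dropSubsetTranscripts` are hypothetical; the tie-break for two transcripts with identical id sets (keep the earlier one) is an assumption, since the commit message only says the most complete entry is kept.

```typescript
// Hypothetical summary of one teammate transcript on disk.
interface TeammateTranscript {
  path: string;
  agentType: string;
  toolUseIds: string[];
}

function isSubset(small: string[], large: string[]): boolean {
  return small.every((id) => large.indexOf(id) !== -1);
}

// Drops any transcript whose tool_use ids are all contained in another
// transcript of the same agentType, so overlapping turns are counted once.
function dropSubsetTranscripts(entries: TeammateTranscript[]): TeammateTranscript[] {
  return entries.filter((entry, i) => {
    const coveredByAnother = entries.some((other, j) => {
      if (i === j || other.agentType !== entry.agentType) return false;
      if (!isSubset(entry.toolUseIds, other.toolUseIds)) return false;
      // Strictly larger sets always win; identical sets keep the earlier entry.
      return other.toolUseIds.length > entry.toolUseIds.length || j < i;
    });
    return !coveredByAnother;
  });
}
```

With a partial snapshot holding ids `[a, b]` and a complete transcript holding `[a, b, c]`, only the complete one survives, which is what turns the double-counted 8 Bash calls back into 5.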
execute_tool spans were created with no startTime, so buildTrace stamped every tool span with build-time now() at SessionEnd, collapsing all tool calls to a single instant (~minutes after the real session) and destroying execution order. In the UI this showed e.g. turn-2-done before turn-1-done, and tools appeared after all chat spans instead of interleaved.
Parse each tool call's real times (assistant-message timestamp: tool_use issued = span start; tool_result message timestamp: result returned = span end) and stamp them on the span, mirroring chat spans. End falls back to the start when no result is present, so a missing result gives a near-zero-duration span rather than stretching to now().
New A10 asserts nest-probe's Bash spans have distinct, ordered real timestamps (turn-1 before turn-2), not one build-time instant. 79/79 green.
The daemon shares these helpers but passes no startedAt (its spans are created live at the real event time), so its behaviour is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
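The timestamp rule in this commit (start at the assistant message, end at the tool_result, fall back to the start when there is no result) reduces to a few lines. The `ToolCallTimes` shape and the name `toolSpanWindow` are hypothetical, and the sketch assumes ISO-8601 timestamp strings.

```typescript
// Hypothetical per-call timing record extracted from the transcript.
interface ToolCallTimes {
  issuedAt: string; // timestamp of the assistant message carrying the tool_use
  resultAt?: string; // timestamp of the message carrying the tool_result, if any
}

interface SpanWindow {
  startTime: Date;
  endTime: Date;
}

// Never uses the build-time clock: a missing result yields a zero-length
// window at the real start instead of stretching to "now".
function toolSpanWindow(call: ToolCallTimes): SpanWindow {
  const startTime = new Date(call.issuedAt);
  const endTime = call.resultAt !== undefined ? new Date(call.resultAt) : startTime;
  return { startTime, endTime };
}
```

Because both ends come from the transcript, sorting spans by `startTime` reproduces the order in which the tools actually ran.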
Follow-up: agent-teams attribution fixes from live

| # | Symptom (Agents view) | Root cause | Fix | Commit |
|---|---|---|---|---|
| 1 | Teammate's tools double-counted (8 Bash instead of 5) | The harness leaves two transcripts for one teammate (a partial snapshot + the complete one), same agentType, no toolUseId. The leftover-merge pass appended both, re-counting the overlapping turn. | listSubagents drops any teammate transcript whose tool_use-id set is a subset of another same-type entry's (keeps the most complete). | 3ce0104 |
| 2 | Clicking the coordinator and the sub-agent showed the same view; all tools rendered under the top-level agent | execute_tool / chat spans carried no gen_ai.agent.name / agent.id. The Agents view groups by agent identity, not span tree, so unattributed spans rolled up to the root. | Thread an owning {agentName, agentId} through the emit path; stamp each child span (coordinator children → coordinator, teammate children → teammate + id). | 3ce0104 |
| 3 | turn-2-done appeared before turn-1-done; tools sat after all chats | execute_tool spans had no startTime → every tool stamped with build-time now() at SessionEnd, collapsing all tools to one instant. | Parse each tool call's real times (assistant-msg = start, tool_result = end) and stamp them, mirroring chat spans. | 7747b5d |

Tests
- New `tests/nest-test-tree.test.ts` (+ real two-transcript re-spawn fixture under `tests/fixtures/nest-test/`) asserts the full contract A1–A10: one merged `invoke_agent`, nests under coordinator, distinct non-empty `agent_id`, 5 Bash + 2 SendMessage (deduped), every child carries its owning agent, distinct ordered real timestamps (turn-1 before turn-2), coordinator-only roots, no orphans.
- 79/79 tests green, `tsc --noEmit` clean. Code review: no Critical/Important.
Verified live (not a re-upload)
A real /nest-test run uploaded through the session-end hook — conversation 3ac3fd99-f20a-4759-8674-51dc6b5f0b14 — passes 10/10 checks in the Agents store: two distinct agents, correct per-tool attribution, deduped counts, ordered timestamps.
Known limitations (not addressed here)
- Daemon shares `genaiSpans` but passes no `owner`, so the daemon still emits child spans without `agent.name` / `agent.id` → same "tools under the top-level agent" issue in daemon mode. (Daemon tool timestamps are fine because its spans are created live.) Tracked for a separate daemon-side fix.
- Weave Conversations turn view lists a turn's tool calls flat and does not surface per-tool agent attribution, even though the span data now carries it (the Spans and Agents views render it correctly). Appears to be a UI-side rendering gap.
…me now())
Sub-agent invoke_agent spans were created via startInvokeAgentSpan with no startTime and ended with bare .end(), so buildTrace stamped them with build-time now() at SessionEnd. A teammate that ran for ~52s at 17:55 showed its invoke_agent wrapper at 18:09 (the build moment): a 1ms span ~14 minutes AFTER its own children, which all carry real transcript times. The wrapper fell entirely outside its children's time range.
Compute each subagent's real [start, end] envelope from its transcript line timestamps (transcriptBounds) and stamp it: startTime at creation, real end when the span is finally closed (tracked per-span in an invokeEnd map; extended across re-spawn merges). This makes every invoke_agent span wrap its children in real time.
Per the "use the real timestamp, never fabricate" principle we deliberately do NOT stretch the coordinator turn root to cover async teammates that outlive it; its real end stays its real end.
New A11 asserts every invoke_agent span starts at/before its first child and ends at/after its last (real-time envelope, no build-time now()). 80/80 green.
Verified live: re-running the fixed build on a real /nest-test transcript, the nest-probe wrapper now spans its true 52s window. Daemon unaffected (its spans are created live at real event time).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
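A rough sketch of the envelope computation follows. It borrows the name `transcriptBounds` from the commit message but is not the PR's implementation: the assumption that each transcript line is a JSON object with an ISO-8601 `timestamp` field, the millisecond return type, and the `extendBounds` helper for re-spawn merges are all illustrative.

```typescript
interface Bounds {
  start: number; // epoch milliseconds of the earliest timestamped line
  end: number; // epoch milliseconds of the latest timestamped line
}

// Scans every line and keeps the min and max parseable timestamps.
// Returns undefined when no line carries a usable timestamp.
function transcriptBounds(jsonl: string): Bounds | undefined {
  let start = Infinity;
  let end = -Infinity;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines
    }
    if (typeof entry !== "object" || entry === null) continue;
    const raw = (entry as { timestamp?: unknown }).timestamp;
    if (typeof raw !== "string") continue;
    const ms = Date.parse(raw);
    if (Number.isNaN(ms)) continue;
    start = Math.min(start, ms);
    end = Math.max(end, ms);
  }
  return start <= end ? { start, end } : undefined;
}

// Widens an existing envelope when a re-spawn transcript is merged into it.
function extendBounds(current: Bounds, merged: Bounds): Bounds {
  return {
    start: Math.min(current.start, merged.start),
    end: Math.max(current.end, merged.end),
  };
}
```

Stamping `start` at span creation and `end` at close is what lets the wrapper contain its children, which is the property A11 asserts.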
Summary
Makes Claude Code tracing opt-in and per-repo, with no machine-wide always-on daemon. Today the plugin runs a single global daemon that traces every session on the machine; this PR lets a user opt into a daemonless `session-end` mode (`trace_mode: "daemon" | "session-end"`, default `daemon`) and scope tracing to chosen repos (`trace_roots`). In session-end mode the daemon is bypassed entirely: at `SessionEnd` a pure function reconstructs the full GenAI-convention span tree from the completed transcript in one pass, uploads, and exits, with no daemon and no socket.

Why these belong together: honest per-repo scoping is only possible because it's daemonless. A single global daemon is config-blind, so "scope it to my repo" is a false boundary (it keeps tracing everything while you believe it's scoped). Removing the shared process lets each session be evaluated independently from its own cwd, so daemonless is the mechanism, opt-in + per-repo is the point.

Default behavior is byte-identical, so this is non-disruptive, and it removes the daemon-fragility failure class (stale sockets, `Unknown session`, restarts) for users who opt in.

What changes

- `trace_mode` (new): `daemon` (default)
- `trace_mode`: `session-end`. The `SessionEnd` hook rebuilds the whole span tree from the transcript and uploads. No daemon process, no socket.

The emitted span tree mirrors the daemon's shape: per turn → `invoke_agent <agentName>` root → `chat <model>` + `execute_tool <name>` (results paired) + `invoke_agent <subagent_type>` for `Agent` calls, recursing subagent transcripts (all turns). Note: the session-end builder additionally stamps each child span with its owning `gen_ai.agent.name` / `agent.id` and uses real transcript timestamps on tool and `invoke_agent` spans (see "Agent-teams correctness fixes" below). The daemon does not yet do this, so session-end's per-span attribution/timing is now richer than the daemon's: the tree structure is the same, the child-span metadata is not byte-identical. (The default `daemon` mode itself is untouched and byte-identical to today.)

Architecture
Changes
New

| File | Purpose |
|---|---|
| `src/buildTrace.ts` | `buildTrace(tracer, transcriptPath, opts)`. Turn/chat/tool/subagent emission; meta-pool subagent correlation + leftover/re-spawn pass with subset dedup; per-span agent attribution; real transcript-time envelopes; scope gate. |
| `src/sessionEnd.ts` | `SessionEnd` handler: parse payload, resolve config (env > settings, daemon parity), scope-gate (cwd from transcript), skip subagent transcripts, build, flush. Never throws. |
| `src/tracerProvider.ts` | |
| `tests/build-trace.test.ts` | |
| `tests/nest-test-tree.test.ts` | |
| `tests/session-end.test.ts` | |
| `tests/parser-tool-calls.test.ts` | |
| `tests/install-preserves-config.test.ts` | |
| `tests/config-show-trace-mode.test.ts` | `config show` surfaces trace_mode + live daemon state. |

Modified

| File | Change |
|---|---|
| `src/parser.ts` | `toolCalls()` (pairs tool_use with tool_result, now with per-call start/result timestamps) and `prompt()`. Daemon never calls them. |
| `src/genaiSpans.ts` | `agentName`/`agentId` (owning agent) + `startedAt` on tool/chat/invoke_agent spans. All optional → daemon output unchanged. |
| `src/cli.ts` | `session-end` command (stdin, hook-safe, never throws) + `config set trace_mode` + live daemon-state line in `config show`. |
| `src/setup.ts` | `Settings.trace_mode` (optional; legacy files read as `daemon`); install preserves existing config instead of wiping it. |
| `src/utils.ts` | `isCwdInScope` scope gate (opt-in `trace_roots`; empty = global). |
| `hooks/hook-handler.sh` | `session-end` branch: bypasses daemon; only `SessionEnd` runs the builder. |

Agent-teams correctness fixes
Dogfooding the builder against real agent-teams runs (`/nest-test`: coordinator + a teammate across two turns; `/triage`: coordinator + 6 specialists) surfaced three defects in the Agents view, each fixed + regression-tested in `tests/nest-test-tree.test.ts` and verified against live uploads in the W&B Agents store. All three live in the session-end path / shared `genaiSpans`; default daemon mode is unaffected (it passes none of the new optional args).

| Symptom | Root cause → fix | Commit |
|---|---|---|
| | `agentType`, no `toolUseId`. The leftover pass merged both. → `listSubagents` drops any teammate transcript whose tool_use-id set is a subset of a same-type sibling's, keeping the most complete. | 3ce0104 |
| | `execute_tool` / `chat` spans had no `gen_ai.agent.name` / `agent.id`; the Agents view groups by agent identity, not span tree. → thread an owning `{agentName, agentId}` and stamp each child span. | 3ce0104 |
| `turn-2` tools appeared before `turn-1`; sub-agent `invoke_agent` wrapper stamped ~minutes after its own children | `execute_tool` and `invoke_agent` spans had no `startTime` → build-time `now()` at SessionEnd. → parse real per-call times (assistant-msg start, tool_result end) and per-sub-agent transcript envelopes; stamp them. | 7747b5d, 79b86ea |

Known follow-up (see Known limitations): async teammates can outlive their spawning coordinator turn; we keep real times rather than fabricate a stretched coordinator end. The daemon shares `genaiSpans` but passes no owner/startedAt, so it still emits unattributed child spans, which is a separate daemon-side change.

How we got here
- `Unknown session`, global-always-on). Hardening doesn't remove the fragility class for our use.
- `claude_code.*` names → land in Traces, not the Agents view. A GenAI-convention transform is mandatory; can't drop the translation.
- `session-end` never reaches `daemon.ts`.
- `meta.json` `toolUseId` only
- `toolUseId` in their meta, only `agentType`, which is the team-member `name`, not `subagent_type`. Produced empty subtrees for every team session.
- `toolUseId` → team `name` → `subagent_type` (FIFO) + leftover pass (this PR)
- `invoke_agent` so coordinator child-count stays = #`Agent` calls (matches daemon).

Design decisions
Why opt-in (default stays daemon) instead of replacing the daemon?
Upstream is actively hardening the daemon; default must stay byte-identical so existing users are unaffected and the change is mergeable. Opt-in flippers (e.g. heavy agent-teams users) get the daemonless path and the fragility ends for them.
Why reconstruct at SessionEnd instead of incrementally?
The transcript at session end is the complete, authoritative record. A single pass over it is a pure `transcript → spans` function: deterministically testable against fixtures and the entire local transcript corpus, with no race against the writer and no cross-event state machine.
Why `toolUseId` → `name` → `subagent_type` claim order?
Plain `Agent`-tool subagent metas carry `toolUseId` (exact link). Agent-teams teammate metas don't; their `agentType` equals the team-member `name` from the `Agent` call, which differs from `subagent_type`. Falling through all three correlates both kinds; FIFO + a leftover pass handle re-spawns.
Why route in the hook instead of deleting daemon code?
Smaller, lower-risk diff that keeps the default path provably untouched (`daemon.ts` unchanged). `hooks.json` stays static; non-`SessionEnd` events no-op cheaply in session-end mode.
Constraints
- `trace_mode` defaults to `daemon`; legacy settings files (field absent) read as `daemon`. No behavior change for existing users.
- `session-end` command and hook branch always exit 0, so a tracing failure never disrupts Claude Code.
- `parser.ts` additions are read only by new code; `tracerProvider.ts` is separate from `daemon.ts`'s own tracer init.

Test results
Command:
`npm run build && npm test`
Output:
Additional validation (not in CI):
- `buildTrace` run over all 1,336 local transcripts → 0 errors.
- `claude -p` session with `trace_mode=session-end` → its own `SessionEnd` hook → built + uploaded via live OTLP → verified in the W&B Agents store (turn root with its `chat` + `Bash` + `Read` spans). A second agent-teams `/triage` transcript verified all 6 specialists nested as turn-children with full tool subtrees (incl. re-spawns).
- `RESPONSE_MODEL` on an already-ended `invoke_agent` span (dropped attr); fixed by deferring span-end until after the leftover pass, with a regression test.

What this PR does NOT include
- `weave.permission_request`)
- `weave.compaction.*`)
- `sessionId` as `conversationId`; multi-session stitching can be layered on later.
- `trace_roots` CLI subcommand
- `isCwdInScope`) ships here, but per-repo scoping is settable only via `WEAVE_TRACE_ROOTS` env / `settings.json` on this branch; the `config trace` subcommand lives on the scope branch.

Known limitations (v1)
These are real and not yet addressed — call them out before relying on this in anger:
- `SessionEnd` fires. If the process is hard-killed, the terminal/laptop is closed, or it crashes, `SessionEnd` may not fire and the whole session's trace is lost (the daemon, by contrast, streams incrementally). Verify Claude Code's `SessionEnd`-on-close behavior before depending on it.
- `SessionEnd` firing more than once, produces duplicate span trees under the same `conversation_id` (each build mints fresh span ids). A dedup key / spool would be needed for safe retries.
- `invoke_agent` span now carries its real time envelope, but because agent-teams teammates run asynchronously they can end after the coordinator turn that spawned them, so a child span may extend past its parent turn's end time. This is faithful to reality (we deliberately do not fabricate a stretched coordinator end), but consumers assuming strict parent-envelopes-child timing across the coordinator→teammate boundary should be aware.
- `isCwdInScope` uses `path.resolve`, not `realpath`, so a `trace_roots` of `/tmp/x` won't match a cwd reported as `/private/tmp/x` (macOS). Use realpaths in `trace_roots`.

Planned follow-ups: per-turn `Stop`-hook builds (incremental, survives unclean exit) and a disk spool (survives network outage + crash, with dedup), which together would make daemonless more durable than the daemon.

Try it yourself (clean-install cookbook)
End-to-end, the way a user would adopt it. Run in a normal terminal.
Revert to the released daemon build when done:
🤖 Generated with Claude Code