feat(dashboard): per-session "conversation turns" and "token usage" metrics (#75) #78
Add two session metrics alongside the existing Human Intervention metric:

- conversation turns: number of UserPromptSubmit events per session
- token usage: input/output/cache tokens summed from the Claude Code transcript, deduplicated by message.id (one turn repeats the same usage across content-block lines, so naive summing over-counts badly)

Surfaced on dashboard cards (💬 / ⛁ badges with hover breakdown), aggregated idempotently into stats/<user>.yaml (prompts/tokens) during `teamai pull` via a separate reported-prompt-tokens.json snapshot, and summarized in `teamai digest`. Tools without a transcript (e.g. Cursor) degrade gracefully to prompt-count only.

Privacy: counts only, no prompt or transcript text is persisted.

Tests: unit (conversation-token-metrics) + e2e (conversation-token-e2e, incl. Cursor degradation). Verified end-to-end against a real claude-haiku-4-5 session.

Co-authored-by: Cursor <cursoragent@cursor.com>
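The dedup rule in this commit message can be illustrated with a minimal sketch. The type and function names below (`TranscriptLine`, `sumTokensDeduped`) and the exact field layout are assumptions for illustration, not the project's actual code; the only behavior taken from the commit is "sum usage once per message.id".

```typescript
// Hypothetical shapes for illustration; the real transcript schema may differ.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

interface TranscriptLine {
  message?: { id?: string; usage?: Partial<TokenUsage> };
}

// Sum usage once per message.id. A single turn can span several
// content-block lines that all carry the same usage object, so lines
// whose id was already seen are skipped instead of being added again.
function sumTokensDeduped(lines: TranscriptLine[]): TokenUsage {
  const seen = new Set<string>();
  const total: TokenUsage = { input: 0, output: 0, cacheRead: 0, cacheCreation: 0 };
  for (const line of lines) {
    const id = line.message?.id;
    const usage = line.message?.usage;
    if (!id || !usage || seen.has(id)) continue;
    seen.add(id);
    total.input += usage.input ?? 0;
    total.output += usage.output ?? 0;
    total.cacheRead += usage.cacheRead ?? 0;
    total.cacheCreation += usage.cacheCreation ?? 0;
  }
  return total;
}
```

With three lines sharing one message id, the sketch returns that message's usage once, where a naive sum would triple every bucket.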
Suggestion: when a session is resumed after being cleared out by compaction,
Address PR review: `prompts` was accumulated from prompt_submit events, which compactEvents removes once a session goes stale. After a same-session resume the live count drops below the reported baseline in reported-prompt-tokens.json, so the delta clamps to 0 and post-resume turns are never reported.

Make prompts compaction/resume-proof the same way tokens already are: count genuine human turns from the full transcript at Stop time and carry that cumulative snapshot on the Stop event. aggregateSessionMetrics now takes max(live submit count, latest Stop snapshot): the live count covers the pre-Stop window, and the snapshot is the durable baseline.

Also: fall back to top-level requestId when message.id is missing for token dedup, so a turn without a message id is no longer dropped entirely.

Tests: prompt counting (excludes tool_results/interrupts/meta/sidechain), requestId dedup fallback, and a compaction+resume regression proving the delta still reports new turns.

Co-authored-by: Cursor <cursoragent@cursor.com>
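A rough sketch of the two rules described in this commit, under stated assumptions: the event shape, the `promptSnapshot` field, and the helper names `resolvePromptCount` and `dedupKey` are hypothetical, and events are assumed to arrive in chronological order. Only the max(live count, latest Stop snapshot) rule and the requestId fallback come from the commit message.

```typescript
// Hypothetical event shape; field names are assumptions for illustration.
interface SessionEvent {
  type: "prompt_submit" | "stop";
  // Cumulative human-turn count taken from the full transcript at Stop time.
  promptSnapshot?: number;
}

// Prompt count = max(live submit count, latest Stop snapshot).
// The live count covers turns submitted before the first Stop; the snapshot
// survives compaction because it is cumulative rather than event-derived.
function resolvePromptCount(events: SessionEvent[]): number {
  let live = 0;
  let snapshot = 0;
  for (const e of events) {
    if (e.type === "prompt_submit") {
      live += 1;
    } else if (e.promptSnapshot !== undefined) {
      snapshot = e.promptSnapshot; // a later Stop overrides an earlier one
    }
  }
  return Math.max(live, snapshot);
}

// Token dedup key: prefer message.id, fall back to the top-level requestId,
// so a line without a message id is still counted once.
function dedupKey(line: { requestId?: string; message?: { id?: string } }): string | undefined {
  return line.message?.id ?? line.requestId;
}
```

Taking the maximum rather than the sum avoids double counting, since the Stop snapshot already includes the turns that the live events recorded.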
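Fixed (3b3caf5). Thanks for the spot-on analysis 🙏 The fix:
Transcript counting rule: only genuine human turns are counted; non-human entries are excluded. The secondary item is handled as well, in token dedup. New tests: prompt counting (excluding the non-human entries), requestId dedup fallback, and a compaction + same-session resume regression case (baseline 10 → snapshot 12 after resume → delta = 2, proving new turns are still reported). The full suite is green: unit 128 files / 1688 + e2e 7 files / 46.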
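The regression numbers in this reply can be walked through with a small sketch. `promptDelta` is a hypothetical stand-in (the project's `computePromptTokenDelta` may be shaped differently), and the post-compaction live count of 2 is an assumed value chosen for illustration; the baseline 10, snapshot 12, and delta 2 are from the reply.

```typescript
// Hypothetical helper: report only the turns not yet reported, never a negative value.
function promptDelta(reportedBaseline: number, current: number): number {
  return Math.max(0, current - reportedBaseline);
}

const reportedBaseline = 10;   // already written to reported-prompt-tokens.json
const liveAfterCompaction = 2; // assumed: only post-resume submit events survive compaction
const stopSnapshot = 12;       // cumulative human turns counted at Stop time

// Before the fix: the live count alone falls below the baseline, so nothing is reported.
const deltaBeforeFix = promptDelta(reportedBaseline, liveAfterCompaction);

// After the fix: the durable snapshot keeps the count cumulative.
const deltaAfterFix = promptDelta(
  reportedBaseline,
  Math.max(liveAfterCompaction, stopSnapshot),
);

console.log({ deltaBeforeFix, deltaAfterFix }); // { deltaBeforeFix: 0, deltaAfterFix: 2 }
```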
Summary
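Closes #75. On top of the existing Human Intervention metric (#34 / #70), this adds two simpler session metrics:

- Conversation turns (number of `UserPromptSubmit` events).
- Token usage: `input` / `output` / `cacheRead` / `cacheCreation` buckets plus a total, taken from `message.usage` in the Claude Code transcript.

Changes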
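| File | Change |
|---|---|
| `src/types.ts` | `TokenUsage` / `SessionMetrics` + token utility functions; extends `DashboardEvent.tokens`, `DashboardSession.promptCount`/`tokens`, `UserStats.prompts`/`tokens` |
| `src/dashboard-collector.ts` | `scanTranscriptStop` collects interventions and tokens in a single scan (deduplicated by message.id); `rebuildSessions` outputs `promptCount`/`tokens`; adds `aggregateSessionMetrics`, and the old `aggregateSessionInterventions` now delegates to it for compatibility |
| `src/team-push.ts` | `reported-prompt-tokens.json` + `computePromptTokenDelta` / `mergePromptTokenStats`, aggregated into `prompts`/`tokens` in `stats/<user>.yaml` |
| `src/digest.ts` | `summarizeConversation` + `formatTokenCount`; the digest gains a "Conversation volume and token usage" section |
| `src/dashboard-html.ts` | `💬 N` (conversation turns) and `⛁ X` (tokens, hover for the breakdown) badges |
| `docs/usage-guide.md` | |

Design notes: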
Test plan
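- `npx tsc --noEmit` ✅
- `npx vitest run` ✅ 128 files / 1679 passed (adds `conversation-token-metrics.test.ts`, 17 cases)
- `npm run test:e2e` ✅ 7 files / 46 passed (adds `conversation-token-e2e.test.ts`, including a Cursor degradation case)
- Ran a real session with `claude-haiku-4-5` to produce a transcript → processed it with the built binary acting as the production hook → got conversation turns = 2 and tokens = {input:15, output:230, cacheRead:40796, cacheCreation:23028}, matching an independent dedup computation field by field. In the real data that message is repeated 3 times but counted only once, which confirms that dedup works on production data.

🤖 Generated with Claude Code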
Made with Cursor