
feat(dashboard): per-session "conversation turns" and "token usage" metrics (#75) #78

Merged
jeff-r2026 merged 2 commits into main from feat/conversation-token-metrics on Jun 30, 2026

Conversation

@jeff-r2026

Collaborator

Summary

Closes #75. Alongside the existing Human Intervention metrics (#34 / #70), this adds two simpler per-session metrics:

  • Conversation turns: how many rounds of prompts the human sent in each session (counts UserPromptSubmit events).
  • Token usage: input / output / cacheRead / cacheCreation buckets plus a total, taken from message.usage in the Claude Code transcript.

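⚠️ Key point: in Claude Code, the same turn (same message.id) is repeated across multiple JSONL lines with identical usage, so accumulation must deduplicate by message.id; otherwise token totals are heavily inflated.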
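A minimal sketch of the dedup rule, assuming transcript lines are already parsed from JSONL and that assistant entries carry message.id plus a message.usage object with the Anthropic API field names (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens). sumUsageByMessageId and totalTokens are hypothetical helpers, not the code in src/dashboard-collector.ts; lines without a string id are simply skipped here.

```typescript
// Assumed shape of a parsed transcript line (only the fields read here).
interface TranscriptLine {
  type?: string;
  message?: {
    id?: unknown;
    usage?: {
      input_tokens?: number;
      output_tokens?: number;
      cache_read_input_tokens?: number;
      cache_creation_input_tokens?: number;
    };
  };
}

// Four buckets, mirroring the metric described above.
interface TokenUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreation: number;
}

// Hypothetical helper: add each assistant turn's usage once per message.id.
function sumUsageByMessageId(lines: TranscriptLine[]): TokenUsage {
  const total: TokenUsage = { input: 0, output: 0, cacheRead: 0, cacheCreation: 0 };
  const seen = new Set<string>();
  for (const line of lines) {
    const msg = line.message;
    if (line.type !== "assistant" || !msg || !msg.usage) continue;
    // One turn is repeated on several JSONL lines with identical usage,
    // so only the first line for a given id is counted.
    if (typeof msg.id !== "string" || seen.has(msg.id)) continue;
    seen.add(msg.id);
    const u = msg.usage;
    total.input += u.input_tokens ?? 0;
    total.output += u.output_tokens ?? 0;
    total.cacheRead += u.cache_read_input_tokens ?? 0;
    total.cacheCreation += u.cache_creation_input_tokens ?? 0;
  }
  return total;
}

// Sum of all four buckets.
function totalTokens(u: TokenUsage): number {
  return u.input + u.output + u.cacheRead + u.cacheCreation;
}
```

Summing naively over the same three duplicated lines would triple that turn's usage; the Set keyed on message.id is what keeps it at one.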

Changes

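File | Change
src/types.ts | Adds TokenUsage / SessionMetrics plus token helper functions; extends DashboardEvent.tokens, DashboardSession.promptCount/tokens, and UserStats.prompts/tokens
src/dashboard-collector.ts | scanTranscriptStop extracts both intervention and token data in a single scan (deduplicated by message.id); rebuildSessions outputs promptCount/tokens; adds aggregateSessionMetrics, and the old aggregateSessionInterventions now delegates to it for backward compatibility
src/team-push.ts | A separate idempotent snapshot, reported-prompt-tokens.json, plus computePromptTokenDelta/mergePromptTokenStats, aggregated into prompts/tokens in stats/<user>.yaml
src/digest.ts | summarizeConversation + formatTokenCount; the digest gains a "Conversation volume and token usage" section
src/dashboard-html.ts | Session cards gain 💬 N (conversation turns) and ⛁ X (tokens, hover for breakdown) badges
docs/usage-guide.md | Documents the two new metrics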
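The idempotent reporting in src/team-push.ts can be sketched as follows, under assumed shapes: a snapshot maps each session id to its last reported cumulative counts, each pull adds the clamped per-session growth to the user's stats, and the baseline then moves up to the high-water mark. Counts, Snapshot, computeDelta, advanceBaseline and pull are hypothetical stand-ins for computePromptTokenDelta / mergePromptTokenStats, whose real signatures are not shown in this PR.

```typescript
// Hypothetical shapes; the real file formats in src/team-push.ts may differ.
interface Counts {
  prompts: number;
  tokens: number;
}

// sessionId -> last reported cumulative counts (the role of reported-prompt-tokens.json).
type Snapshot = Record<string, Counts>;

const ZERO: Counts = { prompts: 0, tokens: 0 };

// Sum of per-session growth since the last report, clamped at 0.
function computeDelta(current: Snapshot, reported: Snapshot): Counts {
  const delta: Counts = { prompts: 0, tokens: 0 };
  for (const id of Object.keys(current)) {
    const base = reported[id] ?? ZERO;
    delta.prompts += Math.max(0, current[id].prompts - base.prompts);
    delta.tokens += Math.max(0, current[id].tokens - base.tokens);
  }
  return delta;
}

// Move each session's baseline up to its high-water mark.
function advanceBaseline(current: Snapshot, reported: Snapshot): Snapshot {
  const next: Snapshot = { ...reported };
  for (const id of Object.keys(current)) {
    const base = reported[id] ?? ZERO;
    next[id] = {
      prompts: Math.max(base.prompts, current[id].prompts),
      tokens: Math.max(base.tokens, current[id].tokens),
    };
  }
  return next;
}

// One pull: merge the delta into the user's stats and persist the new baseline.
function pull(current: Snapshot, reported: Snapshot, stats: Counts): { stats: Counts; reported: Snapshot } {
  const d = computeDelta(current, reported);
  return {
    stats: { prompts: stats.prompts + d.prompts, tokens: stats.tokens + d.tokens },
    reported: advanceBaseline(current, reported),
  };
}
```

Running pull twice with unchanged counts adds nothing the second time, which is what keeps repeated `teamai pull` runs from double counting.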

Design notes:

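  • The idempotent reporting path for the intervention metrics is completely unchanged; the existing tests pass without a single modification.
  • Tools without a transcript (e.g. Cursor) degrade gracefully: conversation turns are still counted, and tokens show as 0 / N/A.
  • Privacy: only counts and quantities are recorded; no prompt or transcript text is persisted.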

Test plan

  • npx tsc --noEmit
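  • npx vitest run ✅ 128 files / 1679 passed (adds 17 cases in conversation-token-metrics.test.ts)
  • npm run test:e2e ✅ 7 files / 46 passed (adds conversation-token-e2e.test.ts, including a Cursor degradation case)
  • Real end-to-end: ran a real session with claude-haiku-4-5 to produce a transcript → processed it with the built binary acting as the production hook → got conversation turns = 2, tokens = {input:15, output:230, cacheRead:40796, cacheCreation:23028}, matching an independent deduplicated calculation field by field. In the real data that message appeared 3 times but was counted only once, confirming that dedup works on production data.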

🤖 Generated with Claude Code

Made with Cursor


Add two session metrics alongside the existing Human Intervention metric:
- conversation turns: number of UserPromptSubmit events per session
- token usage: input/output/cache tokens summed from the Claude Code
  transcript, deduplicated by message.id (one turn repeats the same usage
  across content-block lines, so naive summing over-counts badly)

Surfaced on dashboard cards (💬 / ⛁ badges with hover breakdown), aggregated
idempotently into stats/<user>.yaml (prompts/tokens) during `teamai pull` via a
separate reported-prompt-tokens.json snapshot, and summarized in `teamai digest`.
Tools without a transcript (e.g. Cursor) degrade gracefully to prompt-count only.
Privacy: counts only, no prompt or transcript text is persisted.

Tests: unit (conversation-token-metrics) + e2e (conversation-token-e2e, incl.
Cursor degradation). Verified end-to-end against a real claude-haiku-4-5 session.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jeff-r2026

Collaborator Author

Suggestion: if a session is compacted out and then resumed, prompts (conversation turns) are under-counted

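Problem: prompts is accumulated from prompt_submit events in events.jsonl, but rebuildSessions drops sessions idle for more than 30 min (dashboard-collector.ts, elapsed > DASHBOARD_STALE_TIMEOUT_MS), and compactEvents then deletes all of that session's events from events.jsonl. Meanwhile the idempotent baseline in reported-prompt-tokens.json stays at the pre-compaction high-water mark.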

Failure scenario

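  1. Session S1 has already reported prompts=10; the local baseline in reported-prompt-tokens.json is S1=10.
  2. S1 sits idle for more than 30 min and compactEvents clears it from events.jsonl entirely.
  3. The user resumes with the same session_id (claude --resume reuses the session id and the transcript) and keeps sending prompts.
  4. aggregateSessionMetrics now computes cur.prompts from only the few post-resume events (e.g. 2), so computePromptTokenDelta yields delta = max(0, 2 − 10) = 0.
  5. → Every turn after the resume, up until the prompt count climbs back above the old high-water mark of 10, is never reported, so the team stats stay low for a long time.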
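To make step 5 concrete, here is a small simulation under assumed shapes (PromptEvent, livePromptCount and reportedAfterResume are hypothetical, not the real collector code): the baseline stays at 10, compaction has emptied the event log, and a pull runs after every post-resume prompt with delta = max(0, cur − baseline).

```typescript
// Hypothetical event record; real events.jsonl lines carry more fields.
interface PromptEvent {
  sessionId: string;
  type: "prompt_submit" | "stop";
}

// Live count: only the events still present in events.jsonl.
function livePromptCount(events: PromptEvent[], sessionId: string): number {
  return events.filter((e) => e.sessionId === sessionId && e.type === "prompt_submit").length;
}

// The baseline survives compaction, but the S1 events do not.
// After the resume, a pull runs after each new prompt.
function reportedAfterResume(baseline: number, newTurns: number): number {
  const events: PromptEvent[] = []; // compactEvents already removed every S1 event
  let high = baseline;
  let reported = 0;
  for (let i = 0; i < newTurns; i++) {
    events.push({ sessionId: "S1", type: "prompt_submit" });
    const cur = livePromptCount(events, "S1");
    reported += Math.max(0, cur - high); // delta = max(0, cur - baseline)
    high = Math.max(high, cur); // the baseline only ever moves up
  }
  return reported;
}
```

With 12 post-resume prompts, only the 11th and 12th produce a positive delta, so 10 genuine turns are lost for good.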

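Key asymmetry: tokens are not affected. They come from a cumulative snapshot of the full transcript taken at Stop, and the transcript is never compacted, so cur.tokens is always ≥ the old baseline and the delta is correct. The problem lies solely in backing a baseline that must survive compaction with an event count that compaction can erase.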

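Suggestion: have prompts use the same "cumulative snapshot at Stop" approach as tokens (carry the session's cumulative prompt count on the Stop event) instead of relying on individual prompt_submit events that compaction removes. The baseline and the current value then share a source and stay consistent across compaction and resume.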

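Minor (low priority): in scanTranscriptStop, if an assistant message carries usage but its message.id is missing or not a string, that turn's tokens are dropped entirely. Normal Claude Code turns always carry an id, so the trigger surface is narrow; a fallback for missing ids could be added if desired.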

Address PR review: `prompts` was accumulated from prompt_submit events, which
compactEvents removes once a session goes stale. After a same-session resume the
live count drops below the reported baseline in reported-prompt-tokens.json, so
the delta clamps to 0 and post-resume turns are never reported.

Make prompts compaction/resume-proof the same way tokens already are: count
genuine human turns from the full transcript at Stop time and carry that
cumulative snapshot on the Stop event. aggregateSessionMetrics now takes
max(live submit count, latest Stop snapshot) — live count covers the pre-Stop
window, the snapshot is the durable baseline.

Also: fall back to top-level requestId when message.id is missing for token
dedup, so a turn without a message id is no longer dropped entirely.

Tests: prompt counting (excludes tool_results/interrupts/meta/sidechain),
requestId dedup fallback, and a compaction+resume regression proving the delta
still reports new turns.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jeff-r2026

Collaborator Author

Fixed in 3b3caf5. Thanks for the spot-on analysis 🙏

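Fix: prompts now comes from the same source as tokens. At Stop, the number of genuine human turns is counted from the full transcript and carried on the Stop event as a cumulative snapshot. aggregateSessionMetrics takes max(live submit count, latest Stop snapshot):

  • Before Stop, the live prompt_submit count is used, so the badge for an active session behaves as before.
  • After Stop, the transcript snapshot is authoritative. It is never compacted and keeps growing when claude --resume reuses the same transcript, so cur.prompts is always ≥ the old baseline, the delta is correct, and post-resume turns are no longer lost.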
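A minimal sketch of the max(live, snapshot) rule, assuming a hypothetical promptsSnapshot field on the Stop event and events in time order; aggregatePrompts and promptDelta are illustrative names, not the actual aggregateSessionMetrics / computePromptTokenDelta signatures.

```typescript
// Hypothetical event shape: one prompt_submit per human prompt, and a Stop
// event carrying the cumulative human-turn count read from the transcript.
interface SessionEvent {
  type: "prompt_submit" | "stop";
  promptsSnapshot?: number;
}

// prompts = max(live submit count, latest Stop snapshot).
function aggregatePrompts(events: SessionEvent[]): number {
  let live = 0;
  let snapshot = 0;
  for (const e of events) {
    if (e.type === "prompt_submit") {
      live += 1;
    } else if (typeof e.promptsSnapshot === "number") {
      snapshot = e.promptsSnapshot; // later Stop events overwrite earlier ones
    }
  }
  return Math.max(live, snapshot);
}

// Idempotent delta against the reported baseline, clamped at 0.
function promptDelta(current: number, baseline: number): number {
  return Math.max(0, current - baseline);
}
```

After compaction, the live count only sees the post-resume submits, but the Stop snapshot still reflects the whole transcript, so with a baseline of 10 and a snapshot of 12 the delta comes out as 2.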

Transcript counting rules: only genuine human turns are counted; tool_result entries, [Request interrupted by user] interruptions, and injected isMeta / isSidechain entries are excluded.
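The counting rule can be sketched as a filter over parsed transcript entries. The entry shape (type, isMeta, isSidechain, and message.content as either a string or an array of typed blocks) is an assumption about the Claude Code transcript format, and matching the interrupt text by prefix is a choice made here, not necessarily what the PR does.

```typescript
// Assumed transcript entry shape (only the fields read here).
interface TranscriptEntry {
  type?: string;
  isMeta?: boolean;
  isSidechain?: boolean;
  message?: { role?: string; content?: unknown };
}

interface Block {
  type?: unknown;
  text?: unknown;
}

function isBlock(b: unknown): b is Block {
  return typeof b === "object" && b !== null;
}

// Prefix match on the interrupt marker (an assumption about its exact text).
function isInterrupt(text: string): boolean {
  return text.trim().startsWith("[Request interrupted by user");
}

function isHumanTurn(entry: TranscriptEntry): boolean {
  // Injected meta entries and sidechain entries are not human turns.
  if (entry.type !== "user" || entry.isMeta === true || entry.isSidechain === true) return false;
  const content = entry.message?.content;
  if (typeof content === "string") return !isInterrupt(content);
  if (!Array.isArray(content)) return false;
  const blocks = content.filter(isBlock);
  if (blocks.length === 0) return false;
  // Tool results arrive as user entries, but no human typed them.
  if (blocks.some((b) => b.type === "tool_result")) return false;
  const texts: string[] = [];
  for (const b of blocks) {
    if (b.type === "text" && typeof b.text === "string") texts.push(b.text);
  }
  // An entry whose only text is an interrupt marker does not count.
  return !(texts.length > 0 && texts.every(isInterrupt));
}

function countHumanTurns(entries: TranscriptEntry[]): number {
  return entries.filter(isHumanTurn).length;
}
```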

The minor item is handled as well: token dedup falls back to the top-level requestId when message.id is missing, so that turn's tokens are no longer dropped entirely.
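A sketch of the fallback key, assuming requestId sits at the top level of each transcript line as the fix describes; dedupKey and sumTokensWithFallback are hypothetical. The msg:/req: prefixes keep the two id namespaces from colliding, and a line with neither id is still skipped here, which is an assumption because the PR does not say how that case is handled.

```typescript
// Assumed shape: requestId at the top level, id and usage under message.
interface UsageLine {
  type?: string;
  requestId?: unknown;
  message?: {
    id?: unknown;
    usage?: { input_tokens?: number; output_tokens?: number };
  };
}

// Prefer message.id; fall back to the top-level requestId when it is missing.
function dedupKey(line: UsageLine): string | null {
  const id = line.message?.id;
  if (typeof id === "string" && id !== "") return `msg:${id}`;
  const req = line.requestId;
  if (typeof req === "string" && req !== "") return `req:${req}`;
  return null;
}

// Hypothetical: sum input + output tokens once per dedup key.
function sumTokensWithFallback(lines: UsageLine[]): number {
  const seen = new Set<string>();
  let total = 0;
  for (const line of lines) {
    const usage = line.message?.usage;
    if (line.type !== "assistant" || !usage) continue;
    const key = dedupKey(line);
    if (key === null || seen.has(key)) continue; // no usable id: skipped (assumption)
    seen.add(key);
    total += (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0);
  }
  return total;
}
```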

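New tests: prompt counting (excluding the non-human entries above), the requestId dedup fallback, and a compaction + same-session resume regression case (baseline 10 → post-resume snapshot 12 → delta=2, proving that new turns are still reported). Full suite green: unit 128 files / 1688 passed, e2e 7 files / 46 passed.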

jeff-r2026 merged commit 52cad9a into main on Jun 30, 2026
7 checks passed
jeff-r2026 deleted the feat/conversation-token-metrics branch on June 30, 2026 13:00


Development

Successfully merging this pull request may close these issues.

feat(dashboard): track per-session "human conversation turns" and "token usage"

1 participant