feat(cli-analytics): detect AI agents driving the CLI #3891
Adds `detectAgent()`, the differentiator: it flags whether an AI agent (Claude Code, Cursor, Codex, Gemini CLI, …) or a human is invoking the CLI, and which agent, based on named env vars. Named detection is precision-first; CI and TTY signals are collected separately so they don't pollute the agent metric.

Generated-By: PostHog Code
Task-Id: b56cf1bb-97f1-4daf-a65c-01a2cdb12eaa
📝 No Changeset Found
This PR doesn't include a changeset. A changeset is required to release a new version.
How to add a changeset: run this command and follow the prompts: `pnpm changeset`
Remember: Never use
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
packages/cli-analytics/src/__tests__/agent-detection.test.ts:83-87
**Missing CI var in `detectCi` test**
`CONTINUOUS_INTEGRATION` is listed in `CI_ENV_VARS` in `agent-detection.ts` but absent from the `it.each` test cases here. If the var is ever removed from the source array, no test would catch the regression.
### Issue 2 of 4
packages/cli-analytics/src/__tests__/agent-detection.test.ts:5-31
**Secondary vars in multi-var signatures are untested**
`NAMED_AGENT_SIGNATURES` contains additional env vars beyond the first entry for several agents (`CURSOR_TRACE_ID` for `cursor`, `CODEX_CI` for `codex`, `OPENCODE` for `opencode`), but no test case covers those secondary vars. The parameterised suite only exercises the first var of each signature, so a typo in a secondary var would go undetected.
### Issue 3 of 4
packages/cli-analytics/src/extensions/agent-detection.ts:37
**`REPL_ID` is a broad env var that could false-positive**
`REPL_ID` was chosen as the Replit signal, but this variable name is generic enough that other REPL frameworks, test runners, or internal tooling could set it for unrelated purposes. The Replit docs expose `REPLIT_CLUSTER`, `REPLIT_DB_URL`, and similar vars that are more narrowly scoped to their platform; any of those would be a safer anchor.
### Issue 4 of 4
packages/cli-analytics/src/extensions/agent-detection.ts:31-32
**Generic `AGENT` var collides with CI/CD "build agent" convention**
Many CI systems (Jenkins, TeamCity, Buildkite) set an `AGENT` environment variable to describe the build agent pool or hostname. A truthy but non-AI value would still produce `{ isAgent: true, source: 'env_var' }`, mis-classifying a human developer running the CLI inside such a job. Adding a guard that skips the generic fallback when a CI var from `CI_ENV_VARS` is present would reduce false positives.
Reviews (1): Last reviewed commit: "feat(cli-analytics): detect AI agents dr..."
    it('named env vars win over heuristics', () => {
      expect(detectAgent({ CLAUDECODE: '1', TERM: 'dumb' }, { isTty: false, includeHeuristics: true })).toEqual({
        isAgent: true,
        agentName: 'claude_code',
**Missing CI var in `detectCi` test**
`CONTINUOUS_INTEGRATION` is listed in `CI_ENV_VARS` in `agent-detection.ts` but absent from the `it.each` test cases here. If the var is ever removed from the source array, no test would catch the regression.
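A minimal sketch of the kind of coverage this comment asks for, under stated assumptions: `detectCi` here is a hypothetical stand-in that returns a boolean, and `CI` is an assumed second entry in `CI_ENV_VARS` (only `CONTINUOUS_INTEGRATION` is confirmed by the thread). The expected names are written as literals rather than derived from `CI_ENV_VARS`, because a list derived from the source array would shrink silently along with it.

```typescript
type Env = Record<string, string | undefined>

// Assumed contents: only CONTINUOUS_INTEGRATION is confirmed by the review; CI is a placeholder.
const CI_ENV_VARS = ['CI', 'CONTINUOUS_INTEGRATION']

// Hypothetical stand-in for the real detectCi; its actual signature may differ.
function detectCi(env: Env): boolean {
  return CI_ENV_VARS.some((name) => Boolean(env[name]))
}

// Literal expectations, deliberately NOT derived from CI_ENV_VARS:
// deleting an entry from the source array then makes a case fail.
const expectedCiVars = ['CI', 'CONTINUOUS_INTEGRATION']

// Names that detectCi fails to recognise when set on their own.
const missed = expectedCiVars.filter((name) => !detectCi({ [name]: 'true' }))
```

In the real suite, each literal name would presumably become one more row in the existing `it.each` table.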
    describe('named agents via env vars', () => {
      const cases: Array<[string, Record<string, string>, AgentInfo]> = [
        [
          'Claude Code (CLAUDECODE)',
          { CLAUDECODE: '1' },
          { isAgent: true, agentName: 'claude_code', source: 'env_var' },
        ],
        [
          'Claude Code (entrypoint)',
          { CLAUDE_CODE_ENTRYPOINT: 'cli' },
          { isAgent: true, agentName: 'claude_code', source: 'env_var' },
        ],
        ['Cursor', { CURSOR_AGENT: '1' }, { isAgent: true, agentName: 'cursor', source: 'env_var' }],
        ['Codex', { CODEX_SANDBOX: 'seatbelt' }, { isAgent: true, agentName: 'codex', source: 'env_var' }],
        ['Gemini CLI', { GEMINI_CLI: '1' }, { isAgent: true, agentName: 'gemini_cli', source: 'env_var' }],
        ['Augment', { AUGMENT_AGENT: '1' }, { isAgent: true, agentName: 'augment', source: 'env_var' }],
        ['Cline', { CLINE_ACTIVE: 'true' }, { isAgent: true, agentName: 'cline', source: 'env_var' }],
        ['OpenCode', { OPENCODE_CLIENT: '1' }, { isAgent: true, agentName: 'opencode', source: 'env_var' }],
        ['Replit', { REPL_ID: 'abc' }, { isAgent: true, agentName: 'replit', source: 'env_var' }],
      ]

      it.each(cases)('detects %s', (_label, env, expected) => {
        expect(detectAgent(env, { isTty: true })).toEqual(expected)
      })
    })

    describe('generic AGENT / AI_AGENT', () => {
**Secondary vars in multi-var signatures are untested**
`NAMED_AGENT_SIGNATURES` contains additional env vars beyond the first entry for several agents (`CURSOR_TRACE_ID` for `cursor`, `CODEX_CI` for `codex`, `OPENCODE` for `opencode`), but no test case covers those secondary vars. The parameterised suite only exercises the first var of each signature, so a typo in a secondary var would go undetected.
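One possible way to cover the secondary vars, sketched under assumptions: the `{ agentName, envVars }` shape of `NAMED_AGENT_SIGNATURES` and the simplified single-argument `detectAgent` below are hypothetical stand-ins, and only the var names quoted in this thread are listed. The cases are literal rows (so a typo in the source table fails a row), and a second check flags any var in the table that has no row at all.

```typescript
type Env = Record<string, string | undefined>

interface AgentInfo {
  isAgent: boolean
  agentName?: string
  source?: 'env_var'
}

// Hypothetical shape; the real table may be structured differently.
const NAMED_AGENT_SIGNATURES: Array<{ agentName: string; envVars: string[] }> = [
  { agentName: 'cursor', envVars: ['CURSOR_AGENT', 'CURSOR_TRACE_ID'] },
  { agentName: 'codex', envVars: ['CODEX_SANDBOX', 'CODEX_CI'] },
  { agentName: 'opencode', envVars: ['OPENCODE_CLIENT', 'OPENCODE'] },
]

// Simplified stand-in for detectAgent: first signature with any truthy var wins.
function detectAgent(env: Env): AgentInfo {
  for (const signature of NAMED_AGENT_SIGNATURES) {
    if (signature.envVars.some((name) => Boolean(env[name]))) {
      return { isAgent: true, agentName: signature.agentName, source: 'env_var' }
    }
  }
  return { isAgent: false }
}

// Literal [env var, expected agent] rows, including the secondary vars.
const cases: Array<[string, string]> = [
  ['CURSOR_AGENT', 'cursor'],
  ['CURSOR_TRACE_ID', 'cursor'],
  ['CODEX_SANDBOX', 'codex'],
  ['CODEX_CI', 'codex'],
  ['OPENCODE_CLIENT', 'opencode'],
  ['OPENCODE', 'opencode'],
]

// Rows where setting only that var does not yield the expected agent.
const failures = cases.filter(([name, expected]) => detectAgent({ [name]: '1' }).agentName !== expected)

// Vars present in the signature table but missing from the literal rows.
const covered = new Set(cases.map(([name]) => name))
const uncovered: string[] = []
for (const signature of NAMED_AGENT_SIGNATURES) {
  for (const name of signature.envVars) {
    if (!covered.has(name)) uncovered.push(name)
  }
}
```

The coverage check is a design choice rather than something the review asks for: it turns "someone added a var and forgot a test" into a failing assertion.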
    export interface DetectAgentOptions {
      /** Whether the process is attached to an interactive terminal. Defaults to `process.stdout.isTTY`. */
      isTty?: boolean
**`REPL_ID` is a broad env var that could false-positive**
`REPL_ID` was chosen as the Replit signal, but this variable name is generic enough that other REPL frameworks, test runners, or internal tooling could set it for unrelated purposes. The Replit docs expose `REPLIT_CLUSTER`, `REPLIT_DB_URL`, and similar vars that are more narrowly scoped to their platform; any of those would be a safer anchor.
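A rough sketch of the narrower anchor suggested here. The helper name `looksLikeReplit` and the constant `REPLIT_ANCHOR_VARS` are hypothetical, and the two var names are taken from this comment as-is rather than verified against Replit's documentation.

```typescript
type Env = Record<string, string | undefined>

// Var names as quoted in the review comment; treat them as assumptions until checked.
const REPLIT_ANCHOR_VARS = ['REPLIT_CLUSTER', 'REPLIT_DB_URL']

// Hypothetical helper: true only when a platform-prefixed var is set.
// REPL_ID on its own is intentionally ignored.
function looksLikeReplit(env: Env): boolean {
  return REPLIT_ANCHOR_VARS.some((name) => Boolean(env[name]))
}

const genericOnly = looksLikeReplit({ REPL_ID: 'abc' })
const platformScoped = looksLikeReplit({ REPL_ID: 'abc', REPLIT_DB_URL: 'https://example.invalid/db' })
```

If this direction were taken, the existing `['Replit', { REPL_ID: 'abc' }, …]` test row would need to change accordingly.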
    /** Env var whose presence indicates a CI runner — captured separately from agent. */
**Generic `AGENT` var collides with CI/CD "build agent" convention**
Many CI systems (Jenkins, TeamCity, Buildkite) set an `AGENT` environment variable to describe the build agent pool or hostname. A truthy but non-AI value would still produce `{ isAgent: true, source: 'env_var' }`, mis-classifying a human developer running the CLI inside such a job. Adding a guard that skips the generic fallback when a CI var from `CI_ENV_VARS` is present would reduce false positives.
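The guard described above could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: `detectGenericAgent` is a hypothetical helper, `CI` is an assumed entry in `CI_ENV_VARS`, and `AI_AGENT` is included in the generic list only because the test suite has a `generic AGENT / AI_AGENT` block.

```typescript
type Env = Record<string, string | undefined>

interface AgentInfo {
  isAgent: boolean
  agentName?: string
  source?: 'env_var'
}

// Assumed contents; only CONTINUOUS_INTEGRATION is confirmed elsewhere in this review.
const CI_ENV_VARS = ['CI', 'CONTINUOUS_INTEGRATION']

// Generic, unnamed signals (names taken from the test suite's describe block).
const GENERIC_AGENT_VARS = ['AGENT', 'AI_AGENT']

// Hypothetical generic fallback: ignored entirely when a CI var is present,
// so a build agent's AGENT value is not counted as an AI agent.
function detectGenericAgent(env: Env): AgentInfo {
  const inCi = CI_ENV_VARS.some((name) => Boolean(env[name]))
  if (inCi) {
    return { isAgent: false }
  }
  const hasGenericSignal = GENERIC_AGENT_VARS.some((name) => Boolean(env[name]))
  return hasGenericSignal ? { isAgent: true, source: 'env_var' } : { isAgent: false }
}

const insideCi = detectGenericAgent({ AGENT: 'linux-pool-3', CI: 'true' })
const outsideCi = detectGenericAgent({ AGENT: '1' })
```

The trade-off is that an AI agent running inside CI and identified only by a generic var would be missed, which is consistent with the precision-first stance in the PR description; named signatures would be unaffected by this guard.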
Size Change: +10.2 kB (+0.06%) Total Size: 17 MB
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |

Created with PostHog Code