Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
6674ca0
Add Copilot CLI agent research and probe script
khaong Mar 2, 2026
e56b14f
Add Copilot CLI E2E agent runner
khaong Mar 2, 2026
3e28095
Implement Copilot CLI agent package with hooks and lifecycle support
khaong Mar 2, 2026
789dbf7
Fix Copilot CLI E2E runner: use --allow-all and default to haiku
khaong Mar 2, 2026
fc24fc6
Fix spurious task checkpoints for agents without SubagentStart hook
khaong Mar 2, 2026
08af246
Fix interactive E2E tests for Copilot CLI
khaong Mar 2, 2026
66bb6e1
Fix tmux env option ordering for BSD compatibility
khaong Mar 2, 2026
7a63533
Switch Copilot CLI E2E tests to gpt-4.1 (zero premium requests)
khaong Mar 2, 2026
2a90c86
fix(copilot): DetectPresence delegates to AreHooksInstalled (#590)
peyton-alt Mar 3, 2026
74e8a4b
feat(copilot): implement TranscriptAnalyzer for JSONL transcripts (#592)
peyton-alt Mar 3, 2026
bcaa467
Merge remote-tracking branch 'origin/main' into alex/add-copilot-agent
khaong Mar 4, 2026
862e10a
fix(copilot): address PR review feedback
khaong Mar 4, 2026
ba22b0b
fix(copilot): improve error handling and add test coverage
khaong Mar 4, 2026
793ff5f
Merge branch 'main' into alex/add-copilot-agent
khaong Mar 4, 2026
1df71eb
fix(strategy): detect COPILOT_CLI env var in hasTTY()
khaong Mar 4, 2026
010194c
fix(e2e): add concurrency gate for Copilot CLI tests
khaong Mar 4, 2026
50f61d9
copilot e2e
squishykid Mar 4, 2026
dbe1f50
Merge remote-tracking branch 'origin/main' into alex/add-copilot-agent
khaong Mar 5, 2026
2c79112
Merge branch 'main' into alex/add-copilot-agent
Soph Mar 5, 2026
b3688e4
removed unnecessary test, improved comment
Soph Mar 5, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .github/workflows/e2e-isolated.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ on:
required: true
default: "gemini-cli"
type: choice
options: [claude-code, opencode, gemini-cli, cursor-cli, factoryai-droid]
options: [claude-code, opencode, gemini-cli, cursor-cli, factoryai-droid, copilot-cli]
test:
description: "Test name filter (regex)"
required: true
Expand Down Expand Up @@ -40,6 +40,7 @@ jobs:
gemini-cli) npm install -g @google/gemini-cli ;;
cursor-cli) curl https://cursor.com/install -fsS | bash ;;
factoryai-droid) curl -fsSL https://app.factory.ai/cli | sh ;;
copilot-cli) npm install -g @github/copilot ;;
esac
echo "$HOME/.local/bin" >> $GITHUB_PATH

Expand All @@ -49,6 +50,7 @@ jobs:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
FACTORY_API_KEY: ${{ secrets.FACTORY_API_KEY }}
COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
run: go run ./e2e/bootstrap

- name: Run isolated test
Expand All @@ -57,6 +59,7 @@ jobs:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
FACTORY_API_KEY: ${{ secrets.FACTORY_API_KEY }}
COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts
E2E_ENTIRE_BIN: /usr/local/bin/entire
run: |
Expand Down
6 changes: 5 additions & 1 deletion .github/workflows/e2e.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ on:
- gemini-cli
- factoryai-droid
- cursor-cli
- copilot-cli
push:
branches:
- main
Expand All @@ -40,7 +41,7 @@ jobs:
if [ -n "$input" ]; then
echo "agents=[\"$input\"]" >> "$GITHUB_OUTPUT"
else
echo 'agents=["claude-code","opencode","gemini-cli","factoryai-droid","cursor-cli"]' >> "$GITHUB_OUTPUT"
echo 'agents=["claude-code","opencode","gemini-cli","factoryai-droid","cursor-cli","copilot-cli"]' >> "$GITHUB_OUTPUT"
fi

e2e-tests:
Expand Down Expand Up @@ -70,6 +71,7 @@ jobs:
gemini-cli) npm install -g @google/gemini-cli ;;
cursor-cli) curl https://cursor.com/install -fsS | bash ;;
factoryai-droid) curl -fsSL https://app.factory.ai/cli | sh ;;
copilot-cli) npm install -g @github/copilot ;;
esac
echo "$HOME/.local/bin" >> $GITHUB_PATH

Expand All @@ -79,6 +81,7 @@ jobs:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
FACTORY_API_KEY: ${{ secrets.FACTORY_API_KEY }}
COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
run: go run ./e2e/bootstrap

- name: Run E2E Tests
Expand All @@ -87,6 +90,7 @@ jobs:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
FACTORY_API_KEY: ${{ secrets.FACTORY_API_KEY }}
COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_GITHUB_TOKEN }}
E2E_CONCURRENT_TEST_LIMIT: ${{ matrix.agent == 'gemini-cli' && '6' || matrix.agent == 'factoryai-droid' && '1' || '' }}
run: mise run test:e2e --agent ${{ matrix.agent }}

Expand Down
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -58,4 +58,4 @@ cmd/entire/entire
docs/requirements
docs/plans
docs/reviews

tmp/
204 changes: 204 additions & 0 deletions cmd/entire/cli/agent/copilotcli/AGENT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
# Copilot CLI — Integration One-Pager

## Verdict: COMPATIBLE

Copilot CLI has a complete hook system with 8 hook types, JSONL transcripts, and session management. All required lifecycle events map cleanly.

## Static Checks

| Check | Result | Notes |
|-------|--------|-------|
| Binary present | PASS | `/opt/homebrew/bin/copilot` |
| Help available | PASS | Full `--help` with subcommands |
| Version info | PASS | `0.0.420` |
| Hook keywords | PASS | 8 hook types in `.github/hooks/*.json` |
| Session keywords | PASS | `--resume`, `--continue`, `--share` |
| Config directory | PASS | `~/.copilot/` (user-level), `.github/hooks/` (repo-level) |
| Documentation | PASS | docs.github.com/en/copilot/reference/hooks-configuration |

## Binary

- Name: `copilot`
- Version: `0.0.420` (GA as of 2026-02-25)
- Install: `brew install copilot-cli` or `npm install -g @github/copilot`

## Hook Mechanism

- Config file: `.github/hooks/*.json` (all JSON files in directory are auto-discovered)
- Our file: `.github/hooks/entire.json` (dedicated file, avoids conflicts)
- Config format: JSON
- Hook registration: Array of hook entries per event name, each with `type: "command"` and `bash` field

### Hook Config Format

```json
{
"version": 1,
"hooks": {
"hookName": [
{
"type": "command",
"bash": "entire hooks copilot-cli hook-name"
}
]
}
}
```

Note: Uses `bash` key (not `command` like Claude Code/Gemini). Also supports `powershell` for Windows. Each entry can have optional `cwd`, `timeoutSec` (default 30), `env`, and `comment` fields.

### Hook Names and Event Mapping

| Native Hook Name | When It Fires | Stdin Payload Fields | Entire EventType |
|-----------------|---------------|---------------------|-----------------|
| `userPromptSubmitted` | User submits a prompt | `timestamp`, `cwd`, `sessionId`, `prompt` | `TurnStart` |
| `sessionStart` | Agent session begins/resumes | `timestamp`, `cwd`, `sessionId`, `source`, `initialPrompt` | `SessionStart` |
| `agentStop` | Agent finishes a turn | `timestamp`, `cwd`, `sessionId`, `transcriptPath`, `stopReason` | `TurnEnd` |
| `sessionEnd` | Session completed/terminated | `timestamp`, `cwd`, `sessionId`, `reason` | `SessionEnd` |
| `subagentStop` | Subagent finishes | `timestamp`, `cwd`, `sessionId`, *(TBD — needs capture)* | `SubagentEnd` |
| `preToolUse` | Before tool execution | *(TBD — needs capture)* | *(pass-through)* |
| `postToolUse` | After tool execution | *(TBD — needs capture)* | *(pass-through)* |
| `errorOccurred` | Error during execution | *(TBD — needs capture)* | *(pass-through)* |

**Event ordering quirk:** `userPromptSubmitted` fires BEFORE `sessionStart` on the first prompt. This matches Claude Code's behavior and the framework's session phase state machine handles it correctly (TurnStart can arrive before SessionStart).

**Valid Entire EventTypes:** `SessionStart`, `TurnStart`, `TurnEnd`, `Compaction`, `SessionEnd`, `SubagentStart`, `SubagentEnd`

### Hook Input Payloads (Captured)

**userPromptSubmitted:**
```json
{"timestamp":1771480081360,"cwd":"/path/to/repo","sessionId":"b0ff98c0-8e01-4b73-bf92-9649b139931b","prompt":"hi"}
```

**sessionStart:**
```json
{"timestamp":1771480081383,"cwd":"/path/to/repo","sessionId":"b0ff98c0-8e01-4b73-bf92-9649b139931b","source":"new","initialPrompt":"hi"}
```

**agentStop:**
```json
{"timestamp":1771480085412,"cwd":"/path/to/repo","sessionId":"b0ff98c0-8e01-4b73-bf92-9649b139931b","transcriptPath":"/home/user/.copilot/session-state/b0ff98c0-8e01-4b73-bf92-9649b139931b/events.jsonl","stopReason":"end_turn"}
```

**sessionEnd:**
```json
{"timestamp":1771480085425,"cwd":"/path/to/repo","sessionId":"b0ff98c0-8e01-4b73-bf92-9649b139931b","reason":"complete"}
```

## Transcript

- Location: `~/.copilot/session-state/<sessionId>/events.jsonl`
- Format: JSONL (one JSON object per line)
- Session ID extraction: `sessionId` field in hook payloads (UUID format)
- Session ID format: UUID (`b0ff98c0-8e01-4b73-bf92-9649b139931b`)

### Transcript Entry Schema

Each line has: `type`, `data`, `id` (UUID), `timestamp` (ISO 8601), `parentId`

**Event types observed:**
- `session.start` — Session metadata (sessionId, version, producer, copilotVersion, context)
- `user.message` — User messages (`content`, `transformedContent`, `attachments`, `interactionId`)
- `assistant.turn_start` — Start of assistant turn (`turnId`, `interactionId`)
- `assistant.message` — Assistant response (`content`, `toolRequests[]`, `reasoningText`)
- `tool.execution_complete` — Tool result (`toolCallId`, `toolTelemetry.properties.filePaths` (JSON-encoded string array), `linesAdded`, `linesRemoved`)
- `assistant.turn_end` — End of assistant turn (`turnId`)

**Example entries:**
```jsonl
{"type":"session.start","data":{"sessionId":"4de47255-...","version":1,"producer":"copilot-agent","copilotVersion":"0.0.420","startTime":"2026-03-02T01:11:45.379Z","context":{"cwd":"/path","gitRoot":"/path","branch":"main","repository":"org/repo"}},"id":"c1c1...","timestamp":"2026-03-02T01:11:45.421Z","parentId":null}
{"type":"user.message","data":{"content":"exit","transformedContent":"...","attachments":[],"interactionId":"6cd0..."},"id":"20fa...","timestamp":"...","parentId":"00ac..."}
{"type":"assistant.turn_start","data":{"turnId":"0","interactionId":"6cd0..."},"id":"3b61...","timestamp":"...","parentId":"20fa..."}
{"type":"assistant.message","data":{"messageId":"a078...","content":"To exit...","toolRequests":[],"interactionId":"6cd0...","reasoningText":"..."},"id":"c892...","timestamp":"...","parentId":"aaeb..."}
{"type":"assistant.turn_end","data":{"turnId":"0"},"id":"0ae4...","timestamp":"...","parentId":"4989..."}
```

### Tool Usage in Transcripts

The `assistant.message` entries have a `toolRequests` array with tool call IDs. After each tool executes, a `tool.execution_complete` event is emitted with `toolTelemetry.properties.filePaths`, where `filePaths` is a string containing a JSON array of file paths modified by that tool call. The `TranscriptAnalyzer` implementation parses this `filePaths` property to extract modified files for checkpoint metadata.

**Note:** Token usage is NOT available in the Copilot CLI JSONL format. The transcript events do not include input/output token counts or cost information, so `TokenCalculator` is not implemented.

### Transcript Position

Position = line count (JSONL format). Use `agent.ChunkJSONL()` / `agent.ReassembleJSONL()` for chunking.

### TranscriptAnalyzer

The `TranscriptAnalyzer` interface is implemented for Copilot CLI, providing:
- `GetTranscriptPosition` — counts JSONL lines (lightweight, no JSON parsing)
- `ExtractModifiedFilesFromOffset` — collects `filePaths` from `tool.execution_complete` events after a given line offset
- `ExtractPrompts` — collects `content` from `user.message` events
- `ExtractSummary` — returns the `content` of the last `assistant.message` event

## Session State Directory

```
~/.copilot/session-state/<sessionId>/
├── events.jsonl # Transcript (JSONL)
├── workspace.yaml # Workspace metadata (cwd, git info, summary)
├── checkpoints/ # Agent's own checkpoints
├── files/ # File snapshots
├── research/ # Research data
└── rewind-snapshots/ # Agent's rewind snapshots
```

## Config Preservation

- Hook config is in `.github/hooks/*.json` — each file is auto-discovered
- We create a **dedicated** `.github/hooks/entire.json` file, leaving other hook files untouched
- No need for read-modify-write of existing files
- If `entire.json` already exists, read-modify-write to preserve any user additions

## CLI Flags

- Non-interactive prompt: `copilot -p "prompt" --allow-all-tools`
- Interactive with initial prompt: `copilot -i "prompt"`
- Interactive mode: `copilot`
- Resume most recent: `copilot --continue`
- Resume specific: `copilot --resume <sessionId>`
- Autopilot (no confirmations): `copilot --autopilot`
- Config directory: `--config-dir <dir>` (default: `~/.copilot`)
- Relevant env vars: `COPILOT_ALLOW_ALL` (equivalent to `--allow-all-tools`)

## Presence Detection

- No repo-level `.copilot/` directory (unlike other agents)
- `DetectPresence` delegates to `AreHooksInstalled`, which reads `.github/hooks/entire.json` and checks if any hook entry has an Entire command prefix (`entire ` or `go run ${COPILOT_PROJECT_DIR}/cmd/entire/main.go `)
- Simply having a `.github/hooks/` directory is NOT sufficient -- the directory must contain `entire.json` with Entire hook entries
- Alternative: check for `copilot` binary in PATH

## Protected Directories

- `.github` — contains hook configs (committed to repo, unlikely to be affected by rewind)
- No agent-specific repo directory to protect

## Subagent Lifecycle

Copilot CLI has **one-sided** subagent hooks: `subagentStop` fires but `subagentStart` does NOT
(tested by registering the hook — Copilot CLI ignores it). The transcript does contain
`subagent.started` / `subagent.completed` events with `toolCallId`, but these are not
surfaced as hooks.

The `preToolUse` / `postToolUse` hooks fire with `toolName: "task"` for subagent lifecycle,
which could be used as an alternative mechanism for `SubagentStart`/`SubagentEnd` in the future.

Because there is no `SubagentStart` hook, the framework cannot capture pre-task state (untracked
files snapshot). The `handleLifecycleSubagentEnd` dispatcher falls back to the session's
pre-prompt state to avoid spurious task checkpoints from pre-existing untracked files
(e.g., `.github/hooks/entire.json`).

## Gaps & Limitations

- No `Compaction` event — Copilot CLI doesn't appear to have a context compaction hook
- No `SubagentStart` hook — only `subagentStop` fires; framework falls back to pre-prompt state
- `subagentStop` payload fields beyond `timestamp`/`cwd`/`sessionId` not yet fully captured
- `preToolUse` / `postToolUse` payloads not yet leveraged (could provide SubagentStart equivalent)
- Tool request schema in transcripts needs verification with a real tool-using session
- `transcriptPath` only available in `agentStop` hook — `userPromptSubmitted` and `sessionStart` don't include it, so we compute it from `~/.copilot/session-state/<sessionId>/events.jsonl`

## Captured Payloads

- Hook capture logs: `tmp/test-copilot/logs/copilot-hooks.jsonl`
- Local transcript: `~/.copilot/session-state/4de47255-3d43-4938-b8fa-b6e49f1d0aca/events.jsonl`
Loading