garrytan · mneves75 · Mar 16, 2026 · Mar 16, 2026
diff --git a/.codex/skills/gstack-browse/SKILL.md b/.codex/skills/gstack-browse/SKILL.md
@@ -0,0 +1,25 @@
+---
+name: gstack-browse
+description: Use gstack's compiled browser engine for browsing, UI verification, screenshots, and dogfooding flows.
+---
+
+# gstack-browse
+
+Use this skill when the user says `/browse` or asks to browse a site, click through a flow, capture screenshots, verify UI behavior, or dogfood a web app with gstack's browser engine.
+
+## First steps
+
+1. Read `../../../references/workflows/compatibility.md`.
+2. Read `../../../references/workflows/browse.md`.
+3. If the browser binary is missing, run `../../../setup --host codex`.
+
+## Tool expectations
+
+- Prefer shell execution via `../../../browse/bin/find-browse` or `../../../browse/dist/browse`.
+- Use the compiled gstack browser instead of host-native browser tooling whenever it can do the job.
+- Pull in `../../../BROWSER.md` only when you need deeper command coverage.
+
+## Boundaries
+
+- Switch to `gstack-qa` for a structured QA report.
+- Switch to `gstack-browser-cookies` for cookie/session import.
diff --git a/.codex/skills/gstack-browser-cookies/SKILL.md b/.codex/skills/gstack-browser-cookies/SKILL.md
@@ -0,0 +1,24 @@
+---
+name: gstack-browser-cookies
+description: Import real browser cookies into gstack's browser session for authenticated testing.
+---
+
+# gstack-browser-cookies
+
+Use this skill when the user says `/setup-browser-cookies` or asks to import browser cookies for authenticated testing.
+
+## First steps
+
+1. Read `../../../references/workflows/browser-cookies.md`.
+2. Read `../../../references/workflows/compatibility.md`.
+3. If the browser binary is missing, run `../../../setup --host codex`.
+
+## Tool expectations
+
+- Use the compiled gstack browser CLI for the import.
+- Verify the resulting session against a real authenticated page.
+- Keep secrets out of the transcript.
+
+## Boundaries
+
+- If the user wants broader verification after import, switch to `gstack-qa` or `gstack-browse`.
diff --git a/.codex/skills/gstack-plan-ceo-review/SKILL.md b/.codex/skills/gstack-plan-ceo-review/SKILL.md
@@ -0,0 +1,24 @@
+---
+name: gstack-plan-ceo-review
+description: Pressure-test a plan in founder mode and decide whether to expand, hold, or reduce scope.
+---
+
+# gstack-plan-ceo-review
+
+Use this skill when the user says `/plan-ceo-review` or wants a founder-level product review of a plan, feature, or roadmap item.
+
+## First steps
+
+1. Read `../../../references/workflows/plan-ceo-review.md`.
+2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.
+
+## Tool expectations
+
+- Review the plan, not the code.
+- Pressure-test the premise, ambition level, and 12-month trajectory.
+- Make the scope mode explicit: expansion, hold, or reduction.
+- Use Codex-native user input only when ambition level or success criteria are genuinely ambiguous.
+
+## Boundaries
+
+- Hand off to `gstack-plan-eng-review` when the product direction is locked and technical review is next.
diff --git a/.codex/skills/gstack-plan-eng-review/SKILL.md b/.codex/skills/gstack-plan-eng-review/SKILL.md
@@ -0,0 +1,24 @@
+---
+name: gstack-plan-eng-review
+description: Pressure-test a plan technically for architecture, failure modes, rollout, and tests.
+---
+
+# gstack-plan-eng-review
+
+Use this skill when the user says `/plan-eng-review` or wants technical review of a plan before implementation.
+
+## First steps
+
+1. Read `../../../references/workflows/plan-eng-review.md`.
+2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.
+
+## Tool expectations
+
+- Audit the existing system surface first.
+- Review architecture, failure modes, security, rollout, rollback, and tests.
+- Prefer concrete diagrams and explicit invariants over generic advice.
+- Use Codex-native user input only for real architecture or scope decisions.
+
+## Boundaries
+
+- Do not start implementing in this mode.
diff --git a/.codex/skills/gstack-qa/SKILL.md b/.codex/skills/gstack-qa/SKILL.md
@@ -0,0 +1,26 @@
+---
+name: gstack-qa
+description: Run structured QA passes with gstack's browser engine and report template.
+---
+
+# gstack-qa
+
+Use this skill when the user says `/qa` or asks for a smoke test, systematic QA pass, diff-aware verification, or a structured bug report.
+
+## First steps
+
+1. Read `../../../references/workflows/compatibility.md`.
+2. Read `../../../references/workflows/qa.md`.
+3. Read `../../../qa/templates/qa-report-template.md`.
+4. Read `../../../qa/references/issue-taxonomy.md`.
+5. If the browser binary is missing, run `../../../setup --host codex`.
+
+## Tool expectations
+
+- Drive the session with the compiled gstack browser.
+- Execute the QA pass and write the report; do not stop at analysis.
+- Use Codex-native user input only when auth, CAPTCHA, or a missing target URL blocks progress.
+
+## Boundaries
+
+- If the user only wants browser interaction without a QA report, use `gstack-browse`.
diff --git a/.codex/skills/gstack-retro/SKILL.md b/.codex/skills/gstack-retro/SKILL.md
@@ -0,0 +1,23 @@
+---
+name: gstack-retro
+description: Produce an engineering retrospective grounded in git history and recent delivery outcomes.
+---
+
+# gstack-retro
+
+Use this skill when the user says `/retro` or asks for an engineering retrospective based on recent project activity.
+
+## First steps
+
+1. Read `../../../references/workflows/retro.md`.
+2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.
+
+## Tool expectations
+
+- Ground the retro in git history, open TODOs, and recent delivery outcomes.
+- Separate wins, failures, and follow-ups clearly.
+- Keep praise specific and criticism operational.
+
+## Boundaries
+
+- This mode is for analysis and synthesis, not implementation.
diff --git a/.codex/skills/gstack-review/SKILL.md b/.codex/skills/gstack-review/SKILL.md
@@ -0,0 +1,26 @@
+---
+name: gstack-review
+description: Run a findings-first pre-landing review of the current diff using gstack's review checklist.
+---
+
+# gstack-review
+
+Use this skill when the user says `/review` or wants a findings-first pre-landing review of the current diff.
+
+## First steps
+
+1. Read `../../../references/workflows/review.md`.
+2. Read `../../../review/checklist.md`.
+3. Read `../../../review/TODOS-format.md`.
+4. Read `../../../review/greptile-triage.md` only if Greptile comments are relevant.
+
+## Tool expectations
+
+- Use repo inspection, `git diff`, and shell commands directly.
+- Review the full diff before writing findings.
+- Keep findings ordered by severity with file references.
+- Use Codex-native user input only for real blocking decisions.
+
+## Boundaries
+
+- Do not drift into implementation unless the user explicitly asks for fixes.
diff --git a/.codex/skills/gstack-ship/SKILL.md b/.codex/skills/gstack-ship/SKILL.md
@@ -0,0 +1,25 @@
+---
+name: gstack-ship
+description: Execute deterministic shell-driven release steps for a ready branch.
+---
+
+# gstack-ship
+
+Use this skill when the user says `/ship` or wants deterministic shell-driven release steps for a ready branch.
+
+## First steps
+
+1. Read `../../../references/workflows/ship.md`.
+2. Read `../../../references/workflows/review.md` if a pre-landing review has not already been done.
+
+## Tool expectations
+
+- Execute the repo’s real validation and release commands.
+- Prefer documented wrappers and scripts over ad hoc command chains.
+- Stop on merge conflicts, failing validation, or missing release prerequisites.
+- Use Codex-native user input only when a real release decision is required.
+
+## Boundaries
+
+- Do not ship from `main`.
+- Do not invent project commands that the repo does not actually have.
diff --git a/.gitignore b/.gitignore
@@ -3,6 +3,8 @@ node_modules/
 browse/dist/
 .gstack/
 .claude/skills/
+.codex/skills/gstack
+.agents/skills/gstack
 /tmp/
 *.log
 bun.lock

diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,28 @@
+# gstack for Codex
+
+This repo now has two layers:
+
+- Portable core: `browse/`, the shared build/setup scripts, QA/review assets, and the workflow reference docs under `references/workflows/`.
+- Host glue: `CLAUDE.md` plus top-level `*/SKILL.md` for Claude, and `.codex/skills/*` for Codex.
+
+## Codex rules
+
+- If the user says `/browse`, `/qa`, `/review`, `/ship`, `/plan-ceo-review`, `/plan-eng-review`, `/setup-browser-cookies`, or `/retro`, map that to the corresponding Codex skill:
+  - `/browse` → `gstack-browse`
+  - `/qa` → `gstack-qa`
+  - `/review` → `gstack-review`
+  - `/ship` → `gstack-ship`
+  - `/plan-ceo-review` → `gstack-plan-ceo-review`
+  - `/plan-eng-review` → `gstack-plan-eng-review`
+  - `/setup-browser-cookies` → `gstack-browser-cookies`
+  - `/retro` → `gstack-retro`
+- Prefer the gstack browser binary for browser automation and QA. Do not reach for host-native browser tooling when `browse/dist/browse` can do the job.
+- If the browser binary is missing, run `./setup --host codex` from the gstack root before doing browser work.
+- Codex skills live under `.codex/skills/`. Their detailed workflow contracts live under `references/workflows/`.
+- Claude-only assets stay in `CLAUDE.md` and the top-level workflow directories. Do not rewrite those when a Codex-only change will solve the problem.
+
+## Editing guidance
+
+- Keep the browser runtime host-neutral. Path resolution and installation may branch by host, but the CLI/server behavior should stay shared.
+- When workflow behavior changes, update the shared workflow reference in `references/workflows/` first, then touch Codex skills and Claude templates only where host-specific wording differs.
+- Do not add Claude-only tool names or `.claude/skills` paths to `.codex/skills/*`.
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -1,10 +1,15 @@
 # Architecture
 
-This document explains **why** gstack is built the way it is. For setup and commands, see CLAUDE.md. For contributing, see CONTRIBUTING.md.
+This document explains **why** gstack is built the way it is. For Codex behavior, see AGENTS.md. For Claude-specific setup and commands, see CLAUDE.md. For contributing, see CONTRIBUTING.md.
 
 ## The core idea
 
-gstack gives Claude Code a persistent browser and a set of opinionated workflow skills. The browser is the hard part — everything else is Markdown.
+gstack gives coding agents a persistent browser and a set of opinionated workflow skills. The browser is the hard part — everything else is Markdown.
+
+The repo is now split into a portable core plus host-specific glue:
+
+- Portable core: `browse/`, shared setup/build scripts, QA/review references, and `references/workflows/*`
+- Host glue: top-level `*/SKILL.md` plus `CLAUDE.md` for Claude, `.codex/skills/*` plus `AGENTS.md` for Codex
 
 The key insight: an AI agent interacting with a browser needs **sub-second latency** and **persistent state**. If every command cold-starts a browser, you're waiting 3-5 seconds per tool call. If the browser dies between commands, you lose cookies, tabs, and login sessions. So gstack runs a long-lived Chromium daemon that the CLI talks to over localhost HTTP.
 

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -4,7 +4,12 @@ Thanks for wanting to make gstack better. Whether you're fixing a typo in a skil
 
 ## Quick start
 
-gstack skills are Markdown files that Claude Code discovers from a `skills/` directory. Normally they live at `~/.claude/skills/gstack/` (your global install). But when you're developing gstack itself, you want Claude Code to use the skills *in your working tree* — so edits take effect instantly without copying or deploying anything.
+gstack now has two runtime surfaces:
+
+- Claude: top-level workflow directories plus `CLAUDE.md`
+- Codex: `.codex/skills/*` plus `AGENTS.md`
+
+Codex does not need a special dev-mode shim when you are working in this repo: the repo-local `.codex/skills/` tree is already part of the checkout. Claude still uses dev mode so its `skills/` loader points at your working tree instead of a copied global install.
 
 That's what dev mode does. It symlinks your repo into the local `.claude/skills/` directory so Claude Code reads skills straight from your checkout.
 
@@ -62,6 +67,8 @@ bin/dev-teardown
 
 ## Testing & evals
 
+Static packaging checks are host-agnostic. The E2E tier driven by `claude -p` and the LLM eval tier remain Claude-specific because they shell out to `claude -p` or call the Anthropic API directly.
+
 ### Setup
 
 ```bash
@@ -79,14 +86,16 @@ Bun auto-loads `.env` — no extra config. Conductor workspaces inherit `.env` f
 
 | Tier | Command | Cost | What it tests |
 |------|---------|------|---------------|
-| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, TODOS-format.md refs, observability unit tests |
-| 2 — E2E | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
+| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, Codex packaging checks, TODOS-format.md refs, observability unit tests |
+| 2 — E2E (Claude) | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
+| 2b — E2E (Codex) | `bun run test:codex-e2e` | variable | Noninteractive Codex smoke suite against repo-local `.codex/skills/*` |
 | 3 — LLM eval | `bun run test:evals` | ~$0.15 standalone | LLM-as-judge scoring of generated SKILL.md docs |
 | 2+3 | `bun run test:evals` | ~$4 combined | E2E + LLM-as-judge (runs both) |
 
 ```bash
 bun test                     # Tier 1 only (runs on every commit, <5s)
 bun run test:e2e             # Tier 2: E2E only (needs EVALS=1, can't run inside Claude Code)
+bun run test:codex-e2e       # Tier 2b: Codex E2E only (needs CODEX_EVALS=1)
 bun run test:evals           # Tier 2 + 3 combined (~$4/run)
 ```
 
@@ -96,6 +105,7 @@ Runs automatically with `bun test`. No API keys needed.
 
 - **Skill parser tests** (`test/skill-parser.test.ts`) — Extracts every `$B` command from SKILL.md bash code blocks and validates against the command registry in `browse/src/commands.ts`. Catches typos, removed commands, and invalid snapshot flags.
 - **Skill validation tests** (`test/skill-validation.test.ts`) — Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
+- **Codex packaging tests** (`test/codex-compat.test.ts`) — Validates that the `.codex/skills/*` files exist and avoid Claude-only references, and that the compatibility alias wrappers emit the correct skill names.
 - **Generator tests** (`test/gen-skill-docs.test.ts`) — Tests the template system: verifies placeholders resolve correctly, output includes value hints for flags (e.g. `-d <N>` not just `-d`), enriched descriptions for key commands (e.g. `is` lists valid states, `press` lists key examples).
 
 ### Tier 2: E2E via `claude -p` (~$3.85/run)
@@ -114,6 +124,19 @@ EVALS=1 bun test test/skill-e2e.test.ts
 - Saves full NDJSON transcripts and failure JSON for debugging
 - Tests live in `test/skill-e2e.test.ts`, runner logic in `test/helpers/session-runner.ts`
 
+### Tier 2b: E2E via `codex exec --json`
+
+Runs a small noninteractive smoke suite against the repo-local Codex skills.
+
+```bash
+CODEX_EVALS=1 bun test test/codex-e2e.test.ts
+```
+
+- Gated by `CODEX_EVALS=1`
+- Uses `codex exec --json` instead of `claude -p`
+- Validates that repo-local `.codex/skills/*` load cleanly and can drive basic shell-backed workflows
+- Tests live in `test/codex-e2e.test.ts`, runner logic in `test/helpers/codex-session-runner.ts`
+
 ### E2E observability
 
 When E2E tests run, they produce machine-readable artifacts in `~/.gstack-dev/`:
@@ -166,6 +189,8 @@ Tests run against the browse binary directly — they don't require dev mode.
 
 SKILL.md files are **generated** from `.tmpl` templates. Don't edit the `.md` directly — your changes will be overwritten on the next build.
 
+This applies only to the Claude-facing top-level workflow skills. The Codex-facing `.codex/skills/*/SKILL.md` files are maintained directly and should stay short; the durable shared behavior belongs in `references/workflows/*`.
+
 ```bash
 # 1. Edit the template
 vim SKILL.md.tmpl              # or browse/SKILL.md.tmpl