25 changes: 25 additions & 0 deletions .codex/skills/gstack-browse/SKILL.md
@@ -0,0 +1,25 @@
---
name: gstack-browse
description: Use gstack's compiled browser engine for browsing, UI verification, screenshots, and dogfooding flows.
---

# gstack-browse

Use this skill when the user says `/browse` or asks to browse a site, click through a flow, capture screenshots, verify UI behavior, or dogfood a web app with gstack's browser engine.

## First steps

1. Read `../../../references/workflows/compatibility.md`.
2. Read `../../../references/workflows/browse.md`.
3. If the browser binary is missing, run `../../../setup --host codex`.

## Tool expectations

- Prefer shell execution via `../../../browse/bin/find-browse` or `../../../browse/dist/browse`.
- Use the compiled gstack browser instead of host-native browser tooling whenever it can do the job.
- Pull in `../../../BROWSER.md` only when you need deeper command coverage.

## Boundaries

- Switch to `gstack-qa` for a structured QA report.
- Switch to `gstack-browser-cookies` for cookie/session import.
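
The "binary missing" step above could be scripted roughly as follows. This is a minimal sketch, not part of the skill: `ensure_browse_binary` is a hypothetical helper name, and it assumes the check runs against the gstack root, where `browse/dist/browse` and `./setup --host codex` are the path and command named in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustrative name, not defined by gstack).
# Prints the path to the compiled browser, running setup first if it is missing.
ensure_browse_binary() {
  local root="${1:-.}"                  # gstack root; defaults to the current directory
  local bin="$root/browse/dist/browse"  # binary path taken from the skill text

  if [ -x "$bin" ]; then
    printf '%s\n' "$bin"
    return 0
  fi

  # Binary missing: run the documented setup step, then re-check.
  # Setup output goes to stderr so stdout carries only the binary path.
  (cd "$root" && ./setup --host codex) >&2 || return 1
  [ -x "$bin" ] && printf '%s\n' "$bin"
}
```

A caller might use it as `B="$(ensure_browse_binary /path/to/gstack)" || exit 1`, where the path is a placeholder.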
24 changes: 24 additions & 0 deletions .codex/skills/gstack-browser-cookies/SKILL.md
@@ -0,0 +1,24 @@
---
name: gstack-browser-cookies
description: Import real browser cookies into gstack's browser session for authenticated testing.
---

# gstack-browser-cookies

Use this skill when the user says `/setup-browser-cookies` or asks to import browser cookies for authenticated testing.

## First steps

1. Read `../../../references/workflows/browser-cookies.md`.
2. Read `../../../references/workflows/compatibility.md`.
3. If the browser binary is missing, run `../../../setup --host codex`.

## Tool expectations

- Use the compiled gstack browser CLI for the import.
- Verify the resulting session against a real authenticated page.
- Keep secrets out of the transcript.

## Boundaries

- If the user wants broader verification after import, switch to `gstack-qa` or `gstack-browse`.
24 changes: 24 additions & 0 deletions .codex/skills/gstack-plan-ceo-review/SKILL.md
@@ -0,0 +1,24 @@
---
name: gstack-plan-ceo-review
description: Pressure-test a plan in founder mode and decide whether to expand, hold, or reduce scope.
---

# gstack-plan-ceo-review

Use this skill when the user says `/plan-ceo-review` or wants a founder-level product review of a plan, feature, or roadmap item.

## First steps

1. Read `../../../references/workflows/plan-ceo-review.md`.
2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.

## Tool expectations

- Review the plan, not the code.
- Pressure-test the premise, ambition level, and 12-month trajectory.
- Make the scope mode explicit: expansion, hold, or reduction.
- Use Codex-native user input only when ambition level or success criteria are genuinely ambiguous.

## Boundaries

- Hand off to `gstack-plan-eng-review` when the product direction is locked and technical review is next.
24 changes: 24 additions & 0 deletions .codex/skills/gstack-plan-eng-review/SKILL.md
@@ -0,0 +1,24 @@
---
name: gstack-plan-eng-review
description: Pressure-test a plan technically for architecture, failure modes, rollout, and tests.
---

# gstack-plan-eng-review

Use this skill when the user says `/plan-eng-review` or wants technical review of a plan before implementation.

## First steps

1. Read `../../../references/workflows/plan-eng-review.md`.
2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.

## Tool expectations

- Audit the existing system surface first.
- Review architecture, failure modes, security, rollout, rollback, and tests.
- Prefer concrete diagrams and explicit invariants over generic advice.
- Use Codex-native user input only for real architecture or scope decisions.

## Boundaries

- Do not start implementing in this mode.
26 changes: 26 additions & 0 deletions .codex/skills/gstack-qa/SKILL.md
@@ -0,0 +1,26 @@
---
name: gstack-qa
description: Run structured QA passes with gstack's browser engine and report template.
---

# gstack-qa

Use this skill when the user says `/qa` or asks for a smoke test, systematic QA pass, diff-aware verification, or a structured bug report.

## First steps

1. Read `../../../references/workflows/compatibility.md`.
2. Read `../../../references/workflows/qa.md`.
3. Read `../../../qa/templates/qa-report-template.md`.
4. Read `../../../qa/references/issue-taxonomy.md`.
5. If the browser binary is missing, run `../../../setup --host codex`.

## Tool expectations

- Drive the session with the compiled gstack browser.
- Execute the QA pass and write the report; do not stop at analysis.
- Use Codex-native user input only when auth, CAPTCHA, or a missing target URL blocks progress.

## Boundaries

- If the user only wants browser interaction without a QA report, use `gstack-browse`.
23 changes: 23 additions & 0 deletions .codex/skills/gstack-retro/SKILL.md
@@ -0,0 +1,23 @@
---
name: gstack-retro
description: Produce an engineering retrospective grounded in git history and recent delivery outcomes.
---

# gstack-retro

Use this skill when the user says `/retro` or asks for an engineering retrospective based on recent project activity.

## First steps

1. Read `../../../references/workflows/retro.md`.
2. Read `../../../references/workflows/compatibility.md` if the request used slash-style aliases.

## Tool expectations

- Ground the retro in git history, open TODOs, and recent delivery outcomes.
- Separate wins, failures, and follow-ups clearly.
- Keep praise specific and criticism operational.

## Boundaries

- This mode is for analysis and synthesis, not implementation.
26 changes: 26 additions & 0 deletions .codex/skills/gstack-review/SKILL.md
@@ -0,0 +1,26 @@
---
name: gstack-review
description: Run a findings-first pre-landing review of the current diff using gstack's review checklist.
---

# gstack-review

Use this skill when the user says `/review` or wants a findings-first pre-landing review of the current diff.

## First steps

1. Read `../../../references/workflows/review.md`.
2. Read `../../../review/checklist.md`.
3. Read `../../../review/TODOS-format.md`.
4. Read `../../../review/greptile-triage.md` only if Greptile comments are relevant.

## Tool expectations

- Use repo inspection, `git diff`, and shell commands directly.
- Review the full diff before writing findings.
- Keep findings ordered by severity with file references.
- Use Codex-native user input only for real blocking decisions.

## Boundaries

- Do not drift into implementation unless the user explicitly asks for fixes.
25 changes: 25 additions & 0 deletions .codex/skills/gstack-ship/SKILL.md
@@ -0,0 +1,25 @@
---
name: gstack-ship
description: Execute deterministic shell-driven release steps for a ready branch.
---

# gstack-ship

Use this skill when the user says `/ship` or wants deterministic shell-driven release steps for a ready branch.

## First steps

1. Read `../../../references/workflows/ship.md`.
2. Read `../../../references/workflows/review.md` if a pre-landing review has not already been done.

## Tool expectations

- Execute the repo’s real validation and release commands.
- Prefer documented wrappers and scripts over ad hoc command chains.
- Stop on merge conflicts, failing validation, or missing release prerequisites.
- Use Codex-native user input only when a real release decision is required.

## Boundaries

- Do not ship from `main`.
- Do not invent project commands that the repo does not actually have.
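
The "do not ship from `main`" boundary could be enforced with a guard along these lines. This is a sketch under assumptions: `refuse_main_branch` is a hypothetical name, and the real `/ship` workflow in `references/workflows/ship.md` may check the branch differently.

```bash
#!/usr/bin/env bash
# Hypothetical guard (illustrative name). Takes an optional branch name;
# with no argument it asks git for the current branch (needs git 2.22+).
refuse_main_branch() {
  local branch="${1:-$(git branch --show-current 2>/dev/null)}"

  if [ -z "$branch" ]; then
    echo "cannot determine current branch" >&2
    return 2
  fi
  if [ "$branch" = "main" ]; then
    echo "refusing to ship from main" >&2
    return 1
  fi
  return 0
}
```

Running `refuse_main_branch || exit 1` before any release command keeps the stop condition explicit instead of relying on the agent to remember it.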
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,6 +3,8 @@ node_modules/
browse/dist/
.gstack/
.claude/skills/
.codex/skills/gstack
.agents/skills/gstack
/tmp/
*.log
bun.lock
28 changes: 28 additions & 0 deletions AGENTS.md
@@ -0,0 +1,28 @@
# gstack for Codex

This repo now has two layers:

- Portable core: `browse/`, the shared build/setup scripts, QA/review assets, and the workflow reference docs under `references/workflows/`.
- Host glue: `CLAUDE.md` plus top-level `*/SKILL.md` for Claude, and `.codex/skills/*` for Codex.

## Codex rules

- If the user says `/browse`, `/qa`, `/review`, `/ship`, `/plan-ceo-review`, `/plan-eng-review`, `/setup-browser-cookies`, or `/retro`, map that to the corresponding Codex skill:
  - `/browse` → `gstack-browse`
  - `/qa` → `gstack-qa`
  - `/review` → `gstack-review`
  - `/ship` → `gstack-ship`
  - `/plan-ceo-review` → `gstack-plan-ceo-review`
  - `/plan-eng-review` → `gstack-plan-eng-review`
  - `/setup-browser-cookies` → `gstack-browser-cookies`
  - `/retro` → `gstack-retro`
- Prefer the gstack browser binary first for browser automation and QA. Do not reach for host-native browser tooling when `browse/dist/browse` can do the job.
- If the browser binary is missing, run `./setup --host codex` from the gstack root before doing browser work.
- Codex skills live under `.codex/skills/`. Their detailed workflow contracts live under `references/workflows/`.
- Claude-only assets stay in `CLAUDE.md` and the top-level workflow directories. Do not rewrite those when a Codex-only change will solve the problem.

## Editing guidance

- Keep the browser runtime host-neutral. Path resolution and installation may branch by host, but the CLI/server behavior should stay shared.
- When workflow behavior changes, update the shared workflow reference in `references/workflows/` first, then touch Codex skills and Claude templates only where host-specific wording differs.
- Do not add Claude-only tool names or `.claude/skills` paths to `.codex/skills/*`.
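
The alias table under "Codex rules" amounts to a fixed lookup. A minimal sketch of that mapping as a shell function follows; `skill_for_alias` is a hypothetical name, and this is not the compatibility wrapper the PR ships, only an illustration of the same table.

```bash
#!/usr/bin/env bash
# Hypothetical lookup (illustrative name): slash alias -> Codex skill name.
# The pairs mirror the mapping listed in AGENTS.md.
skill_for_alias() {
  case "$1" in
    /browse)                echo "gstack-browse" ;;
    /qa)                    echo "gstack-qa" ;;
    /review)                echo "gstack-review" ;;
    /ship)                  echo "gstack-ship" ;;
    /plan-ceo-review)       echo "gstack-plan-ceo-review" ;;
    /plan-eng-review)       echo "gstack-plan-eng-review" ;;
    /setup-browser-cookies) echo "gstack-browser-cookies" ;;
    /retro)                 echo "gstack-retro" ;;
    *)                      return 1 ;;  # unknown alias: no skill
  esac
}
```

Note that `/setup-browser-cookies` is the one alias whose skill name is not simply `gstack-` plus the alias, which is why an explicit table is safer than string concatenation.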
9 changes: 7 additions & 2 deletions ARCHITECTURE.md
@@ -1,10 +1,15 @@
# Architecture

-This document explains **why** gstack is built the way it is. For setup and commands, see CLAUDE.md. For contributing, see CONTRIBUTING.md.
+This document explains **why** gstack is built the way it is. For Codex behavior, see AGENTS.md. For Claude-specific setup and commands, see CLAUDE.md. For contributing, see CONTRIBUTING.md.

## The core idea

-gstack gives Claude Code a persistent browser and a set of opinionated workflow skills. The browser is the hard part — everything else is Markdown.
+gstack gives coding agents a persistent browser and a set of opinionated workflow skills. The browser is the hard part — everything else is Markdown.

The repo is now split into a portable core plus host-specific glue:

- Portable core: `browse/`, shared setup/build scripts, QA/review references, and `references/workflows/*`
- Host glue: top-level `*/SKILL.md` plus `CLAUDE.md` for Claude, `.codex/skills/*` plus `AGENTS.md` for Codex

The key insight: an AI agent interacting with a browser needs **sub-second latency** and **persistent state**. If every command cold-starts a browser, you're waiting 3-5 seconds per tool call. If the browser dies between commands, you lose cookies, tabs, and login sessions. So gstack runs a long-lived Chromium daemon that the CLI talks to over localhost HTTP.
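
As a rough worked example of the cold-start cost described above: at 3 to 5 seconds per tool call, startup overhead grows linearly with the number of commands. The function name and the 20-command session below are assumptions chosen for illustration, not measurements.

```bash
#!/usr/bin/env bash
# Hypothetical estimator (illustrative name): total cold-start overhead
# for N browser commands at the 3-5 s per call quoted in the text.
cold_start_overhead() {
  local calls="$1"
  echo "$((calls * 3))-$((calls * 5))s"
}

cold_start_overhead 20   # prints 60-100s
```

With a long-lived daemon that startup cost is paid once, so a 20-command session avoids on the order of a minute or more of waiting.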

31 changes: 28 additions & 3 deletions CONTRIBUTING.md
@@ -4,7 +4,12 @@ Thanks for wanting to make gstack better. Whether you're fixing a typo in a skil

## Quick start

-gstack skills are Markdown files that Claude Code discovers from a `skills/` directory. Normally they live at `~/.claude/skills/gstack/` (your global install). But when you're developing gstack itself, you want Claude Code to use the skills *in your working tree* — so edits take effect instantly without copying or deploying anything.
+gstack now has two runtime surfaces:
+
+- Claude: top-level workflow directories plus `CLAUDE.md`
+- Codex: `.codex/skills/*` plus `AGENTS.md`
+
+Codex does not need a special dev-mode shim when you are working in this repo: the repo-local `.codex/skills/` tree is already part of the checkout. Claude still uses dev mode so its `skills/` loader points at your working tree instead of a copied global install.

That's what dev mode does. It symlinks your repo into the local `.claude/skills/` directory so Claude Code reads skills straight from your checkout.

@@ -62,6 +67,8 @@ bin/dev-teardown

## Testing & evals

Static packaging checks are host-agnostic. Claude E2E and LLM eval tiers remain Claude-specific because they shell out to `claude -p` or call Anthropic directly.

### Setup

```bash
@@ -79,14 +86,16 @@ Bun auto-loads `.env` — no extra config. Conductor workspaces inherit `.env` f

| Tier | Command | Cost | What it tests |
|------|---------|------|---------------|
-| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, TODOS-format.md refs, observability unit tests |
-| 2 — E2E | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
+| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, Codex packaging checks, TODOS-format.md refs, observability unit tests |
+| 2 — E2E (Claude) | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
+| 2 — E2E (Codex) | `bun run test:codex-e2e` | variable | Codex noninteractive smoke against repo-local `.codex/skills/*` |
| 3 — LLM eval | `bun run test:evals` | ~$0.15 standalone | LLM-as-judge scoring of generated SKILL.md docs |
| 2+3 | `bun run test:evals` | ~$4 combined | E2E + LLM-as-judge (runs both) |

```bash
bun test # Tier 1 only (runs on every commit, <5s)
bun run test:e2e # Tier 2: E2E only (needs EVALS=1, can't run inside Claude Code)
bun run test:codex-e2e # Tier 2: Codex E2E only (needs CODEX_EVALS=1)
bun run test:evals # Tier 2 + 3 combined (~$4/run)
```

@@ -96,6 +105,7 @@ Runs automatically with `bun test`. No API keys needed.

- **Skill parser tests** (`test/skill-parser.test.ts`) — Extracts every `$B` command from SKILL.md bash code blocks and validates against the command registry in `browse/src/commands.ts`. Catches typos, removed commands, and invalid snapshot flags.
- **Skill validation tests** (`test/skill-validation.test.ts`) — Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
- **Codex packaging tests** (`test/codex-compat.test.ts`) — Validates that `.codex/skills/*` exist, avoid Claude-only references, and that the compatibility alias wrappers emit the correct skill names.
- **Generator tests** (`test/gen-skill-docs.test.ts`) — Tests the template system: verifies placeholders resolve correctly, output includes value hints for flags (e.g. `-d <N>` not just `-d`), enriched descriptions for key commands (e.g. `is` lists valid states, `press` lists key examples).
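
The Codex packaging check described in the list above can be approximated from the shell. This is a sketch only: the real checks live in `test/codex-compat.test.ts`, `check_codex_skills` is a hypothetical name, and matching the literal string `.claude/skills` is an assumption about what counts as a Claude-only reference.

```bash
#!/usr/bin/env bash
# Hypothetical check (illustrative name): every directory under
# .codex/skills/ must contain a SKILL.md with no .claude/skills paths.
check_codex_skills() {
  local root="${1:-.}" status=0 skill

  for skill in "$root"/.codex/skills/*/; do
    [ -d "$skill" ] || continue   # glob matched nothing

    if [ ! -f "${skill}SKILL.md" ]; then
      echo "missing SKILL.md: $skill" >&2
      status=1
      continue
    fi
    if grep -q -F '.claude/skills' "${skill}SKILL.md"; then
      echo "Claude-only path in ${skill}SKILL.md" >&2
      status=1
    fi
  done

  return "$status"
}
```

Run from the repo root as `check_codex_skills .`; a non-zero exit means at least one skill directory failed.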

### Tier 2: E2E via `claude -p` (~$3.85/run)
@@ -114,6 +124,19 @@ EVALS=1 bun test test/skill-e2e.test.ts
- Saves full NDJSON transcripts and failure JSON for debugging
- Tests live in `test/skill-e2e.test.ts`, runner logic in `test/helpers/session-runner.ts`

### Tier 2b: E2E via `codex exec --json`

Runs a small noninteractive smoke suite against the repo-local Codex skills.

```bash
CODEX_EVALS=1 bun test test/codex-e2e.test.ts
```

- Gated by `CODEX_EVALS=1`
- Uses `codex exec --json` instead of `claude -p`
- Validates that repo-local `.codex/skills/*` load cleanly and can drive basic shell-backed workflows
- Tests live in `test/codex-e2e.test.ts`, runner logic in `test/helpers/codex-session-runner.ts`
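
The `CODEX_EVALS=1` gate can be mirrored in a wrapper script so the suite never runs by accident. A minimal sketch, assuming a skip (exit 0) is the desired behavior when the variable is unset; `run_codex_e2e` is a hypothetical name and the suite's own gating may differ.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (illustrative name): run the Codex E2E suite only
# when CODEX_EVALS=1 is set; otherwise skip without failing.
run_codex_e2e() {
  if [ "${CODEX_EVALS:-}" != "1" ]; then
    echo "skipping Codex E2E: set CODEX_EVALS=1 to enable" >&2
    return 0
  fi
  bun test test/codex-e2e.test.ts
}
```

Treating the unset case as a skip keeps a plain `bun test` run free of charge while still giving CI a single entry point.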

### E2E observability

When E2E tests run, they produce machine-readable artifacts in `~/.gstack-dev/`:
@@ -166,6 +189,8 @@ Tests run against the browse binary directly — they don't require dev mode.

SKILL.md files are **generated** from `.tmpl` templates. Don't edit the `.md` directly — your changes will be overwritten on the next build.

This applies to the Claude-facing top-level workflow skills. The Codex-facing `.codex/skills/*/SKILL.md` files are maintained directly and should stay short; the durable shared behavior belongs in `references/workflows/*`.

```bash
# 1. Edit the template
vim SKILL.md.tmpl # or browse/SKILL.md.tmpl