18 commits
6cf424f
feat: add escalation protocol to preamble — all skills get DONE/BLOCK…
garrytan Mar 16, 2026
e3d36b6
feat: add verification gate to /ship (Step 6.5) — no push without fre…
garrytan Mar 16, 2026
b4c3326
feat: add scope drift detection + verification of claims to /review
garrytan Mar 16, 2026
e716363
feat: mandatory implementation alternatives + design doc lookup in /p…
garrytan Mar 16, 2026
50d0bc2
feat: design doc lookup in /plan-eng-review + fix branch name sanitiz…
garrytan Mar 16, 2026
4c0a3fe
feat: new /brainstorm and /debug skills
garrytan Mar 16, 2026
ba3c2de
test: structural tests for new skills + escalation protocol assertions
garrytan Mar 16, 2026
69fcf2d
chore: bump version and changelog (v0.6.0)
garrytan Mar 16, 2026
f5b981f
fix: resolve merge conflicts with origin/main (v0.4.2 base branch det…
garrytan Mar 16, 2026
9a55efe
fix: resolve merge conflicts with origin/main (v0.4.3 + v0.4.4)
garrytan Mar 17, 2026
95e4430
fix: resolve merge conflicts with origin/main (v0.5.0 + v0.4.5)
garrytan Mar 17, 2026
efa519b
chore: rename /brainstorm → /office-hours across references
garrytan Mar 17, 2026
1bf5c00
feat: YC Office Hours — dual-mode product diagnostic + builder brains…
garrytan Mar 17, 2026
111a332
test: add 14 assertions for YC Office Hours content coverage
garrytan Mar 17, 2026
46b6ded
fix: resolve merge conflicts with origin/main (v0.5.1 + v0.5.2)
garrytan Mar 17, 2026
b071df3
fix: resolve merge conflicts with origin/main (v0.6.0 + v0.6.0.1 + v0…
garrytan Mar 17, 2026
8b61067
docs: update project documentation for v0.6.1
garrytan Mar 17, 2026
106c8f0
fix: resolve merge conflicts with origin/main (v0.6.1 qa-design-revie…
garrytan Mar 18, 2026
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -82,6 +82,8 @@ Read the philosophy: https://garryslist.org/posts/boil-the-ocean
- **Ship with confidence — coverage audit shows what's tested and what's not.** `/ship` Step 3.4 builds a code path map from your diff, searches for corresponding tests, and produces an ASCII coverage diagram with quality stars (★★★ = edge cases + errors, ★★ = happy path, ★ = smoke test). Gaps get tests auto-generated. PR body shows "Tests: 42 → 47 (+5 new)".
- **Your retro tracks test health.** `/retro` now shows total test files, tests added this period, regression test commits, and trend deltas. If test ratio drops below 20%, it flags it as a growth area.
- **Design reviews generate regression tests too.** `/qa-design-review` Phase 8e.5 skips CSS-only fixes (those are caught by re-running the design audit) but writes tests for JavaScript behavior changes like broken dropdowns or animation failures.
+- **New `/office-hours` skill — YC Office Hours before you build.** Two modes: startup founders get six forcing questions that push toward radical honesty about demand, users, and product (demand reality, status quo, desperate specificity, narrowest wedge, observation & surprise, future-fit). Everyone else — hackathons, side projects, open source, learning — gets an enthusiastic design thinking partner. It asks you upfront what you're here for and adapts. Both modes write a design doc that feeds into `/plan-ceo-review` and `/plan-eng-review`.
+- **New `/debug` skill — find the root cause, not the symptom.** Systematic debugging with an Iron Law: no fixes without root cause investigation first. Traces data flow, matches against known bug patterns, tests hypotheses one at a time. If 3 fixes fail, it stops and questions the architecture instead of thrashing.

### For contributors

2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -58,6 +58,8 @@ gstack/
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
+├── office-hours/ # /office-hours skill (YC Office Hours — startup diagnostic + builder brainstorm)
+├── debug/ # /debug skill (systematic root-cause debugging)
├── retro/ # Retrospective skill
├── document-release/ # /document-release skill (post-ship doc updates)
├── setup # One-time setup: build binary + symlink skills
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -56,7 +56,7 @@ project where you actually felt the pain.

### Session awareness

-When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all 13 skills.
+When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all 15 skills.

## Working on gstack inside the gstack repo

25 changes: 23 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ In the last 60 days I have written **over 600,000 lines of production code** —

Same person. Different era. The difference is the tooling.

-**gstack is how I do it.** It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Thirteen specialists, all as slash commands, all Markdown, **all free, MIT license, available right now.**
+**gstack is how I do it.** It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Fifteen specialists, all as slash commands, all Markdown, **all free, MIT license, available right now.**

I am learning how to get to the edge of what agentic systems can do as of March 2026, and this is my live experiment. I am sharing it because I want the whole world on this journey with me.

@@ -123,6 +123,8 @@ One feature. Seven commands. The agent reframed the product, ran an 80-item desi
| `/design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
| `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. |
+| `/office-hours` | **YC Office Hours** | Two modes. Startup: six forcing questions on demand, users, and product. Builder: brainstorming for side projects, hackathons, and learning. Writes a design doc. |
+| `/debug` | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
| `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |

**[Deep dives with examples and philosophy for every skill →](docs/skills.md)**
@@ -147,6 +149,25 @@ gstack is powerful with one session. It is transformative with ten.

One person, ten parallel agents, each with the right cognitive mode. That is a different way of building software.

+---
+
+## Install — takes 30 seconds
+
+**Requirements:** [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Git](https://git-scm.com/), [Bun](https://bun.sh/) v1.0+
+
+### Step 1: Install on your machine
+
+Open Claude Code and paste this. Claude does the rest.
+
+> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.
+
+### Step 2: Add to your repo so teammates get it (optional)
+
+> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
+
+Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.
+
+---
## Come ride the wave

This is **free, MIT licensed, open source, available now.** No premium tier. No waitlist. No strings.
@@ -157,7 +178,7 @@ Same tools, different outcome — because gstack gives you structured roles and

The models are getting better fast. The people who figure out how to work with them now — really work with them, not just dabble — are going to have a massive advantage. This is that window. Let's go.

-Thirteen specialists. All slash commands. All Markdown. All free. **[github.com/garrytan/gstack](https://github.com/garrytan/gstack)** — MIT License
+Fifteen specialists. All slash commands. All Markdown. All free. **[github.com/garrytan/gstack](https://github.com/garrytan/gstack)** — MIT License

> **We're hiring.** Want to ship 10K+ LOC/day and help harden gstack?
> Come work at YC — [ycombinator.com/software](https://ycombinator.com/software)
25 changes: 25 additions & 0 deletions SKILL.md
@@ -120,6 +120,31 @@ Hey gstack team — ran into this while using /{skill-name}:

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"

+## Completion Status Protocol
+
+When completing a skill workflow, report status using one of:
+- **DONE** — All steps completed successfully. Evidence provided for each claim.
+- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
+- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
+- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
+
+### Escalation
+
+It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
+
+Bad work is worse than no work. You will not be penalized for escalating.
+- If you have attempted a task 3 times without success, STOP and escalate.
+- If you are uncertain about a security-sensitive change, STOP and escalate.
+- If the scope of work exceeds what you can verify, STOP and escalate.
+
+Escalation format:
+```
+STATUS: BLOCKED | NEEDS_CONTEXT
+REASON: [1-2 sentences]
+ATTEMPTED: [what you tried]
+RECOMMENDATION: [what the user should do next]
+```
+
# gstack browse: QA Testing & Dogfooding

Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
20 changes: 16 additions & 4 deletions TODOS.md
@@ -408,13 +408,25 @@
**Priority:** P3
**Depends on:** Ref staleness Parts 1+2 (shipped)

-## Design Review
+## Office Hours / Design
+
+### Design docs → Supabase team store sync
+
+**What:** Add design docs (`*-design-*.md`) to the Supabase sync pipeline alongside test plans, retro snapshots, and QA reports.
+
+**Why:** Cross-team design discovery at scale. Local `~/.gstack/projects/$SLUG/` keyword-grep discovery works for same-machine users now, but Supabase sync makes it work across the whole team. Duplicate ideas surface, everyone sees what's been explored.
+
+**Context:** /office-hours writes design docs to `~/.gstack/projects/$SLUG/`. The team store already syncs test plans, retro snapshots, QA reports. Design docs follow the same pattern — just add a sync adapter.

-### /design-consultation interactive skill — SHIPPED
+**Effort:** S
+**Priority:** P2
+**Depends on:** `garrytan/team-supabase-store` branch landing on main
+
+## Design Review

-~~**What:** Interactive skill that walks user through creating a DESIGN.md from scratch.~~
+### /plan-design-review + /qa-design-review + /design-consultation — SHIPPED

-Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setup-design-md` to reflect the consultant approach (agent proposes a complete coherent system, user adjusts). Includes competitive research via WebSearch, combined font+color preview page, coherence validation, and LLM-judged E2E tests.
+Shipped as v0.5.0 on main. Includes `/plan-design-review` (report-only design audit), `/qa-design-review` (audit + fix loop), and `/design-consultation` (interactive DESIGN.md creation). `{{DESIGN_METHODOLOGY}}` resolver provides shared 80-item design audit checklist.

## Document-Release

25 changes: 25 additions & 0 deletions browse/SKILL.md
@@ -120,6 +120,31 @@ Hey gstack team — ran into this while using /{skill-name}:

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"

+## Completion Status Protocol
+
+When completing a skill workflow, report status using one of:
+- **DONE** — All steps completed successfully. Evidence provided for each claim.
+- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
+- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
+- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
+
+### Escalation
+
+It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
+
+Bad work is worse than no work. You will not be penalized for escalating.
+- If you have attempted a task 3 times without success, STOP and escalate.
+- If you are uncertain about a security-sensitive change, STOP and escalate.
+- If the scope of work exceeds what you can verify, STOP and escalate.
+
+Escalation format:
+```
+STATUS: BLOCKED | NEEDS_CONTEXT
+REASON: [1-2 sentences]
+ATTEMPTED: [what you tried]
+RECOMMENDATION: [what the user should do next]
+```
+
# browse: QA Testing & Dogfooding

Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
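
Reviewer note: the Completion Status Protocol and escalation format that this PR adds to `SKILL.md` and `browse/SKILL.md` exist only as Markdown instructions for the agent. As a rough illustration of the shape of that protocol, here is a minimal TypeScript sketch. Every name in it (`CompletionStatus`, `Escalation`, `shouldEscalate`, `formatEscalation`, `MAX_ATTEMPTS`) is hypothetical and does not come from the gstack codebase; only the four status values, the four report fields, and the three-attempt rule are taken from the diff.

```typescript
// Hypothetical sketch: none of these names exist in gstack.
// The four statuses listed under "Completion Status Protocol".
type CompletionStatus =
  | "DONE"
  | "DONE_WITH_CONCERNS"
  | "BLOCKED"
  | "NEEDS_CONTEXT";

// The escalation format only allows the two non-completing statuses.
type EscalationStatus = Extract<CompletionStatus, "BLOCKED" | "NEEDS_CONTEXT">;

interface Escalation {
  status: EscalationStatus;
  reason: string;         // 1-2 sentences
  attempted: string;      // what was tried
  recommendation: string; // what the user should do next
}

// "If you have attempted a task 3 times without success, STOP and escalate."
const MAX_ATTEMPTS = 3;

function shouldEscalate(failedAttempts: number): boolean {
  return failedAttempts >= MAX_ATTEMPTS;
}

// Render the four-line report in the order shown in the diff.
function formatEscalation(e: Escalation): string {
  return [
    `STATUS: ${e.status}`,
    `REASON: ${e.reason}`,
    `ATTEMPTED: ${e.attempted}`,
    `RECOMMENDATION: ${e.recommendation}`,
  ].join("\n");
}

// Usage: after a third failed attempt, emit a BLOCKED report.
if (shouldEscalate(3)) {
  console.log(
    formatEscalation({
      status: "BLOCKED",
      reason: "Three fixes failed to resolve the issue.",
      attempted: "Described each fix that was tried.",
      recommendation: "Review the architecture before another attempt.",
    }),
  );
}
```

Typing `status` as the narrower `EscalationStatus` means a report with `DONE` fails at compile time, which matches the `STATUS: BLOCKED | NEEDS_CONTEXT` line in the format. The other two escalation triggers (security-sensitive uncertainty, scope that cannot be verified) are judgment calls and are left out of the sketch.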