
add PLAN_ROLLOUT proposal — PR-stack-aware planning #1192

Closed
mastermanas805 wants to merge 2 commits into garrytan:main from mastermanas805:proposal/plan-rollout-skill

Conversation

mastermanas805 commented Apr 24, 2026

What's the problem

gstack has great plan-time skills (/plan-eng-review, /plan-ceo-review) and great ship-time skills (/ship, /review). There's a gap between them. Nothing asks "is this one PR or three?" or "in what order should these units ship?"

The pain is LLM-specific. A single Claude Code session produces a 2,000-line diff across 15 files. Reviewers drown. Scope creep hides. Bugs ship under LGTM pressure. I feel this every time I use LLMs for non-trivial work. I suspect others do too.

What I'm proposing

/plan-rollout runs after plan approval. Reads the plan plus SYSTEM.md (a new repo-root semantic contract graph) plus the discovered import graph. Produces decomposition.md (PR stack with reader guides, dep ordering, time-budget estimates) and rollout.md (rollout strategy with inverse rollback auto-generated per step).

/spill-check runs during implementation. Compares the current diff against the declared PR unit. Flags undeclared files. Adaptive: strict for code, soft for infra/meta files like CLAUDE.md, package.json, bun.lock.
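
To make the strict/soft split concrete, here is a minimal sketch of how the spill classification could work. Everything in it (`classifySpill`, `SpillFinding`, the severity names, the example paths) is hypothetical and chosen for illustration; only the three soft file names come from the description above.

```typescript
// Hypothetical sketch: none of these names exist in gstack today.
type SpillSeverity = "declared" | "soft" | "strict";

interface SpillFinding {
  file: string;
  severity: SpillSeverity;
}

// Infra/meta files named in the proposal. A real list would likely be configurable.
const SOFT_FILES = new Set(["CLAUDE.md", "package.json", "bun.lock"]);

// Compare the files touched by the current diff against the files the PR unit declared.
function classifySpill(changedFiles: string[], declaredFiles: string[]): SpillFinding[] {
  const declared = new Set(declaredFiles);
  return changedFiles.map((file): SpillFinding => {
    if (declared.has(file)) return { file, severity: "declared" };
    // Match on the base name so nested package.json files are also treated as soft.
    const base = file.split("/").pop() ?? file;
    return { file, severity: SOFT_FILES.has(base) ? "soft" : "strict" };
  });
}

// Example with made-up paths: one declared file, one undeclared code file, one meta file.
const findings = classifySpill(
  ["src/a.ts", "src/b.ts", "package.json"],
  ["src/a.ts"],
);
console.log(findings.map((f) => `${f.file}: ${f.severity}`));
```

In this sketch a "strict" finding would block or prompt, while a "soft" finding would only be reported. How each severity is actually surfaced is left open.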

SYSTEM.md is the interesting primitive. It holds human-declared role contracts (for example: auth mints session tokens that middleware enforces; breaks if the format changes without a middleware redeploy; rollout-edge hard). These are separate from the package/import graph, which the LLM discovers at runtime via AST and grep. The two are reconciled jointly: declared contracts give the why, discovered imports give the what, and disagreements surface for human resolution.
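
A rough sketch of that reconciliation, assuming contracts and import edges can both be reduced to component pairs. The shapes and the unordered-pair matching are assumptions for illustration; the two flag names are borrowed from the follow-up commit description later in this thread, and the component names are made up.

```typescript
// Hypothetical shapes; the real SYSTEM.md schema is not shown in this thread.
interface DeclaredContract {
  from: string; // component making the promise, e.g. "auth"
  to: string;   // component relying on it, e.g. "middleware"
}

interface ImportEdge {
  from: string; // importing component
  to: string;   // imported component
}

interface ReconcileFlag {
  kind: "import-without-contract" | "contract-without-imports";
  from: string;
  to: string;
}

// Assumption: a contract covers imports in either direction, so pairs are unordered.
const pairKey = (a: string, b: string): string => [a, b].sort().join("|");

// Pure function: declared contracts in, discovered imports in, disagreements out.
function reconcile(contracts: DeclaredContract[], imports: ImportEdge[]): ReconcileFlag[] {
  const declared = new Set(contracts.map((c) => pairKey(c.from, c.to)));
  const discovered = new Set(imports.map((e) => pairKey(e.from, e.to)));
  const flags: ReconcileFlag[] = [];
  for (const e of imports) {
    if (!declared.has(pairKey(e.from, e.to))) {
      flags.push({ kind: "import-without-contract", from: e.from, to: e.to });
    }
  }
  for (const c of contracts) {
    if (!discovered.has(pairKey(c.from, c.to))) {
      flags.push({ kind: "contract-without-imports", from: c.from, to: c.to });
    }
  }
  return flags;
}

// Made-up example: one matching pair, one undeclared import, one contract with no imports.
const flags = reconcile(
  [{ from: "auth", to: "middleware" }, { from: "cache", to: "db" }],
  [{ from: "middleware", to: "auth" }, { from: "router", to: "auth" }],
);
console.log(flags);
```

Each flag is meant as a question for a human (add a contract, delete a stale one, or annotate the exception), not as an automatic failure.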

Does it actually work

Dogfooded the design end to end against honojs/hono#4633 (405 Method Not Allowed). Authored SYSTEM.md for Hono's 8 components. Decomposed the issue into a 3-PR stack with graceful dep relaxation (PR-3 can merge without PR-2 via feature detection on an optional interface method).

Implemented what would be PR-1 locally: 171 LOC, 3 files, 86/86 tests passing, and zero regressions across the 4 router implementations it does not touch.

8 design gaps surfaced during the dogfood. All folded into v1 scope. Highlights:

  • SYSTEM.md needs a kind field (component | leaf-util | types-only) so shared utility dirs don't force awkward fits.
  • Rollout templates need a package-type field because library rollouts (npm publish + revert) differ materially from service rollouts (coordinated deploy + state restore).
  • Reviewer-time formula needs real calibration data. v1 ships conservative defaults and logs predicted-vs-actual so v2 can do better.
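
As an illustration of the first two gaps, the new fields could be typed roughly as below. This is a sketch under assumptions: the type and function names are hypothetical, the step strings paraphrase the wording above, and treating only `component` entries as contract-bearing is a guess at the intent.

```typescript
// Hypothetical typing for the proposed fields; not the actual gstack schema.
type ComponentKind = "component" | "leaf-util" | "types-only";
type PackageType = "library" | "service";

interface RolloutStep {
  forward: string; // what the step does
  inverse: string; // how to undo it
}

// Step text paraphrases the proposal; real templates would be more detailed.
const DEFAULT_STEPS: Record<PackageType, RolloutStep[]> = {
  library: [{ forward: "npm publish", inverse: "revert" }],
  service: [{ forward: "coordinated deploy", inverse: "state restore" }],
};

// Rollback undoes the steps in the opposite order from how they were applied.
function rollbackPlan(steps: RolloutStep[]): string[] {
  return steps.map((s) => s.inverse).reverse();
}

// Assumption: shared utility and type-only directories carry no role contracts.
function needsContracts(kind: ComponentKind): boolean {
  return kind === "component";
}

console.log(rollbackPlan(DEFAULT_STEPS.service));
console.log(needsContracts("leaf-util"));
```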

The 4 PRs if this lands

  1. Foundation: lib/plan-rollout/system-map-*.ts + tests + docs/SYSTEM-MD.md. Standalone, no skills modified.
  2. /plan-rollout skill + the helpers the SKILL.md calls. Depends on PR-1.
  3. /spill-check skill + spill classifier. Independent of PR-2.
  4. Integration into /ship, /review, /plan-ceo-review, /plan-eng-review. Zero-regression gated on decomposition.md existence.

~75 min cumulative review time. PR-1 is low-risk standalone and should land first.

What I want from you

Does this shape fit gstack? In-tree or separate plugin? Any expansions I've got wrong? Convention checks: artifacts in ~/.gstack/projects/ vs .gstack/ in-repo? SYSTEM.md format?

If this is a nope, tell me, saves us both time. If it's yes-but-shape-it, I'll rework. If yes, I'll open the 4-PR stack.

No rush. I know the queue is deep.

mastermanas805 and others added 2 commits April 25, 2026 00:44
Proposes two new skills + a declarative schema to address the gap between
plan approval and shipping:

- /plan-rollout: decomposes an approved plan into a reviewable PR stack
  and a rollout plan. Outputs decomposition.md + rollout.md consumed by
  /ship, /review, /spill-check, /land-and-deploy.
- /spill-check: detects scope creep mid-implementation by comparing the
  current diff against the declared PR unit.
- SYSTEM.md: repo-root declarative semantic contract graph — components,
  roles, role-level contracts with rollout-edge semantics. Reconciled
  against the LLM-discovered import graph at runtime.

Includes a CEO plan (full spec), SKILL.md drafts, schema documentation,
usage guide, integration notes for /ship and /review, and a TypeScript
parser stub.

The design was stress-tested end-to-end by simulating the workflow
against honojs/hono issue #4633. 8 concrete design gaps surfaced by the
dogfood are folded into v1 scope; documented in the CEO plan.

Filing as a proposal doc in docs/designs/ to get directional feedback
before opening the 4-PR implementation stack — see the attached issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805 mastermanas805 changed the title from "docs(designs): add PLAN_ROLLOUT proposal — PR-stack-aware planning" to "add PLAN_ROLLOUT proposal — PR-stack-aware planning" Apr 24, 2026
@mastermanas805
Author

@garrytan Requesting your suggestion

@garrytan
Owner

Thanks @mastermanas805 — closing as deferred. PLAN_ROLLOUT is a 2164-line proposal warranting standalone focused review. Happy to revisit if scoped down to a focused slice.

@garrytan garrytan closed this May 10, 2026
mastermanas805 added a commit to mastermanas805/gstack that referenced this pull request May 10, 2026
… docs

First of the four PRs proposed in garrytan#1192. Pure foundation — library code and
documentation, no skills added, no existing behavior changed.

What lands:

- lib/plan-rollout/types.ts
  Shared types for the plan-rollout family: Component, Contract, SystemMap,
  ImportEdge, ReconcileFlag, SystemMapParseError. Stable public surface so
  the follow-up /plan-rollout and /spill-check skills can depend on it.

- lib/plan-rollout/system-map-parser.ts
  Parses a SYSTEM.md file (YAML frontmatter + markdown body), validates the
  schema, normalizes field names (rollout-order → rolloutOrder, breaks-if →
  breaksIf, rollout-edge → rolloutEdge). Throws SystemMapParseError with a
  specific .reason field on every validation failure, suitable for surfacing
  in user prompts. Exports componentForFile (longest-prefix match) and
  rolloutOrder (group components into ship tiers).

- lib/plan-rollout/system-map-reconcile.ts
  Takes a parsed SystemMap and a discovered ImportEdge[]; returns
  ReconcileFlag[] covering three categories — import-without-contract,
  contract-without-imports, rollout-order-inversion. Pure function, no I/O.
  Leaf-util and types-only components are excluded. note: runtime-only
  and note: legacy suppress the contract-without-imports flag.

- lib/plan-rollout/system-map-scaffolder.ts
  Walks a repo's src/ subdirectories (or top-level dirs if no src/), proposes
  one component per source-containing directory, classifies utils/lib/helpers
  as leaf-util and types/typings as types-only, infers a starting role from
  package.json description or README first paragraph. Writes SYSTEM.md.draft;
  never overwrites SYSTEM.md itself (refuses or drops to .draft via force).

- docs/SYSTEM-MD.md
  Full schema documentation with worked example, field reference,
  reconciliation rules, and relationship to CLAUDE.md / CODEOWNERS.

- test/plan-rollout/ (46 tests, all passing)
  parser.test.ts — valid fixtures, invalid fixtures (missing role, duplicate
  names, bad rollout-edge, unknown with, self-referential), error paths
  (missing frontmatter, malformed YAML, wrong version, empty components,
  non-integer rollout-order, invalid kind), CRLF handling, narrative
  preservation, componentForFile longest-prefix and prefix-not-a-subdir
  cases, rolloutOrder bucketing.

  reconcile.test.ts — all three flag categories, leaf-util exclusion,
  runtime-only suppression, bidirectional contract check, rollout-order
  inversion including equal-order no-flag, irrelevant-edge ignore.

  scaffolder.test.ts — draft generation from a fixture repo, role inference
  from README, TODO fallback, SYSTEM.md existence refusal, force behavior,
  custom outputPath, ignore rules for node_modules/dist/dotfiles,
  fallback when no src/, draft-cannot-be-parsed-directly property.

- Adds yaml@^2.8.3 as a runtime dependency (first use in gstack; needed for
  SYSTEM.md frontmatter parsing).
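
For reviewers, a minimal sketch of the three parser helpers described above: key normalization, longest-prefix file lookup, and tier grouping. It is an illustration under assumptions, not the code in this commit: the `Component` shape is reduced to three fields, and `toCamelCase` and `groupByRolloutOrder` are stand-in names.

```typescript
// Reduced, hypothetical Component shape; the real one lives in lib/plan-rollout/types.ts.
interface Component {
  name: string;
  path: string;         // directory prefix, e.g. "src/router"
  rolloutOrder: number; // assumed: lower numbers ship first
}

// Normalize kebab-case schema keys, e.g. "rollout-order" becomes "rolloutOrder".
function toCamelCase(key: string): string {
  return key.replace(/-([a-z])/g, (_match: string, ch: string) => ch.toUpperCase());
}

const withSlash = (p: string): string => (p.endsWith("/") ? p : p + "/");

// Longest-prefix match on directory boundaries, so "src/router-utils/x.ts"
// does not match a component rooted at "src/router".
function componentForFile(components: Component[], file: string): Component | undefined {
  let best: Component | undefined;
  for (const c of components) {
    if (!file.startsWith(withSlash(c.path))) continue;
    if (!best || withSlash(c.path).length > withSlash(best.path).length) best = c;
  }
  return best;
}

// Group components into ship tiers, ordered by ascending rolloutOrder.
function groupByRolloutOrder(components: Component[]): Component[][] {
  const tiers = new Map<number, Component[]>();
  for (const c of components) {
    const tier = tiers.get(c.rolloutOrder) ?? [];
    tier.push(c);
    tiers.set(c.rolloutOrder, tier);
  }
  return Array.from(tiers.keys())
    .sort((a, b) => a - b)
    .map((order) => tiers.get(order) ?? []);
}

// Made-up component map for illustration.
const components: Component[] = [
  { name: "core", path: "src", rolloutOrder: 1 },
  { name: "router", path: "src/router", rolloutOrder: 2 },
  { name: "auth", path: "src/auth/", rolloutOrder: 2 },
];
console.log(componentForFile(components, "src/router/trie.ts")?.name);
console.log(groupByRolloutOrder(components).map((tier) => tier.map((c) => c.name)));
```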

Zero changes to existing code. All existing tests still pass; the
pre-existing skill-validation.test.ts failure on main is unrelated to this
change (it's about compiled binaries in git).

Refs: garrytan#1192

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805 added a commit to mastermanas805/gstack that referenced this pull request May 10, 2026
Schema spec for the SYSTEM.md primitive — human-declared role contracts
between components, distinct from the package/import graph (which is
discovered at runtime).

This PR is doc-only: 214 lines, no code, no tests, no dependencies. Library
code (parser, reconciler, scaffolder) and consuming skills follow in
subsequent PRs gated on the primitive landing first.

Refs: garrytan#1192

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mastermanas805
Author

mastermanas805 commented May 11, 2026

Child PR: #1424 (/plan-rollout MVP skill + SYSTEM.md schema).

Size vs the 2164-line proposal here (after the final compression pass):

  • Raw diff: 1206 lines (~44% smaller)
  • Substantive (human-authored): 310 lines (~7x smaller)
  • The 896-line gap is the generated plan-rollout/SKILL.md: deterministic output of the template, which reviewers can skim

Per OSS PR-size research (SmartBear/Cisco, Google internal), review effectiveness drops sharply beyond 400 changed lines. Substantive content here is well inside the healthy band.

Scope reductions from the original proposal:

  • Dropped /spill-check skill (deferred to v1.1+)
  • Dropped system-map-parser.ts (skill prose handles parsing via bash for now)
  • Dropped CEO-plan, integration-notes, usage docs
  • Dropped the standalone V0 design doc — dogfood evidence moved into the PR description where reviewers actually look
  • Kept: one skill that runs, schema it consumes, registry entries
