Found by the #528 parity snapshots (Phase D) with a clean-app reset before each run and a frame-count/integration guard.
What
Same wizard (#701), same apps, --harness anthropic vs --harness pi, both linear, both on a freshly-reset pristine app:
| Harness | 15-app-router-saas | 15-app-router-todo |
| --- | --- | --- |
| anthropic (default) | ✅ full integration (21 frames; instrumentation-client.ts, lib/posthog-server.ts, instrumented pages/routes) | ✅ full (28 frames) |
| pi | ❌ empty (7 frames; deps + .env.local added, skillsComplete: true, but no instrumentation code at all) | ❌ empty (8 frames) |
The pi run reaches "Writing your PostHog setup…", then the flow advances to keep-skills with zero code changes. result.json reports hasPosthogDep: true + skillsComplete: true, so the loose E2E check passes — but the app is not actually instrumented.
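To illustrate why the loose check passes on an empty run, here is a minimal sketch of a stricter guard. Only `hasPosthogDep` and `skillsComplete` are field names taken from `result.json` as described above, and the two expected file paths are the ones the anthropic run produced in 15-app-router-saas. The function name, the `changedFiles` input, and the `minFrames` threshold are hypothetical and would need adapting to the real snapshot harness and to each app.

```typescript
// Subset of result.json; only these two field names come from the report.
interface SnapshotResult {
  hasPosthogDep: boolean;
  skillsComplete: boolean;
}

interface IntegrationVerdict {
  loosePass: boolean; // what the current loose E2E check effectively asserts
  strictPass: boolean; // loose flags plus evidence of real instrumentation
  reasons: string[]; // why the strict check failed, if it did
}

// Files the anthropic run added in 15-app-router-saas (assumed to be a
// reasonable marker of a real integration; other apps may need another list).
const EXPECTED_FILES: string[] = [
  "instrumentation-client.ts",
  "lib/posthog-server.ts",
];

// Hypothetical helper: changedFiles is the list of paths that differ from the
// pristine app, frameCount is the number of frames the run produced.
function checkIntegration(
  result: SnapshotResult,
  changedFiles: string[],
  frameCount: number,
  minFrames: number, // assumed threshold, not a value from the wizard
): IntegrationVerdict {
  const reasons: string[] = [];

  const loosePass = result.hasPosthogDep && result.skillsComplete;
  if (!loosePass) {
    reasons.push("result.json flags not set");
  }

  const missing = EXPECTED_FILES.filter((f) => !changedFiles.includes(f));
  if (missing.length > 0) {
    reasons.push(`missing files: ${missing.join(", ")}`);
  }

  if (frameCount < minFrames) {
    reasons.push(`only ${frameCount} frames (< ${minFrames})`);
  }

  return { loosePass, strictPass: reasons.length === 0, reasons };
}

// Shape of the pi run on 15-app-router-saas: flags set, 7 frames, and only
// dependency and env changes (package.json is an assumed stand-in for "deps").
const piVerdict = checkIntegration(
  { hasPosthogDep: true, skillsComplete: true },
  ["package.json", ".env.local"],
  7,
  15,
);
console.log(piVerdict.loosePass, piVerdict.strictPass, piVerdict.reasons);
```

With these inputs the loose check passes while the strict one fails on both the missing files and the frame count, which matches the symptom reported here. The threshold of 15 is an arbitrary illustrative value between the observed pi counts (7, 8) and anthropic counts (21, 28).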
Confirmed it's really pi
If SNAP_HARNESS=pi were ignored, the run would fall back to anthropic and integrate fully, as the default does. It produced a shallow result instead, so pi is the resolved harness and pi is the one under-integrating.
Note — discrepancy
Earlier ad-hoc runs this session showed pi producing deep integrations (Stripe routes, instrumentation). Via the workbench wizard-ci flow it's consistently empty (2/2 apps). Suggests pi is flaky or env-dependent (the workbench run strips CLAUDE*/ANTHROPIC* and drives through a PTY). Needs root-causing before any pi ramp.
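One way to test the env-dependence hypothesis is to apply the same kind of stripping to an ad-hoc run and see whether pi goes empty there too. The sketch below is an assumption about what "strips CLAUDE*/ANTHROPIC*" amounts to (dropping every variable whose name starts with those prefixes); the workbench's actual mechanism is not shown in this issue, and both function names are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Returns a copy of env without variables whose names start with any prefix.
function stripEnvPrefixes(env: Env, prefixes: string[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue; // skip unset entries
    if (prefixes.some((p) => key.startsWith(p))) continue; // stripped
    out[key] = value;
  }
  return out;
}

// Lists the variable names that stripping would remove, so an ad-hoc run and
// a workbench run can be compared on what actually differs.
function droppedKeys(env: Env, prefixes: string[]): string[] {
  return Object.keys(env)
    .filter((key) => prefixes.some((p) => key.startsWith(p)))
    .sort();
}

// Illustrative environment; the values are placeholders, not real settings.
const sampleEnv: Env = {
  ANTHROPIC_API_KEY: "placeholder",
  CLAUDE_CODE: "1",
  PATH: "/usr/bin",
  SNAP_HARNESS: "pi",
};

const prefixes = ["CLAUDE", "ANTHROPIC"];
console.log(droppedKeys(sampleEnv, prefixes));
console.log(Object.keys(stripEnvPrefixes(sampleEnv, prefixes)).sort());
```

In a Node script the stripped object could be passed as the `env` option when spawning the wizard, in place of `process.env`. If pi then produces an empty integration in an ad-hoc run as well, the missing variables are a likely cause; if it still integrates deeply, the PTY path is the next suspect.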
Impact
Blocks the epic's core deliverable (pi as a viable second runner). Not a #701-merge blocker — pi is opt-in behind flags and default/anthropic is unaffected — but pi must stay off until fixed. Related: #789 (pi+gpt-5 too slow).