
feat(pi): real PostHog MCP dashboard, env lockdown, perf parity#701

Merged
gewenyu99 merged 22 commits into main from pi/mcp-dashboard
Jul 3, 2026

Conversation

@gewenyu99

@gewenyu99 gewenyu99 commented Jun 20, 2026

Collaborator

Epic #520 · PostHog data-ops via real MCP + env lockdown (new sub-issue pending).

Problem

  • pi needed real PostHog data-ops (dashboards/insights) through the hosted MCP — not a REST hack — and its bash env needed locking down so no secret leaks into npm install.

Changes

  • jiti-load pi's MCP extension (pi-mcp-adapter) against boot.mcpUrl (same hosted MCP as anthropic); register only curated dashboard/insight tools as direct tools (disableProxyTool: true) so the ~30-tool proxy never pollutes context.
  • bindExtensions({}) after createAgentSession so the adapter connects on session_start.
  • MCP token passed by env-var name (bearerTokenEnv) — never on disk or in the bash env.
  • Scrubbed minimal bash env (allowlist only) via noTools: 'builtin' + a re-registered bash spawnHook; shared into the subagent.
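
A minimal sketch of the allowlist-only env idea from the last bullet, assuming a plain key allowlist. The `ENV_ALLOWLIST` contents, the `scrubEnv` name, and the example variable names are hypothetical; the PR's actual list and spawnHook wiring are not shown in this thread.

```typescript
// Hypothetical allowlist for illustration; the PR's real list is not shown here.
const ENV_ALLOWLIST: readonly string[] = ["PATH", "HOME", "LANG", "TERM", "TMPDIR"];

type Env = Record<string, string | undefined>;

// Build the child env from scratch: copy only allowlisted keys, so anything
// not named (API keys, tokens, CI vars) never reaches the spawned command.
function scrubEnv(ambient: Env, allow: readonly string[] = ENV_ALLOWLIST): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const key of allow) {
    const value = ambient[key];
    if (typeof value === "string") {
      clean[key] = value;
    }
  }
  return clean;
}

// Example ambient env with secrets mixed in (all values are made up).
const child = scrubEnv({
  PATH: "/usr/bin:/bin",
  HOME: "/home/dev",
  POSTHOG_PERSONAL_API_KEY: "phx_example",
  NPM_TOKEN: "npm_example",
});
console.log(Object.keys(child)); // [ 'PATH', 'HOME' ]
```

The result could be handed to a process spawner as its `env` option (for example Node's `child_process.spawn`), so the child starts from the scrubbed map rather than inheriting the parent's environment. Starting from an empty object and copying in, rather than deleting known secrets, means a new ambient variable is excluded by default.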

Test plan

  • pi creates a real dashboard + 5 insights, exit 0, on express-todo AND django.
  • No secret/ambient var reaches an npm install; pinned by pi-env-lockdown.test.ts.

@github-actions

github-actions Bot commented Jun 20, 2026


🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci error-tracking-upload-source-maps
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci error-tracking-upload-source-maps/android
  • /wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
  • /wizard-ci error-tracking-upload-source-maps/flutter
  • /wizard-ci error-tracking-upload-source-maps/ios
  • /wizard-ci error-tracking-upload-source-maps/next
  • /wizard-ci error-tracking-upload-source-maps/next-no-posthog
  • /wizard-ci error-tracking-upload-source-maps/node-raw
  • /wizard-ci error-tracking-upload-source-maps/node-rollup
  • /wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
  • /wizard-ci error-tracking-upload-source-maps/node-webpack
  • /wizard-ci error-tracking-upload-source-maps/nuxt-3-6
  • /wizard-ci error-tracking-upload-source-maps/nuxt-4-3
  • /wizard-ci error-tracking-upload-source-maps/react-native
  • /wizard-ci error-tracking-upload-source-maps/react-vite
  • /wizard-ci error-tracking-upload-source-maps/rust
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

gewenyu99 commented Jun 20, 2026

Collaborator Author

@gewenyu99 gewenyu99 changed the title from "feat(pi): real PostHog MCP dashboard (#10), env lockdown, perf parity" to "feat(pi): real PostHog MCP dashboard, env lockdown, perf parity" Jun 20, 2026
@gewenyu99 gewenyu99 force-pushed the pi/auth-isolation branch from ae67e49 to 29dc7d2 Compare June 20, 2026 21:36
gewenyu99 added a commit that referenced this pull request Jun 22, 2026
Same resolved version; just the package.json floor, so #701 and #702 don't
conflict on the zod line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
gewenyu99 added a commit that referenced this pull request Jun 24, 2026
Same resolved version; just the package.json floor, so #701 and #702 don't
conflict on the zod line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

daniloc commented Jun 25, 2026

Collaborator

ah lol you got there

@daniloc daniloc left a comment

Collaborator


I say go for it

a few things to think about here:

  • is this declarative and compositional in a way that you're going to like working with via an agent over time? Clean slots you can pull in and out of use according to runtime conditions or as the project evolves are what kept us sane on the basics
  • do you feel good about this thing only triggering on specific conditions? the switch point in the code seems straightforward enough, but we're playing with live ammo these days, so worth being belt and suspenders about it where you can

@gewenyu99

gewenyu99 commented Jun 26, 2026

Collaborator Author

Live snapshot evidence — reshaped runner plan

Real end-to-end wizard runs on the reshaped runner (runner-plan.ts: config-map + per-flag middleware), driven by e2e-harness/tui-snapshots against a throwaway Next.js 15 app, on the real gateway (gateway.us.posthog.com, US project 228144). All four backends resolve through one seam; each created real PostHog resources. Key moments per run — intro → working → full task list complete → outro. Full per-screen text frames linked under each.

[runner] resolved: runner=pi   model=claude-sonnet-4-6   → gateway /wizard      (anthropic-messages)
[runner] resolved: runner=pi   model=openai/gpt-5        → gateway /wizard/v1   (openai-completions)
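
A rough sketch of the model-to-gateway mapping those two log lines imply. The two routes are copied from the log output; inferring the transport from an `openai/` model-id prefix is an assumption for illustration, and `resolveGatewayRoute` is a hypothetical name, not the function in runner-plan.ts.

```typescript
type Transport = "anthropic-messages" | "openai-completions";

interface GatewayRoute {
  path: string;
  transport: Transport;
}

// Assumption: an "openai/" prefix on the model id selects the OpenAI-style
// route; everything else goes to the Anthropic messages route.
function resolveGatewayRoute(model: string): GatewayRoute {
  return model.startsWith("openai/")
    ? { path: "/wizard/v1", transport: "openai-completions" }
    : { path: "/wizard", transport: "anthropic-messages" };
}

for (const model of ["claude-sonnet-4-6", "openai/gpt-5"]) {
  const route = resolveGatewayRoute(model);
  console.log(`model=${model} -> gateway ${route.path} (${route.transport})`);
}
```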

pi · sonnet → dashboard 1765769 (text frames)

[Screenshots: intro, working, full task list complete, outro]

anthropic · claude-agent-sdk (control) → dashboard 1765799 (text frames)

[Screenshots: intro, working, full task list complete, outro]

pi · gpt-5 — OpenAI completions through the gateway → dashboard 1765879 (text frames)

[Screenshots: intro, working, full task list complete, outro]

orchestrator — task-queue runner → 7/8 steps (text frames)

[Screenshots: intro, working, full task list complete, outro]

gewenyu99 and others added 2 commits June 26, 2026 20:30
Re-add the wizard-runner flag key on top of latest main (it lived only on the
old stack, which is being re-authored). Read by the wizardRunner resolver
middleware in #692b; no importer yet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mcp-prompt-streaming.ts and agent-prompt-loader.ts hardcoded 'claude-sonnet-4-6';
point them at the shared DEFAULT_AGENT_MODEL constant (agent-interface already
uses it on main). Value unchanged; prep for the MODELS alias map in #692a.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gewenyu99 gewenyu99 force-pushed the pi/auth-isolation branch from 29dc7d2 to 0afcc78 Compare June 27, 2026 01:16
Base automatically changed from pi/auth-isolation to pi/perf-tuning June 27, 2026 01:16
gewenyu99 added a commit that referenced this pull request Jun 27, 2026
pi loads its own MCP extension (pi-mcp, jiti-loaded) against the hosted PostHog MCP
and registers the curated dashboard/insight tools as direct tools, so a pi run
creates a real dashboard. bash spawns with a scrubbed allowlist-only env so no
secret reaches an install. Fuller anti-spiral runtime notes; 1M context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
gewenyu99 added a commit that referenced this pull request Jun 27, 2026
pi loads its own MCP extension (pi-mcp, jiti-loaded) against the hosted PostHog MCP
and registers the curated dashboard/insight tools as direct tools, so a pi run
creates a real dashboard. bash spawns with a scrubbed allowlist-only env so no
secret reaches an install. Fuller anti-spiral runtime notes; 1M context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gewenyu99 gewenyu99 changed the base branch from pi/perf-tuning to pi/auth-isolation June 27, 2026 01:20
@gewenyu99 gewenyu99 force-pushed the pi/auth-isolation branch from 096c7b1 to 62fdb75 Compare June 27, 2026 02:24
gewenyu99 added a commit that referenced this pull request Jun 27, 2026
pi loads its own MCP extension (pi-mcp, jiti-loaded) against the hosted PostHog MCP
and registers the curated dashboard/insight tools as direct tools, so a pi run
creates a real dashboard. bash spawns with a scrubbed allowlist-only env so no
secret reaches an install. Fuller anti-spiral runtime notes; 1M context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gewenyu99

Collaborator Author

/wizard-ci basic-integration/next-js

@gewenyu99 gewenyu99 marked this pull request as ready for review July 2, 2026 21:37
@wizard-ci-bot

wizard-ci-bot Bot commented Jul 2, 2026


🧙 Wizard CI Results

Trigger ID: af1d012
Workflow: View run

| App | Confidence | PR | YARA |
|---|---|---|---|
| basic-integration/next-js/15-app-router-saas | 4/5 | #2328 (logs) | |
| basic-integration/next-js/15-app-router-todo | 4/5 | #2329 (logs) | |
| basic-integration/next-js/15-pages-router-saas | 4/5 | #2331 (logs) | ⚠️ |
| basic-integration/next-js/15-pages-router-todo | 4/5 | #2327 (logs) | |

Configuration

| Setting | Value |
|---|---|
| Wizard ref | pi/mcp-dashboard |
| Context Mill ref | main |
| PostHog ref | master |

Search for trigger ID af1d012 in wizard-workbench PRs.

⚠️ YARA Scanner — basic-integration/next-js/15-pages-router-saas
71 tool calls scanned, 1 violation(s) detected
[REVERTED] posthog_pii_in_capture_call (high) — PostToolUse:Edit

pi hardcoded reasoning:true for every gateway model, so it sent reasoning_effort
even to non-reasoning openai models — gpt-4o → gateway UnsupportedParamsError →
the run no-op'd (deps + .env, zero code). Add switchboard/models.ts: a
configurable per-model capability table (reasoning) with a transport-based
default (anthropic on, openai off; a reasoning openai model opts back in). The
pi harness reads reasoning from it instead of guessing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
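
For readers skimming the thread, a minimal sketch of what a per-model capability table with a transport-based default could look like. This is not the contents of switchboard/models.ts; the type names, the prefix-based transport detection, and the single override entry are assumptions chosen to mirror the commit message (anthropic on, openai off, a reasoning openai model opts back in).

```typescript
type Transport = "anthropic" | "openai";

interface ModelCapabilities {
  reasoning: boolean;
}

// Transport-level defaults, as described in the commit message.
const TRANSPORT_DEFAULTS: Record<Transport, ModelCapabilities> = {
  anthropic: { reasoning: true },
  openai: { reasoning: false },
};

// Per-model overrides; the entry here is illustrative, not the real table.
const MODEL_OVERRIDES: Record<string, Partial<ModelCapabilities>> = {
  "openai/gpt-5": { reasoning: true },
};

// Assumption: transport is inferred from an "openai/" model-id prefix.
function capabilitiesFor(model: string): ModelCapabilities {
  const transport: Transport = model.startsWith("openai/") ? "openai" : "anthropic";
  return { ...TRANSPORT_DEFAULTS[transport], ...MODEL_OVERRIDES[model] };
}

console.log(capabilitiesFor("openai/gpt-4o").reasoning); // false
console.log(capabilitiesFor("openai/gpt-5").reasoning); // true
```

With a table like this, a non-reasoning model such as gpt-4o falls through to the openai default and the harness has no reason to send a reasoning parameter for it.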
@gewenyu99

gewenyu99 commented Jul 2, 2026

Collaborator Author

Go / no-go — #701 → main (final, with live evidence)

Live parity via the workbench snapshot flow (real run → Playwright frames → CI PR with the code diff). Both Next.js app-router apps, each reset to pristine main before the run, frame-count + integration guarded, timed. PRs opened→closed as artifacts on PostHog/wizard-workbench:

| Config | saas | todo |
|---|---|---|
| anthropic / claude-sonnet-4-6 | ✅ full · #2317 | ✅ full · #2320 |
| pi / claude-sonnet-4-6 | ✅ full · #2322 | ✅ full · #2324 |
| pi / openai/gpt-4o (sonnet-class) | ⚠️ broken output · #2332 · ⏱2m39s | |
| pi / openai/gpt-5 | ❌ too slow (>15 min, #789) | |

Verdict

Fixes landed on #701 this pass: model-alias → gateway-id, and the pi reasoning-per-model matrix (#792).

@gewenyu99

gewenyu99 commented Jul 2, 2026

Collaborator Author

Picking an OpenAI-class backup to sonnet — switchboard exploration runs

Live pi-harness runs on app-router apps, all reset to pristine main first. One rubric (workbench evaluation.md, 5 dims: client-init file · server SDK · client initialized · captures · identify; wrong-SDK-server-side caps at 2).
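
One way to read that rubric as code, as a sketch only. It assumes one point per satisfied dimension, which the comment does not state; the field names are hypothetical, and evaluation.md in the workbench remains the source of truth.

```typescript
// The five rubric dimensions named above, plus the cap condition.
interface RubricResult {
  clientInitFile: boolean;
  serverSdk: boolean;
  clientInitialized: boolean;
  captures: boolean;
  identify: boolean;
  wrongSdkServerSide: boolean;
}

// Assumption: each satisfied dimension is worth one point (0..5).
// A wrong SDK used server-side caps the total at 2.
function rubricScore(r: RubricResult): number {
  const dims = [r.clientInitFile, r.serverSdk, r.clientInitialized, r.captures, r.identify];
  const raw = dims.filter(Boolean).length;
  return r.wrongSdkServerSide ? Math.min(raw, 2) : raw;
}

const allGood: RubricResult = {
  clientInitFile: true,
  serverSdk: true,
  clientInitialized: true,
  captures: true,
  identify: true,
  wrongSdkServerSide: false,
};
console.log(rubricScore(allGood)); // 5
console.log(rubricScore({ ...allGood, wrongSdkServerSide: true })); // 2
```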

The switchboard

Default flag-less behavior is unchanged: anthropic + sonnet + linear. New: reasoning effort is a per-model trait in the capability matrix (switchboard/models.ts thinkingLevel → pi reasoning_effort), and the wizard-runner=pi flag now pairs pi with its backup model. --harness/--model/--sequence are dev-only (tree-shaken from prod).
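
A sketch of the flag-to-plan pairing described here, under stated assumptions: the `RunnerPlan` shape and `resolvePlan` name are hypothetical, the backup model id is taken from the recommendation later in this comment (gpt-5-mini), and keeping the `linear` sequence on the pi path is a guess rather than something the thread confirms.

```typescript
interface RunnerPlan {
  harness: "anthropic" | "pi";
  model: string;
  sequence: string;
}

// Flag-less default, as stated in the comment: anthropic + sonnet + linear.
const DEFAULT_PLAN: RunnerPlan = {
  harness: "anthropic",
  model: "claude-sonnet-4-6",
  sequence: "linear",
};

// Assumption: only the exact value "pi" on the wizard-runner flag switches
// paths; any other value (or no flag) leaves the default untouched.
function resolvePlan(flags: Record<string, string | undefined>): RunnerPlan {
  if (flags["wizard-runner"] === "pi") {
    return { ...DEFAULT_PLAN, harness: "pi", model: "openai/gpt-5-mini" };
  }
  return { ...DEFAULT_PLAN };
}

console.log(resolvePlan({}));
console.log(resolvePlan({ "wizard-runner": "pi" }));
```

Matching on a single exact flag value keeps the switch point narrow, which fits the earlier review note about only triggering on specific conditions.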

Runs (pi harness)

| Model | Effort | ⏱ Time | saas | todo | PR | Notes |
|---|---|---|---|---|---|---|
| claude-sonnet-4-6 (control) | default | ~4m37s | 5/5 | | baseline | The bar. |
| openai/gpt-5-mini | medium | 3m30s | 5/5 | ~3/5 | #2334 | todo run #2337; saas full + deep-verified (instrumentation-client.ts w/ init+proxy+error-tracking, posthog-server.ts, identify, capture). Faster than sonnet. Server-correct on both apps; skipped client init on todo, the one inconsistency. |
| openai/gpt-5 | low | 8m39s | 5/5 | | #2335 | Full. Low effort keeps flagship under 15m (was >15m, #789) but 2.5× slower + pricier than mini. |
| openai/o4-mini | default | 20m28s | 5/5 (full) | | #2336 | Full integration incl. client init, but far too slow to be a drop-in. |
| openai/gpt-4o | n/a (non-reasoning) | 2m39s | 1/5 | | #2332 | Broken: browser SDK imported server-side, never initialized, gave up on a build error. Too weak to follow the skill. |

¹ sonnet control artifact still pending the 1Password signing hiccup; o4-mini + gpt-5-mini-todo now pushed.

What moved gpt-5-mini from 3/5 → 5/5 (on saas)

Two changes in #701: (1) a pi setup-order steering line — finish SDK init for every runtime before adding captures, don't jump to the fix/revise step; (2) medium reasoning effort. Both shipped together; the steering is the targeted cause.

Recommendation

gpt-5-mini @ medium effort is the backup: sonnet-class quality at roughly three-quarters of sonnet's runtime (3m30s vs ~4m37s), evaluator-verified 5/5 on saas. One caveat before calling it a clean drop-in: it's server-correct everywhere, but the client-init step isn't yet consistent (nailed it on saas, skipped it on todo). Next step is firming the setup-order steering so client init is reliable across app shapes. gpt-5 @ low is the conservative fallback (5/5, slower); o4-mini is too slow; gpt-4o is unusable.

GO to merge #701 — additive, default-off, sonnet default untouched; wizard-runner=pi opts into the gpt-5-mini backup path.

…soning effort

The pi runner drives reasoning models; pair it with the smaller, faster,
cheaper openai reasoning model (gpt-5-mini) instead of inheriting the anthropic
sonnet default. The anthropic default path is untouched.

Reasoning effort becomes a per-model trait in the switchboard capability matrix
(thinkingLevel), which the pi harness forwards to the session as reasoning_effort
for openai-completions. gpt-5 runs at low effort (fast flagship), gpt-5-mini at
medium. A pi steering note keeps weaker models from skipping SDK initialization
before adding captures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
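
A small sketch of how a per-model `thinkingLevel` might be forwarded as `reasoning_effort`, following the commit message above. The two effort values (gpt-5 at low, gpt-5-mini at medium) come from that message; the table shape, the `sessionParams` helper, and omitting the field entirely for models without a level are assumptions for illustration.

```typescript
type ThinkingLevel = "low" | "medium" | "high";
type Transport = "anthropic-messages" | "openai-completions";

// Effort levels from the commit message; models absent here have no level.
const THINKING_LEVEL: Record<string, ThinkingLevel | undefined> = {
  "openai/gpt-5": "low",
  "openai/gpt-5-mini": "medium",
};

// Only openai-completions sessions with a configured level get the field.
// Leaving it out for other models avoids sending a parameter they reject.
function sessionParams(model: string, transport: Transport): { reasoning_effort?: ThinkingLevel } {
  const level = THINKING_LEVEL[model];
  if (transport !== "openai-completions" || level === undefined) {
    return {};
  }
  return { reasoning_effort: level };
}

console.log(sessionParams("openai/gpt-5-mini", "openai-completions")); // { reasoning_effort: 'medium' }
console.log(sessionParams("openai/gpt-4o", "openai-completions")); // {}
```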
gpt-5-mini shipped client captures behind a defensive `require('posthog-js')`
guard on the todo app — no init, so no events fire. Firm the setup-order steering
to require a real import + initialize at the entry point and forbid the guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gewenyu99 gewenyu99 merged commit 9730dfd into main Jul 3, 2026
17 checks passed
@gewenyu99 gewenyu99 deleted the pi/mcp-dashboard branch July 3, 2026 00:02

3 participants