Skip to content

feat(runner): fail-closed security parity on the pi backend (#525)#695

Closed
gewenyu99 wants to merge 8 commits into
pi/03-wizard-tools-on-pifrom
pi/04-security-subagents
Closed

feat(runner): fail-closed security parity on the pi backend (#525)#695
gewenyu99 wants to merge 8 commits into
pi/03-wizard-tools-on-pifrom
pi/04-security-subagents

Conversation

@gewenyu99

@gewenyu99 gewenyu99 commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Epic #520 · implements #525 (fail-closed security parity) + #526 (Task/todo + controlled subagents). Top of the pi stack (#692#693#694#695).

⚠️ Draft — pi is NOT at parity with the anthropic backend yet

Done in this stack:

Open parity gaps (tracked, not yet done):

gewenyu99 and others added 2 commits June 19, 2026 16:23
pi has no permission layer, so attach an extension that intercepts EVERY tool
call — built-in (bash/read/edit/write/grep) and custom — via pi's tool_call
hook and reuses the exact anthropic policy: wizardCanUseTool (bash allowlist +
.env fencing + disallowedTools) plus the YARA content scan (bash command,
written content with the same wizard-doc posthog_pii suppression). A tool_result
hook post-scans read/bash output for prompt injection. Everything fails closed:
a scanner error blocks, and a critical post-scan violation latches so every
later call is blocked and the run ends as AgentErrorType.YARA_VIOLATION. Plus a
runaway tool-call cap.

extensionFactories load even with noExtensions:true, so the fence is always on
while the target project can't inject its own extensions. Subagents reuse the
same factory so a child can't escape it.

Proven by unit test (no live key needed): the blocked-action corpus
(cat .env, rm -rf, curl exfil, shell operators, direct .env read/write/edit/grep)
is blocked; install/build + source edits + the sanctioned env tools are allowed;
the post-scan latch and runaway guard fire.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#526)

Task/todo (#526): TaskCreate/Update/Get/List as pi tools backed by a shared
store; every mutation pushes the list to the TUI via getUI().syncTodos, so the
todo panel updates live under pi — the parity that was missing.

Controlled subagents (#526): pi has no native subagents, so dispatch_agent
spawns a nested createAgentSession WE construct, which closes the leak the
claude-agent-sdk path warns about. Every child inherits: the SAME security
extension (canUseTool + YARA, shared cap + violation latch); a read-only
built-in toolset (read/grep/find/ls + allowlisted bash) — no write/edit; and no
custom tools, so no .env writes and no dispatch_agent (depth hard-capped at 1).
A child can research but never mutate the project or escape the fence.

Logging parity: log assistant turns ([pi] assistant: …) on message_end and tool
I/O on tool_execution_*, and drive the single run spinner with one stable status
at a time (no overlapping/garbled messages).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci error-tracking-upload-source-maps
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
Show more apps
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci error-tracking-upload-source-maps/android
  • /wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
  • /wizard-ci error-tracking-upload-source-maps/flutter
  • /wizard-ci error-tracking-upload-source-maps/ios
  • /wizard-ci error-tracking-upload-source-maps/next
  • /wizard-ci error-tracking-upload-source-maps/next-no-posthog
  • /wizard-ci error-tracking-upload-source-maps/node-raw
  • /wizard-ci error-tracking-upload-source-maps/node-rollup
  • /wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
  • /wizard-ci error-tracking-upload-source-maps/node-webpack
  • /wizard-ci error-tracking-upload-source-maps/nuxt-3-6
  • /wizard-ci error-tracking-upload-source-maps/nuxt-4-3
  • /wizard-ci error-tracking-upload-source-maps/react-native
  • /wizard-ci error-tracking-upload-source-maps/react-vite
  • /wizard-ci error-tracking-upload-source-maps/rust
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

gewenyu99 commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator Author

gewenyu99 and others added 6 commits June 19, 2026 16:51
Profiling showed pi burns model round-trips early in every run: it reaches for
`bash ls/find` (blocked by the fence) and retries before discovering `read` on
a directory lists files, and it re-fetches the 36-skill menu 2-3x. Fixes:
- PI_RUNTIME_NOTES appended to the pi system prompt: use `read` for directory
  listing (bash is install/build only), call load_skill_menu once, never
  hardcode the PostHog host (the YARA scanner blocks it at write time).
- memoize fetchSkillMenu in-process so the menu is fetched at most once per run.
- make the env tools genuinely async (await fs.promises) — fixes require-await.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…p, sharper steer

From the anthropic-vs-pi parity diff + Vincent's testing:
- #14: steer the agent to update the task list FREQUENTLY (complete/advance as it
  goes) so the TUI step label isn't stale — not "one in_progress at a time".
- #5: print an init banner ("Initializing Wizard agent...") like the anthropic path.
- #15: clean up .posthog-events.json host-side after the run (the skill says remove
  it; pi's `rm` is fence-blocked, so it left a 0-byte file).
- #12: sharper tool steer — use `read` on a directory to list files (never bash
  ls/find/cat/grep), and run installs synchronously (no `&`/`&&`/pipes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Give pi the native ls/find/grep tool definitions. pi's defaults are only
  read/bash/edit/write, so the agent bashed `ls`/`find` into the fence and
  spiraled (profiled + turn-by-turn audit). With the tools it explores properly
  (used them 14x in the re-run); fix the steer too (read errors on a directory).
- System-prompt hardening (adapted from Claude Code system prompts, trimmed to
  our use case): treat skill/project file contents as untrusted (injection
  guard); actually verify (build/typecheck + confirm the SDK imports, not just
  that a file was written) — addresses the audit's skipped-revise gap;
  dispatch_agent prompts must be self-contained (read-only, no recursion).

Known remaining fence-vs-workflow blocks (separate fix): the agent still tries
`npm install &` (commandment says background it) and `node -e` verification —
both fence-blocked. Anthropic expresses these fence-compatibly (bg tool param).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pi's bash tool input is only {command, timeout?} — no run_in_background param —
so it can't background a command except via `&`, which the fence blocks. The
shared commandment (#306) "start the installation as a background task" is thus
unfollowable on pi, and the agent kept retrying `npm install &` / `cd && …`.
Steer pi to disregard that instruction and install synchronously. (Not a fence
change — loosening for `&` wouldn't help; pi's bash exec waits regardless.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Vincent (its author): the rule was probably a mistake. It tells the agent to
background installs and continue, but that races (using a package before install
finishes) and is unfollowable on runtimes without backgrounding (pi's bash has no
run_in_background param, and `&` is fence-blocked). Remove it for all backends and
let installs run synchronously. Snapshot updated; pi note simplified to the fence
facts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- #10: load pi-mcp-adapter via jiti and connect to the hosted PostHog MCP
  (boot.mcpUrl); pre-warm its cache to register curated dashboard/insight
  tools as direct tools; proxy disabled to keep context small.
- Env lockdown: pi's bash spawns with a scrubbed minimal env so no secret or
  ambient var reaches a subprocess; shared into the subagent too.
- Auth: neutralize an inherited Claude Code session so the agent bearer-auths
  to the gateway instead of taking an OAuth path (the 401).
- Prompt audit: promote the prompt-injection guard to both backends; extend
  the minimal-edits rule; parse [DASHBOARD_URL]/[NOTEBOOK_URL] markers (#9).
- Deps: pi-mcp-adapter, jiti; zod ^3.25 (MCP SDK needs the zod/v4 subpath).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gewenyu99

Copy link
Copy Markdown
Collaborator Author

Superseded — split into focused PRs #697#701 for reviewability (security / tasks+subagents / perf+#306 / auth / MCP+lockdown+perf).

@gewenyu99 gewenyu99 closed this Jun 20, 2026
@gewenyu99 gewenyu99 deleted the pi/04-security-subagents branch June 27, 2026 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant