feat(self-driving): Self-driving setup skills for Wizard by sortafreel · Pull Request #192 · PostHog/context-mill

sortafreel · 2026-06-19T14:21:00Z

What changed

A new self-driving-setup skill
The skill generator now caches fetched docs on disk and retries with backoff. Before, the ~50 parallel doc fetches during a build could overload posthog.com and a single dropped connection would take down the build or dev server.
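
A minimal sketch of the cache-plus-backoff pattern described above, not the generator's actual code. `fetchDocWithRetry`, `backoffDelayMs`, the `Fetcher` and `DocCache` shapes, and every default number are hypothetical. The cache is an interface here, whereas the PR stores it on disk. The fetcher takes a per-attempt timeout so that one hung connection cannot stall the whole build.

```typescript
// Hypothetical shapes: the real generator's fetcher and cache may differ.
type Fetcher = (url: string, timeoutMs: number) => Promise<string>;

interface DocCache {
  get(url: string): string | undefined;
  set(url: string, body: string): void;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function fetchDocWithRetry(
  url: string,
  fetcher: Fetcher,
  cache: DocCache,
  maxAttempts = 4,
  timeoutMs = 10000,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const body = await fetcher(url, timeoutMs);
      cache.set(url, body); // refresh the cached copy on every success
      return body;
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  // All attempts failed: serve the stale cached copy if one exists.
  const stale = cache.get(url);
  if (stale !== undefined) return stale;
  throw lastError;
}
```

With an empty cache (as in clean CI) a dead URL still fails after the last attempt; with a warm local cache the build falls back to the older copy.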

Workflow skill for the wizard's new product-autonomy program (wizard autonomy). Eight chained references: probe Signals API access, read the setup report + project profile, confirm org AI-data-processing approval, require the GitHub integration, enable the matching signal sources, ask-then-connect issue trackers, sync + tune the scout fleet, and write posthog-product-autonomy-report.md. Abort strings match the wizard's PRODUCT_AUTONOMY_ABORT_CASES verbatim. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

pnpm 10 refuses to prepare the git-hosted warlock dependency unless it is in onlyBuiltDependencies; the package.json pnpm section overrides pnpm-workspace.yaml, so the allowlist there never applied. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-19T14:21:11Z

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

/wizard-ci all

Test all apps in a directory:

/wizard-ci basic-integration
/wizard-ci error-tracking-upload-source-maps
/wizard-ci misc
/wizard-ci revenue

Test an individual app:

/wizard-ci basic-integration/android
/wizard-ci basic-integration/angular
/wizard-ci basic-integration/astro

/wizard-ci basic-integration/django
/wizard-ci basic-integration/fastapi
/wizard-ci basic-integration/flask
/wizard-ci basic-integration/javascript-node
/wizard-ci basic-integration/javascript-web
/wizard-ci basic-integration/laravel
/wizard-ci basic-integration/next-js
/wizard-ci basic-integration/nuxt
/wizard-ci basic-integration/python
/wizard-ci basic-integration/rails
/wizard-ci basic-integration/react-native
/wizard-ci basic-integration/react-router
/wizard-ci basic-integration/sveltekit
/wizard-ci basic-integration/swift
/wizard-ci basic-integration/tanstack-router
/wizard-ci basic-integration/tanstack-start
/wizard-ci basic-integration/vue
/wizard-ci error-tracking-upload-source-maps/android
/wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
/wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
/wizard-ci error-tracking-upload-source-maps/flutter
/wizard-ci error-tracking-upload-source-maps/ios
/wizard-ci error-tracking-upload-source-maps/next
/wizard-ci error-tracking-upload-source-maps/next-no-posthog
/wizard-ci error-tracking-upload-source-maps/node-raw
/wizard-ci error-tracking-upload-source-maps/node-rollup
/wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
/wizard-ci error-tracking-upload-source-maps/node-webpack
/wizard-ci error-tracking-upload-source-maps/nuxt-3-6
/wizard-ci error-tracking-upload-source-maps/nuxt-4-3
/wizard-ci error-tracking-upload-source-maps/react-native
/wizard-ci error-tracking-upload-source-maps/react-vite
/wizard-ci error-tracking-upload-source-maps/rust
/wizard-ci misc/quack-quack
/wizard-ci revenue/stripe

Results will be posted here when complete.

sortafreel · 2026-06-19T18:37:43Z

🤖 Validated self-review — PR #192

I ran an automated PR review over this branch, then re-validated every finding against the current code and cross-checked each contract claim against the local posthog and wizard repos. The inline comments below are the kept findings; each has a threaded reply labeled 🤖 Agent response (the reply text is agent-drafted — I did not hand-write it).

Method: one validator + one adversarial skeptic per finding (the skeptic tries to refute each keep, correcting the raw review's "everything is valid" bias). Bar: keep if it could plausibly happen in a real wizard run; drop if theoretical, already-handled, or wrong about the code.

⚠️ Two things invalidated many raw findings: (1) the review was stale — the old 3-ai-approval.md step was deleted and later files renumbered; (2) it judged the skill markdown in isolation, missing harness controls (BASE_ALLOWED_TOOLS, the AI-opt-in gate, fail-closed YARA scanning).

| Disposition | Count |
| --- | --- |
| Kept (real, production-plausible) | 14 |
| Dropped — adversarial skeptic refuted | 12 |
| Dropped at validation (fixed / wrong / mitigated) | 10 |
| Total | 36 |

Notably, none of the 7 findings the raw review rated must_fix survived as must_fix — 4 dropped, 3 downgraded. Kept set: 7 should_fix + 7 nice_to_have.

Kept findings (see inline comments)

| id | severity | finding |
| --- | --- | --- |
| `1-3-1` | should_fix | GitHub gate accepts integrations that may not cover this repository |
| `1-4-1` | should_fix | Existing source detection can mark unusable warehouse sources as connected |
| `1-5-1` | should_fix | Custom scout cadence is documented incorrectly |
| `2-4-3` | should_fix | Linear source creation can sync the wrong workspace when multiple integrations exist |
| `2-5-1` | should_fix | Scout tuning can disable user-managed custom scouts |
| `3-1-1` | should_fix | Doc fetch retries have no per-attempt timeout |
| `3-2-1` | should_fix | Source/scout write failures can still produce a completed setup |
| `1-3-2` | nice_to_have | Unknown scout surfaces handed off with contradictory behavior |
| `1-4-3` | nice_to_have | Connected-tools multi-select does not define the decline branch |
| `1-5-3` | nice_to_have | Final report naming rule contradicts itself |
| `2-5-3` | nice_to_have | Generated scout bodies lack a required prompt-injection safety guard |
| `3-3-2` | nice_to_have | Project profile failures are silently swallowed |
| `3-4-3` | nice_to_have | Single post-UI verification can misclassify a successful GitHub source connection |
| `3-5-2` | nice_to_have | Failed scout disables can be reported as successful disables |

22 dropped findings + why

Factually wrong against the code/contract

| id | why dropped |
| --- | --- |
| `1-2-1` | No bypass — the AI-opt-in gate is injected at the program layer for `wizard skill <id>` too; it parks the run until org approval lands. |
| `2-5-2` | `signals-scout-config-sync` already auto-registers configs via `register_missing_configs` ("author a skill, get a scout"); 6b:64 already re-runs sync. |
| `2-4-1` | Recipe step is gated on "Row exists" — the agent already holds the row id from the step-1 list; a missing id errors loudly, not a silent no-op. |
| `2-4-2` | Hard-coded payload is correct against `ExternalDataSourceCreatePayload` (schemas under `payload`, `access_method` defaults to warehouse). |
| `1-4-2` | `Bash` is in `BASE_ALLOWED_TOOLS` — always live; ToolSearch defers only the large MCP schemas, not core tools. |
| `2-5-4` | `Write` is in `BASE_ALLOWED_TOOLS` — always live; the read is a benign idempotency hint, no broader than the step-2 read. |
| `3-5-3` | Premise false — the general scout shares the same discipline, and 6b:62 independently mandates the full quality bar regardless of template. |

Already mitigated by the wizard harness

| id | why dropped |
| --- | --- |
| `2-3-1` | Mitigated by fail-closed YARA scanning of every `Read`/`Grep` (prompt-injection + secrets categories) before any write or authorize link. |
| `2-3-2` | Consumer is the already-OAuth-scoped agent; probe rows collapse to a boolean and are never instructed into the report. |
| `3-3-1` | Single hosted MCP versioned in lockstep with the release; no "older deployment that predates the tool." |

Under-specified, but the flow already absorbs it

| id | why dropped |
| --- | --- |
| `1-3-3` | `3-github.md:55` ("never continue without GitHub") + re-ask (never a continue branch) close the path; "exit" maps to the specified `[ABORT]`. |
| `3-3-3` | A transient list error collapses into the same wait/re-verify/re-ask path as the eventual-consistency lag; `[ABORT]` only on the explicit "cant" choice. |
| `3-2-2` | `evaluateAskCap` returns a soft "proceed with defaults / `[ABORT] requirements-incomplete`"; the one artifact-writing ask (6b) is propose-first. |
| `2-3-3` | The bare blocks are schematics of one question (the array element); a competent agent builds the `{questions:[…]}` wrapper from the self-describing schema. |
| `3-4-2` | Every consumer absorbs a failed list — connectors re-detect, Zendesk/pganalyze are unconditional, and the report defaults to the safe "dormant" direction. |
| `3-5-1` | The partial-rows window is sub-second and self-heals on the coordinator’s ~30-min tick (missing configs default to enabled). |

Build-pipeline tooling, out of the agent-run scope

| id | why dropped |
| --- | --- |
| `1-1-1` | Build-only — clean CI has no `.docs-cache`, so a dead URL fails fast; masking only in local dev, and the result is valid-old markdown + a warning. |
| `1-1-2` | Build-only internal knob; needs a deliberate set + a unit typo; worst case a slower but still-correct build. |
| `3-1-2` | Build-only — CI cache starts empty, so the first dead URL hard-fails; the multiplication only manifests locally during a live outage. |
| `3-1-3` | Build-only fetch of a cached docs site; the only consequence is a re-runnable build. |
| `2-1-1` | `@posthog/warlock@0.2.2` ships prebuilt with no install/postinstall script; `.modules.yaml` shows nothing ignored — the scanner runs. |
| `2-1-2` | `engines` is advisory; no `.npmrc` so `engine-strict` is off, and the scanner runs only in CI on Node 22.22+. |

gewenyu99

Lgtm.

Twixes · 2026-06-22T13:32:29Z

+```
+{
+  id: "github-connect",
+  prompt: "Self-driving needs GitHub access to investigate findings in your code and open fixes — setup can't finish without it.\n\nOpen this link to install the PostHog GitHub App in one click, then approve access. Grant it the repos you want Self-driving to work with — include this project's repo so step 5 can also watch its issues:\n\n<github authorize URL>\n\nThen come back here.\n\n(Need to re-link an existing installation instead? Use your integrations settings: <integrations settings URL>.)",


A couple of things about this:

"Then come back here" - I did all of this, came back, and honestly didn't get for a minute that I need to click "Done" manually 😅 I'd expect this to be just like the desktop app - auto-detecting a callback, or polling.
Relatedly, as a user, when I get a wall of text like this, I have an insatiable urge to press "Enter". Basically, at first sight, this feels like an EULA. We should make this feel more like a loading state, which is waiting until you've connected GH

The <integrations settings URL> link wraps in a way that makes it useless - unable to click and unable to copy (<github authorize URL> is fine as it's on its own line)

Oh, I see you tackled problem 2 (link wrapping) in the Wizard PR – but this didn't seem to work for me in Cursor 🤔

Removed the re-link, noted the polling, will iterate.

Twixes · 2026-06-22T13:45:32Z

+```
+{
+  id: "connected-tools",
+  prompt: "Self-driving can also watch your other tools and pull their issues into the inbox. Which of these do you use?",


Nit: I feel "<...> and pull their issues into the inbox" undersells it a bit – something like this would be more active framing: "Self-driving can also address problems surfaced in your other tools."
But I don't feel strongly here.

I do definitely believe an option like "I don't see a tool in this list" would provide really interesting data. Probs needs a wizard change to support that.

Good point, updated the copy.

As for custom options - yup, noted.

Twixes

A few comments for your consideration

Twixes · 2026-06-22T14:17:50Z

+1. Call `inbox-source-configs-list`.
+2. **Success — including an empty list** — means the API is reachable: proceed. (The probe can't prove beta enrollment — the wizard's detect step and the beta flags own that — but it's the strongest signal available to you.) Keep the returned rows: step 2 and step 4 use them as the already-enabled baseline. Mark your access task completed and continue.


Hmm, but the endpoint powering inbox-source-configs-list isn't flagged. It's GET /api/projects/{team_id}/signals/source_configs/ and I don't think there's any reason why it wouldn't be reachable.

No reason, other than the scope not being available, but that would not mean "self-driving is not available for this project", that would mean that something is wrong with authentication.

What's this whole step for then?

Initially it was about FFs, but we decided to drop all FFs. I still find it valuable, as it checks that MCP works, source configs are pulled, and so on, before we make any edits. Plus, I can 100% imagine us adding new stuff in the future that we'd want behind an FF, so I'll keep it.
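
To make the distinction in this thread concrete, here is a rough sketch of how a probe result could be classified. The `ProbeResult` shape, the outcome names, and the choice of 401/403 as the "scope or auth" signal are assumptions for illustration; the skill itself is markdown instructions, not code.

```typescript
// Hypothetical result shape for the `inbox-source-configs-list` call.
type ProbeResult =
  | { ok: true; rows: { id: string }[] }
  | { ok: false; status: number };

type ProbeOutcome = "proceed" | "auth-problem" | "tool-failure";

function classifyProbe(result: ProbeResult): ProbeOutcome {
  // Any success counts, including an empty list: the API is reachable.
  if (result.ok) return "proceed";
  // A missing scope or rejected token points at authentication,
  // not at self-driving being unavailable for the project.
  // (Treating 401/403 as that signal is an assumption.)
  if (result.status === 401 || result.status === 403) return "auth-problem";
  // Anything else: MCP or the endpoint misbehaved before any edits were made.
  return "tool-failure";
}
```

On success the returned rows would be kept as the already-enabled baseline for the later steps.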

Twixes · 2026-06-22T14:19:12Z

+
+1. **Read `./posthog-setup-report.md`.** It is ground truth for what the base integration instrumented **in this repo**: events, error tracking, feature flags. Do not re-derive what it already states. It is NOT authority over project-level facts — session replay in particular may be instrumented in another repo or via the snippet, so the report can rule replay in but never out (step 4 probes the server for that).


Worth clarifying that this won't be present if the codebase was set up with PostHog a while ago. (Or never)

Yup, updated.

Twixes · 2026-06-22T14:25:48Z

+
+2. **Call `signals-scout-project-profile-get`.** It returns products in use, connected integrations, warehouse sources, and the signal source configs split enabled/disabled — one call instead of four. **Tolerate failure**: it can 404 or error on a team without a profile yet. If it fails, fall back to the step-1 source list and the report; do not retry more than once and do not abort. **Note "profile unavailable" in your checklist** — a profile 404 is expected on a first-run team, so any later decision that relies only on the profile must record "unknown", not a confident negative.
+
+3. **Server-side product usage.** The run prompt's "Project state" block is authoritative for the opt-ins it lists (session replay recording, exception autocapture, surveys): **opt-in ON = product enabled**, even if no data has arrived yet. Where the block says OFF/unknown and the repo gave no signal, spend ONE cheap probe each for usage evidence (tolerate 403/404 → record "unknown"):


Maybe I'm being a bad LLM here:
What is the "run prompt" referring to here?

Each wizard run has a prompt (prompt.ts) that decides the order of steps and which skill to follow.
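
The tri-state rule in steps 2 and 3 of the quoted diff ("unknown", never a confident negative) can be sketched roughly as below. All type and function names are hypothetical, and treating a repo signal the same as an opt-in is a simplification; the probe is passed as a function so it only runs when the opt-in and the repo gave nothing.

```typescript
type OptIn = "on" | "off" | "unknown";
type Probe = "has-data" | "no-data" | "forbidden-or-missing"; // last one = 403/404
type ProductState = "enabled" | "not-detected" | "unknown";

function resolveProductState(
  optIn: OptIn,
  repoSignal: boolean,
  runProbe: () => Probe,
): ProductState {
  // Opt-in ON means enabled, even if no data has arrived yet.
  if (optIn === "on") return "enabled";
  // Simplification: a positive signal from the repo also counts as enabled.
  if (repoSignal) return "enabled";
  // Only now spend the single cheap probe.
  const probe = runProbe();
  if (probe === "has-data") return "enabled";
  if (probe === "forbidden-or-missing") return "unknown";
  return "not-detected";
}

// A missing profile (for example a 404 on a first-run team) yields "unknown".
function fromProfile(
  profileProducts: string[] | undefined,
  product: string,
): ProductState {
  if (profileProducts === undefined) return "unknown";
  return profileProducts.includes(product) ? "enabled" : "not-detected";
}
```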

Twixes · 2026-06-22T14:34:54Z

+4. **Light scan for what the report, profile, and server state won't cover.** Targeted lookups only — package manifests, config files, a grep or two. You are answering these questions:
+   - **Revenue**: is there a payment SDK (Stripe, Paddle, LemonSqueezy, RevenueCat…) or revenue events?
+   - **Surveys**: does the code or profile show PostHog surveys in use?
+   - **AI/LLM**: are there `$ai_*` events, an LLM SDK, or LLM analytics in the profile?
+   - **Logs**: is the PostHog logs product in use (per the profile)?
+   - **CSP**: is a Content-Security-Policy with PostHog CSP reporting configured?
+   - **Support**: does the team use PostHog support/conversations (per the profile)?
+   - **Issue trackers**: any hints of Linear, Zendesk, or pganalyze (you will still ask in step 5 — hints only shape the question, they never authorize enabling).


I think error tracking, replay, and analytics are all worth including in the code scan as well, as you might e.g. not yet have errors in PostHog, but indeed have error tracking hooked up

Fair enough, not sure what "analytics" means (general scout should cover regular analytics), but added bias toward error tracking/replay.
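
As an illustration of the "package manifests" part of the light scan, a manifest check for the Revenue question might look like the sketch below. The package-name list is illustrative only, not exhaustive and not taken from the skill, and `detectPaymentSdks` is a hypothetical helper. A hit is a hint that shapes later questions, not proof of revenue tracking.

```typescript
// Illustrative npm package names only; not an exhaustive or official list.
const PAYMENT_SDKS = new Map<string, string>([
  ["stripe", "Stripe"],
  ["@stripe/stripe-js", "Stripe"],
  ["@paddle/paddle-js", "Paddle"],
  ["@lemonsqueezy/lemonsqueezy.js", "LemonSqueezy"],
  ["react-native-purchases", "RevenueCat"],
]);

interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns the distinct vendors found in the manifest, sorted by name.
function detectPaymentSdks(manifest: PackageManifest): string[] {
  const deps = { ...manifest.dependencies, ...manifest.devDependencies };
  const found = new Set<string>();
  for (const name of Object.keys(deps)) {
    const vendor = PAYMENT_SDKS.get(name);
    if (vendor !== undefined) found.add(vendor);
  }
  return Array.from(found).sort();
}
```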

Twixes · 2026-06-22T17:52:55Z

+
+Load via `ToolSearch select:mcp__posthog-wizard__integrations-github-repos-retrieve,mcp__posthog-wizard__external-data-sources-create`.
+
+If `integrations-github-repos-retrieve` or `external-data-sources-create` isn't available (older server), skip the auto-create and record GitHub Issues as a dormant source (the dormant fallback below). **Not an abort.**


I guess we won't need to worry about an older server here?

yup, removed

Twixes · 2026-06-22T17:54:32Z

+next_step: 6b-tailor-scouts.md
+---
+
+# Step 6 — Configure the scout fleet


This is now a scout troop in the UI! (that's also what we'll go with marketing-wise, like hedgehog boyscouts troop, literally)

Good point, renamed.

Twixes · 2026-06-22T18:20:10Z

+
+1. **Materialize**: call `signals-scout-config-sync`. It is idempotent — it seeds the canonical scout skills for this team and creates any missing configs, then returns the fleet.
+
+   **Soft-degrade if the tool is missing or fails**: fall back to `signals-scout-config-list`. If that returns rows, tune those. If it returns nothing, the fleet hasn't been materialized yet — record a follow-up ("the scout fleet materializes automatically within ~30 minutes; tune it later in PostHog or re-run this setup") and continue to step 7. **Not an abort.**


Hmm, interesting, does it really happen that you have to wait 30 minutes?

You kinda do. Quoting from other PR:

Scout runs immediately, the sources should fire right away too, but it'll take 30 minutes or so for "get signal -> emit signal -> build repo cache -> pick the repo -> research the signal -> return the report". We can say 20 if you want.
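
The soft-degrade path in the quoted step can be sketched as a small decision function. This is a simplification under stated assumptions: `ScoutConfig`, `FleetPlan`, and `planFleetTuning` are hypothetical names, `undefined` stands in for "tool missing or call failed", and the real tool calls are asynchronous MCP calls, not synchronous values.

```typescript
interface ScoutConfig {
  id: string;
  enabled: boolean;
}

// `undefined` models "tool missing or call failed".
type ToolResult = ScoutConfig[] | undefined;

type FleetPlan =
  | { action: "tune"; rows: ScoutConfig[]; via: "sync" | "list" }
  | { action: "follow-up"; note: string };

function planFleetTuning(
  syncResult: ToolResult,
  listFallback: () => ToolResult,
): FleetPlan {
  // Normal path: the sync call returned the materialized fleet.
  if (syncResult !== undefined) {
    return { action: "tune", rows: syncResult, via: "sync" };
  }
  // Soft-degrade: fall back to the list call and tune whatever exists.
  const listed = listFallback();
  if (listed !== undefined && listed.length > 0) {
    return { action: "tune", rows: listed, via: "list" };
  }
  // Nothing materialized yet: record a follow-up and continue. Never abort.
  return {
    action: "follow-up",
    note: "the scout fleet materializes automatically within ~30 minutes; tune it later in PostHog or re-run this setup",
  };
}
```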

Twixes · 2026-06-22T18:52:25Z

+- `llm_analytics` (internal-only, not a user-facing responder)
+- `logs` (not a v1 responder)
+- Anything with `source_type` `evaluation` or `alert_state_change`
+- The connected-tool sources (`github`, `linear`, `zendesk`, `pganalyze`) — those are step 5, ask-first.


Thinking we should def also ask them about non-signal sources, but ones we can connect via MCP, and are context-rich. E.g. Notion is a prime suspect here

Yup, this surely needs to be in one of the next iterations.
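
The quoted bullets read as a list of sources that the enable step leaves alone. A rough sketch of that filter follows, assuming a row shape where only `source_type` is named in the skill text; the `product` field, the `Handling` labels, and `classifySource` are hypothetical.

```typescript
// Hypothetical row shape; only `source_type` appears in the skill text.
interface SourceRow {
  product: string;
  source_type: string;
}

type Handling = "skip" | "ask-first" | "eligible";

const SKIPPED_PRODUCTS = ["llm_analytics", "logs"];
const SKIPPED_TYPES = ["evaluation", "alert_state_change"];
const ASK_FIRST_PRODUCTS = ["github", "linear", "zendesk", "pganalyze"];

function classifySource(row: SourceRow): Handling {
  // Internal-only or non-v1 responders are never auto-enabled.
  if (SKIPPED_PRODUCTS.includes(row.product)) return "skip";
  if (SKIPPED_TYPES.includes(row.source_type)) return "skip";
  // Connected tools belong to step 5, where the user is asked first.
  if (ASK_FIRST_PRODUCTS.includes(row.product)) return "ask-first";
  return "eligible";
}
```

A context-rich, non-signal source such as Notion would need a fourth category, which this sketch does not attempt.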

sortafreel and others added 17 commits June 10, 2026 16:23

chore: adjust sources monitoring

b08cebd

feat: allow skills to pick github

4fea255

feat: Skills to create custom scouts.

634c620

fix: Add nudge if the user said they'll connect zendesk/pganalyze, but did not.

2550ee9

fix: Add retries/cache for context mill.

3dcb03e

fix: Adjust logic for "picked, but not connected"

3e10a6e

fix: gate conditional scouts on positive evidence

901140c

Merge branch 'main' into feat/product-autonomy-setup-skill

a9f2d00

fix: Simplify step 3.

e6b98f7

fix: Skills for scouts/github.

d10f51a

feat: Simplify sources connection

dae4dd0

chore: Rename to self-driving.

3d36929

chore: Rename to self-driving.

8af2177

chore: Rename to self-driving.

efb250e

fix: Ensure "None" goes first.

0924d1e

sortafreel changed the title ~~(DONT-REVIEW-YET) Feat/product autonomy setup skill~~ (DONT-REVIEW-YET) feat(self-driving): Self-driving setup skills for Wizard Jun 19, 2026

fix: Remove excessive third step.

4e6b24f