feat(self-driving): Self-driving setup skills for Wizard#192

Merged
sortafreel merged 25 commits into main from feat/product-autonomy-setup-skill on Jun 22, 2026

@sortafreel sortafreel commented Jun 19, 2026

Contributor

What changed

  • A new self-driving-setup skill
  • The skill generator now caches fetched docs on disk and retries with backoff. Previously, the ~50 parallel doc fetches during a build could overload posthog.com, and a single dropped connection would take down the build or dev server.
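
A minimal sketch of the cache-plus-retry pattern described above, not the actual `scripts/lib/skill-generator.js` code. The `.docs-cache` directory name comes from the self-review further down; every function name, the TTL, the attempt count, the delays, and the per-attempt timeout are assumptions made for illustration.

```javascript
const fs = require("node:fs");
const path = require("node:path");
const crypto = require("node:crypto");

// All values below are illustrative assumptions, not the generator's real settings.
const DEFAULTS = { cacheDir: ".docs-cache", ttlMs: 60 * 60 * 1000, attempts: 4, baseDelayMs: 500, timeoutMs: 10_000 };

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... for attempt 0, 1, 2, ...
function backoffDelay(attempt, baseMs = DEFAULTS.baseDelayMs) {
  return baseMs * 2 ** attempt;
}

// One cache file per URL, named by hash so any URL maps to a safe filename.
function cachePathFor(url, cacheDir) {
  const key = crypto.createHash("sha256").update(url).digest("hex");
  return path.join(cacheDir, `${key}.md`);
}

// Returns the cached text if the file exists and is at most maxAgeMs old, else null.
function readCache(file, maxAgeMs) {
  if (!fs.existsSync(file)) return null;
  const ageMs = Date.now() - fs.statSync(file).mtimeMs;
  return ageMs <= maxAgeMs ? fs.readFileSync(file, "utf8") : null;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchDoc(url, options = {}) {
  const { cacheDir, ttlMs, attempts, baseDelayMs, timeoutMs } = { ...DEFAULTS, ...options };
  const fetchImpl = options.fetchImpl ?? fetch; // injectable for tests
  const wait = options.wait ?? sleep;
  const file = cachePathFor(url, cacheDir);

  const fresh = readCache(file, ttlMs);
  if (fresh !== null) return fresh; // fresh cache hit: no network traffic

  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      // Per-attempt timeout so one hung connection cannot stall the build.
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
      const body = await res.text();
      fs.mkdirSync(cacheDir, { recursive: true });
      fs.writeFileSync(file, body);
      return body;
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) await wait(backoffDelay(attempt, baseDelayMs));
    }
  }

  const stale = readCache(file, Infinity); // any age is acceptable as a last resort
  if (stale !== null) {
    console.warn(`Using stale cached copy of ${url}: ${lastError.message}`);
    return stale;
  }
  throw lastError;
}
```

With an empty cache (as in clean CI) a dead URL still fails after the last attempt; a stale copy is only served when one already exists on disk.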

sortafreel and others added 17 commits June 10, 2026 16:23
Workflow skill for the wizard's new product-autonomy program
(wizard autonomy). Eight chained references: probe Signals API access,
read the setup report + project profile, confirm org AI-data-processing
approval, require the GitHub integration, enable the matching signal
sources, ask-then-connect issue trackers, sync + tune the scout fleet,
and write posthog-product-autonomy-report.md. Abort strings match the
wizard's PRODUCT_AUTONOMY_ABORT_CASES verbatim.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pnpm 10 refuses to prepare the git-hosted warlock dependency unless it
is in onlyBuiltDependencies; the package.json pnpm section overrides
pnpm-workspace.yaml, so the allowlist there never applied.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions


🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci error-tracking-upload-source-maps
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci error-tracking-upload-source-maps/android
  • /wizard-ci error-tracking-upload-source-maps/cicd-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-nested-docker-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-github-actions-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-gitlab-node-raw
  • /wizard-ci error-tracking-upload-source-maps/cicd-ssh-vps-node-raw
  • /wizard-ci error-tracking-upload-source-maps/flutter
  • /wizard-ci error-tracking-upload-source-maps/ios
  • /wizard-ci error-tracking-upload-source-maps/next
  • /wizard-ci error-tracking-upload-source-maps/next-no-posthog
  • /wizard-ci error-tracking-upload-source-maps/node-raw
  • /wizard-ci error-tracking-upload-source-maps/node-rollup
  • /wizard-ci error-tracking-upload-source-maps/node-rollup-typescript-plugin
  • /wizard-ci error-tracking-upload-source-maps/node-webpack
  • /wizard-ci error-tracking-upload-source-maps/nuxt-3-6
  • /wizard-ci error-tracking-upload-source-maps/nuxt-4-3
  • /wizard-ci error-tracking-upload-source-maps/react-native
  • /wizard-ci error-tracking-upload-source-maps/react-vite
  • /wizard-ci error-tracking-upload-source-maps/rust
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

@sortafreel sortafreel changed the title from "(DONT-REVIEW-YET) Feat/product autonomy setup skill" to "(DONT-REVIEW-YET) feat(self-driving): Self-driving setup skills for Wizard" Jun 19, 2026
@sortafreel

Contributor Author

🤖 Validated self-review — PR #192

I ran an automated PR review over this branch, then re-validated every finding against the current code and cross-checked each contract claim against the local posthog and wizard repos. The inline comments below are the kept findings; each has a threaded reply labeled 🤖 Agent response (the reply text is agent-drafted — I did not hand-write it).

Method: one validator + one adversarial skeptic per finding (the skeptic tries to refute each keep, correcting the raw review's "everything is valid" bias). Bar: keep if it could plausibly happen in a real wizard run; drop if theoretical, already-handled, or wrong about the code.

⚠️ Two things invalidated many raw findings: (1) the review was stale — the old 3-ai-approval.md step was deleted and later files renumbered; (2) it judged the skill markdown in isolation, missing harness controls (BASE_ALLOWED_TOOLS, the AI-opt-in gate, fail-closed YARA scanning).

| Disposition | Count |
| --- | --- |
| Kept (real, production-plausible) | 14 |
| Dropped — adversarial skeptic refuted | 12 |
| Dropped at validation (fixed / wrong / mitigated) | 10 |
| Total | 36 |
Notably, none of the 7 findings the raw review rated must_fix survived as must_fix — 4 dropped, 3 downgraded. Kept set: 7 should_fix + 7 nice_to_have.

Kept findings (see inline comments)

| id | severity | finding |
| --- | --- | --- |
| 1-3-1 | should_fix | GitHub gate accepts integrations that may not cover this repository |
| 1-4-1 | should_fix | Existing source detection can mark unusable warehouse sources as connected |
| 1-5-1 | should_fix | Custom scout cadence is documented incorrectly |
| 2-4-3 | should_fix | Linear source creation can sync the wrong workspace when multiple integrations exist |
| 2-5-1 | should_fix | Scout tuning can disable user-managed custom scouts |
| 3-1-1 | should_fix | Doc fetch retries have no per-attempt timeout |
| 3-2-1 | should_fix | Source/scout write failures can still produce a completed setup |
| 1-3-2 | nice_to_have | Unknown scout surfaces handed off with contradictory behavior |
| 1-4-3 | nice_to_have | Connected-tools multi-select does not define the decline branch |
| 1-5-3 | nice_to_have | Final report naming rule contradicts itself |
| 2-5-3 | nice_to_have | Generated scout bodies lack a required prompt-injection safety guard |
| 3-3-2 | nice_to_have | Project profile failures are silently swallowed |
| 3-4-3 | nice_to_have | Single post-UI verification can misclassify a successful GitHub source connection |
| 3-5-2 | nice_to_have | Failed scout disables can be reported as successful disables |
22 dropped findings + why

Factually wrong against the code/contract

| id | why dropped |
| --- | --- |
| 1-2-1 | No bypass — the AI-opt-in gate is injected at the program layer for wizard skill <id> too; it parks the run until org approval lands. |
| 2-5-2 | signals-scout-config-sync already auto-registers configs via register_missing_configs ("author a skill, get a scout"); 6b:64 already re-runs sync. |
| 2-4-1 | Recipe step is gated on "Row exists" — the agent already holds the row id from the step-1 list; a missing id errors loudly, not a silent no-op. |
| 2-4-2 | Hard-coded payload is correct against ExternalDataSourceCreatePayload (schemas under payload, access_method defaults to warehouse). |
| 1-4-2 | Bash is in BASE_ALLOWED_TOOLS — always live; ToolSearch defers only the large MCP schemas, not core tools. |
| 2-5-4 | Write is in BASE_ALLOWED_TOOLS — always live; the read is a benign idempotency hint, no broader than the step-2 read. |
| 3-5-3 | Premise false — the general scout shares the same discipline, and 6b:62 independently mandates the full quality bar regardless of template. |

Already mitigated by the wizard harness

| id | why dropped |
| --- | --- |
| 2-3-1 | Mitigated by fail-closed YARA scanning of every Read/Grep (prompt-injection + secrets categories) before any write or authorize link. |
| 2-3-2 | Consumer is the already-OAuth-scoped agent; probe rows collapse to a boolean and are never instructed into the report. |
| 3-3-1 | Single hosted MCP versioned in lockstep with the release; no "older deployment that predates the tool." |

Under-specified, but the flow already absorbs it

| id | why dropped |
| --- | --- |
| 1-3-3 | 3-github.md:55 ("never continue without GitHub") + re-ask (never a continue branch) close the path; "exit" maps to the specified [ABORT]. |
| 3-3-3 | A transient list error collapses into the same wait/re-verify/re-ask path as the eventual-consistency lag; [ABORT] only on the explicit "cant" choice. |
| 3-2-2 | evaluateAskCap returns a soft "proceed with defaults / [ABORT] requirements-incomplete"; the one artifact-writing ask (6b) is propose-first. |
| 2-3-3 | The bare blocks are schematics of one question (the array element); a competent agent builds the {questions:[…]} wrapper from the self-describing schema. |
| 3-4-2 | Every consumer absorbs a failed list — connectors re-detect, Zendesk/pganalyze are unconditional, and the report defaults to the safe "dormant" direction. |
| 3-5-1 | The partial-rows window is sub-second and self-heals on the coordinator's ~30-min tick (missing configs default to enabled). |

Build-pipeline tooling, out of the agent-run scope

| id | why dropped |
| --- | --- |
| 1-1-1 | Build-only — clean CI has no .docs-cache, so a dead URL fails fast; masking only in local dev, and the result is valid-old markdown + a warning. |
| 1-1-2 | Build-only internal knob; needs a deliberate set + a unit typo; worst case a slower but still-correct build. |
| 3-1-2 | Build-only — CI cache starts empty, so the first dead URL hard-fails; the multiplication only manifests locally during a live outage. |
| 3-1-3 | Build-only fetch of a cached docs site; the only consequence is a re-runnable build. |
| 2-1-1 | @posthog/warlock@0.2.2 ships prebuilt with no install/postinstall script; .modules.yaml shows nothing ignored — the scanner runs. |
| 2-1-2 | engines is advisory; no .npmrc so engine-strict is off, and the scanner runs only in CI on Node 22.22+. |

Comment thread context/skills/self-driving/references/3-github.md
Comment thread context/skills/self-driving/references/5-connected-tools.md
Comment thread context/skills/self-driving/references/6b-tailor-scouts.md Outdated
Comment thread context/skills/self-driving/references/5b-linear.md
Comment thread context/skills/self-driving/references/6-scouts.md
Comment thread scripts/lib/skill-generator.js
Comment thread context/skills/self-driving/description.md
Comment thread context/skills/self-driving/references/2-read-context.md Outdated
Comment thread context/skills/self-driving/references/5-connected-tools.md
Comment thread context/skills/self-driving/references/7-report.md Outdated
Comment thread context/skills/self-driving/references/6b-tailor-scouts.md Outdated
Comment thread context/skills/self-driving/references/2-read-context.md Outdated
Comment thread context/skills/self-driving/references/5a-github.md Outdated
Comment thread context/skills/self-driving/references/7-report.md
@sortafreel sortafreel changed the title from "(DONT-REVIEW-YET) feat(self-driving): Self-driving setup skills for Wizard" to "feat(self-driving): Self-driving setup skills for Wizard" Jun 19, 2026

@gewenyu99 gewenyu99 left a comment

Collaborator

Lgtm.

Comment thread scripts/lib/skill-generator.js
Comment thread context/skills/self-driving/config.yaml
Comment thread context/skills/self-driving/references/7-report.md
Comment thread context/skills/self-driving/references/5a-github.md
@sortafreel sortafreel requested review from a team, Twixes and joshsny June 19, 2026 23:01
@sortafreel sortafreel merged commit 4aea45c into main Jun 22, 2026
13 checks passed
```
{
id: "github-connect",
prompt: "Self-driving needs GitHub access to investigate findings in your code and open fixes — setup can't finish without it.\n\nOpen this link to install the PostHog GitHub App in one click, then approve access. Grant it the repos you want Self-driving to work with — include this project's repo so step 5 can also watch its issues:\n\n<github authorize URL>\n\nThen come back here.\n\n(Need to re-link an existing installation instead? Use your integrations settings: <integrations settings URL>.)",
```

@Twixes Twixes Jun 22, 2026

Member

A couple of things about this:

  1. "Then come back here" - I did all of this, came back, and honestly didn't realize for a minute that I needed to click "Done" manually 😅 I'd expect this to be just like the desktop app - auto-detecting a callback, or polling.
    Relatedly, as a user, when I get a wall of text like this, I have an insatiable urge to press "Enter". Basically, at first sight, this feels like an EULA. We should make this feel more like a loading state that waits until you've connected GH.
  2. The <integrations settings URL> link wraps in a way that makes it useless - unable to click and unable to copy (<github authorize URL> is fine as it's on its own line)

Member

Oh, I see you tackled problem 2 (link wrapping) in the Wizard PR – but this didn't seem to work for me in Cursor 🤔

Contributor Author

Removed the re-link, noted the polling, will iterate.

```
{
id: "connected-tools",
prompt: "Self-driving can also watch your other tools and pull their issues into the inbox. Which of these do you use?",
```

@Twixes Twixes Jun 22, 2026

Member

Nit: I feel "<...> and pull their issues into the inbox" undersells it a bit – something like this would be more active framing: "Self-driving can also address problems surfaced in your other tools."
But I don't feel strongly here.

I do definitely believe an option like "I don't see a tool in this list" would provide really interesting data. Probs needs a wizard change to support that.

Contributor Author

Good point. updated the copy.

As for custom options - yup, noted.

@Twixes Twixes left a comment

Member

A few comments for your consideration

Comment on lines +23 to +24
1. Call `inbox-source-configs-list`.
2. **Success — including an empty list** — means the API is reachable: proceed. (The probe can't prove beta enrollment — the wizard's detect step and the beta flags own that — but it's the strongest signal available to you.) Keep the returned rows: step 2 and step 4 use them as the already-enabled baseline. Mark your access task completed and continue.

Member

Hmm, but the endpoint powering inbox-source-configs-list isn't flagged. It's GET /api/projects/{team_id}/signals/source_configs/ and I don't think there's any reason why it wouldn't be reachable.

No reason, other than the scope not being available, but that would not mean "self-driving is not available for this project"; it would mean that something is wrong with authentication.

What's this whole step for then?

Contributor Author

Initially it was about feature flags (FFs), but we decided to drop all FFs. I still find it valuable, as it checks that MCP works, source configs are pulled, and so on, before we do any edits. Plus, I 100% can imagine us adding new stuff that we'd want to feature-flag in the future, so I'll keep it.
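
The probe semantics discussed in this thread can be sketched roughly as follows. Only the tool name `inbox-source-configs-list` comes from the skill text; `callTool` is a hypothetical stand-in for however the agent invokes an MCP tool, and the shape of the returned object is an assumption.

```javascript
// Hypothetical helper: `callTool(name)` resolves to the tool's rows or rejects on failure.
async function probeAccess(callTool) {
  try {
    const rows = await callTool("inbox-source-configs-list");
    // Success, even with zero rows, only shows that the API is reachable.
    // The rows are kept as the already-enabled baseline for later steps.
    return { reachable: true, baseline: Array.isArray(rows) ? rows : [] };
  } catch (err) {
    // Per the review above, a failure here suggests an auth, scope, or MCP
    // problem rather than "self-driving is not available for this project".
    return { reachable: false, baseline: [], reason: String(err?.message ?? err) };
  }
}
```

Returning a result object instead of throwing keeps the "smoke test before any edits" role explicit: the caller decides what to do with an unreachable API.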

Comment on lines +22 to +23

1. **Read `./posthog-setup-report.md`.** It is ground truth for what the base integration instrumented **in this repo**: events, error tracking, feature flags. Do not re-derive what it already states. It is NOT authority over project-level facts — session replay in particular may be instrumented in another repo or via the snippet, so the report can rule replay in but never out (step 4 probes the server for that).

Member

Worth clarifying that this won't be present if the codebase was set up with PostHog a while ago. (Or never)

Contributor Author

Yup, updated.


2. **Call `signals-scout-project-profile-get`.** It returns products in use, connected integrations, warehouse sources, and the signal source configs split enabled/disabled — one call instead of four. **Tolerate failure**: it can 404 or error on a team without a profile yet. If it fails, fall back to the step-1 source list and the report; do not retry more than once and do not abort. **Note "profile unavailable" in your checklist** — a profile 404 is expected on a first-run team, so any later decision that relies only on the profile must record "unknown", not a confident negative.

3. **Server-side product usage.** The run prompt's "Project state" block is authoritative for the opt-ins it lists (session replay recording, exception autocapture, surveys): **opt-in ON = product enabled**, even if no data has arrived yet. Where the block says OFF/unknown and the repo gave no signal, spend ONE cheap probe each for usage evidence (tolerate 403/404 → record "unknown"):

Member

Maybe I'm being a bad LLM here:
What is the "run prompt" referring to here?

Contributor Author

Each wizard run has a prompt (prompt.ts) that decides the order of steps and which skill to follow.
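
The failure-tolerance rule in step 2 of the quoted skill text (one retry at most, never abort, record "unknown" rather than a confident negative) could look something like the sketch below. `callTool` is again a hypothetical tool-invocation helper, and the `products` array on the profile is an assumed shape, not the documented response of `signals-scout-project-profile-get`.

```javascript
// Hypothetical helper: `callTool(name)` resolves to the tool result or rejects.
async function loadProfile(callTool) {
  const MAX_TRIES = 2; // the first call plus at most one retry
  for (let i = 0; i < MAX_TRIES; i++) {
    try {
      const profile = await callTool("signals-scout-project-profile-get");
      return { profileAvailable: true, profile };
    } catch {
      // A 404 is expected on a first-run team: swallow it and never abort.
    }
  }
  return { profileAvailable: false, profile: null };
}

// Without a profile, a product's status is "unknown", never a confident "no".
function productInUse(state, product) {
  if (!state.profileAvailable) return "unknown";
  const products = state.profile.products ?? []; // assumed field name
  return products.includes(product) ? "yes" : "no";
}
```

Keeping `profileAvailable` on the state object is one way to carry the "profile unavailable" checklist note into every later decision.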

Comment on lines +32 to +39
4. **Light scan for what the report, profile, and server state won't cover.** Targeted lookups only — package manifests, config files, a grep or two. You are answering these questions:
- **Revenue**: is there a payment SDK (Stripe, Paddle, LemonSqueezy, RevenueCat…) or revenue events?
- **Surveys**: does the code or profile show PostHog surveys in use?
- **AI/LLM**: are there `$ai_*` events, an LLM SDK, or LLM analytics in the profile?
- **Logs**: is the PostHog logs product in use (per the profile)?
- **CSP**: is a Content-Security-Policy with PostHog CSP reporting configured?
- **Support**: does the team use PostHog support/conversations (per the profile)?
- **Issue trackers**: any hints of Linear, Zendesk, or pganalyze (you will still ask in step 5 — hints only shape the question, they never authorize enabling).

Member

I think error tracking, replay, and analytics are all worth including in the code scan as well, since you might, e.g., not yet have errors in PostHog but still have error tracking hooked up.

Contributor Author

Fair enough, not sure what "analytics" means (general scout should cover regular analytics), but added bias toward error tracking/replay.
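
As an illustration of the "targeted lookups only" idea in the quoted step 4, here is a small sketch that answers the Revenue question from a `package.json` alone. The package-to-vendor map is an illustrative guess at common npm SDK names, not a vetted list from the skill, and `detectPaymentSdks` is a hypothetical name.

```javascript
// Illustrative mapping from npm package name to payment vendor (an assumption).
const PAYMENT_SDKS = {
  stripe: "Stripe",
  "@stripe/stripe-js": "Stripe",
  "@paddle/paddle-js": "Paddle",
  "@lemonsqueezy/lemonsqueezy.js": "LemonSqueezy",
  "react-native-purchases": "RevenueCat",
};

// Takes the raw text of a package.json and returns the sorted vendor names found.
function detectPaymentSdks(packageJsonText) {
  let manifest;
  try {
    manifest = JSON.parse(packageJsonText);
  } catch {
    return []; // unreadable manifest: no signal, not an error
  }
  if (!manifest || typeof manifest !== "object") return [];
  const deps = { ...manifest.dependencies, ...manifest.devDependencies };
  const vendors = Object.keys(deps)
    .map((name) => PAYMENT_SDKS[name])
    .filter(Boolean);
  return [...new Set(vendors)].sort();
}
```

An empty result only means the manifest gave no hint; revenue events or a non-npm stack would need a separate grep.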


Load via `ToolSearch select:mcp__posthog-wizard__integrations-github-repos-retrieve,mcp__posthog-wizard__external-data-sources-create`.

If `integrations-github-repos-retrieve` or `external-data-sources-create` isn't available (older server), skip the auto-create and record GitHub Issues as a dormant source (the dormant fallback below). **Not an abort.**

Member

I guess we won't need to worry about an older server here?

Contributor Author

yup, removed

next_step: 6b-tailor-scouts.md
---

# Step 6 — Configure the scout fleet

Member

This is now a scout troop in the UI! (that's also what we'll go with marketing-wise, like hedgehog boyscouts troop, literally)

Contributor Author

Good point, renamed.


1. **Materialize**: call `signals-scout-config-sync`. It is idempotent — it seeds the canonical scout skills for this team and creates any missing configs, then returns the fleet.

**Soft-degrade if the tool is missing or fails**: fall back to `signals-scout-config-list`. If that returns rows, tune those. If it returns nothing, the fleet hasn't been materialized yet — record a follow-up ("the scout fleet materializes automatically within ~30 minutes; tune it later in PostHog or re-run this setup") and continue to step 7. **Not an abort.**

Member

Hmm, interesting, does it really happen that you have to wait 30 minutes?

Contributor Author

You kinda do. Quoting from other PR:

Scout runs immediately, the sources should fire right away too, but it'll take 30 minutes or so for "get signal -> emit signal -> build repo cache -> pick the repo -> research the signal -> return the report". We can say 20 if you want.
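
The soft-degrade path in the quoted step 6 (sync first, fall back to the list, otherwise record a follow-up and move on) can be sketched as below. The two tool names come from the skill text; `callTool`, `materializeScouts`, and the returned shape are hypothetical, and treating a failed list call the same as an empty list is an assumption the skill text does not spell out.

```javascript
const FOLLOW_UP =
  "The scout fleet materializes automatically within ~30 minutes; " +
  "tune it later in PostHog or re-run this setup.";

// Hypothetical helper: `callTool(name)` resolves to an array of scout configs or rejects.
async function materializeScouts(callTool) {
  try {
    // Idempotent sync: seeds canonical scouts and creates any missing configs.
    return { scouts: await callTool("signals-scout-config-sync"), followUps: [] };
  } catch {
    // Tool missing or failing: fall through to the list-based fallback.
  }
  let rows = [];
  try {
    rows = await callTool("signals-scout-config-list");
  } catch {
    rows = []; // assumption: a failed list is treated like an empty one
  }
  if (Array.isArray(rows) && rows.length > 0) return { scouts: rows, followUps: [] };
  // Nothing to tune yet: record the follow-up and continue. Not an abort.
  return { scouts: [], followUps: [FOLLOW_UP] };
}
```

Every branch returns normally, which mirrors the "Not an abort" rule: the caller always proceeds to the report step.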

- `llm_analytics` (internal-only, not a user-facing responder)
- `logs` (not a v1 responder)
- Anything with `source_type` `evaluation` or `alert_state_change`
- The connected-tool sources (`github`, `linear`, `zendesk`, `pganalyze`) — those are step 5, ask-first.

Member

Thinking we should def also ask them about non-signal sources, but ones we can connect via MCP, and are context-rich. E.g. Notion is a prime suspect here

Contributor Author

Yup, this surely needs to be in one of the next iterations.
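
Assuming the quoted list enumerates sources that the automatic-enable step should skip (the quote starts mid-list, so that reading is an inference), the filter might be sketched like this. `source_type` appears in the quoted text; `source_product` and the function name are hypothetical.

```javascript
// Products left out of automatic enabling, per the quoted list.
const SKIPPED_PRODUCTS = new Set([
  "llm_analytics", // internal-only
  "logs", // not a v1 responder
  "github", "linear", "zendesk", "pganalyze", // connected tools: ask-first in step 5
]);
const SKIPPED_TYPES = new Set(["evaluation", "alert_state_change"]);

// Returns the configs that remain candidates for automatic enabling.
function autoEnableCandidates(configs) {
  return configs.filter(
    (c) => !SKIPPED_PRODUCTS.has(c.source_product) && !SKIPPED_TYPES.has(c.source_type),
  );
}
```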
