Merged
context/skills/self-driving/description.md (1 addition, 0 deletions)
@@ -19,6 +19,7 @@ Each step file points to the next. Run them in order. **Start by reading `refere
- **Never disable a source the user already enabled.** You only switch things on (and tune scouts off); existing enabled rows are someone's deliberate choice.
- **Never enable a connected-tool source the user hasn't confirmed they use.** GitHub Issues, Linear, Zendesk, and pganalyze are ask-then-connect, never blind.
- **Stay off the internal surfaces.** Don't call `signals-scout-emit-signal` or any scratchpad-write tool, and don't change a scout's `emit` flag or `run_interval_minutes` — on configs, this skill only flips `enabled`. **Canonical scout bodies are never edited.** New scout skills are created in exactly one place: step 6b, and only ones the user approved there.
+- **Keep the scout fleet small.** Every enabled scout is a recurring LLM spend. Step 6 enables only `signals-scout-general` plus the **one or two** specialists for the products this project uses most — never error tracking or session replay (those reach the inbox as native sources) — and step 6b adds **at most two** custom scouts. Everything else stays disabled.
- **Batch your questions.** `wizard_ask` has a small per-run budget; one multi-select beats four yes/nos. Don't skip a step or drop a connector (e.g. Linear) or custom scouts setup to save calls.
- **Decline goes first.** Every `wizard_ask` that offers choices must include a plain-language decline option (skip / none / "keep what's there"), and it must be the **first** option so it is the default highlight — an accidental `enter` then declines instead of committing the user to something. The **one exception is step 3's GitHub gate**: the run cannot proceed without GitHub, so there the affirmative ("Done — I've installed it") stays first and the decline ("I can't connect right now", which aborts) stays last.

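The ordering rule in the "Decline goes first" bullet can be sketched as a tiny helper. This is a minimal illustration, assuming choices are plain label strings; `order_options` and its parameters are hypothetical and are not the signature of the real `wizard_ask` tool.

```python
def order_options(affirmatives: list[str], decline: str, github_gate: bool = False) -> list[str]:
    """Order wizard_ask choices per the "decline goes first" rule.

    Hypothetical helper for illustration; it is not part of the wizard's API.
    """
    if not decline.strip():
        # Every choice prompt must offer a plain-language way out.
        raise ValueError("a decline option is required")
    if github_gate:
        # Step 3's GitHub gate is the one exception: the run cannot proceed
        # without GitHub, so the affirmative leads and the aborting decline is last.
        return [*affirmatives, decline]
    # Everywhere else the decline is first, so it is the default highlight
    # and an accidental Enter declines instead of committing the user.
    return [decline, *affirmatives]


print(order_options(["Linear", "Zendesk", "pganalyze"], "None of these"))
# ['None of these', 'Linear', 'Zendesk', 'pganalyze']
```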
context/skills/self-driving/references/2-read-context.md (3 additions, 3 deletions)
@@ -22,7 +22,7 @@ Load via `ToolSearch select:Read,Glob,Grep,mcp__posthog-wizard__signals-scout-pr

1. **Read `./posthog-setup-report.md`.** It is ground truth for what the base integration instrumented **in this repo**: events, error tracking, feature flags. Do not re-derive what it already states. It is NOT authority over project-level facts — session replay in particular may be instrumented in another repo or via the snippet, so the report can rule replay in but never out (step 4 probes the server for that).

-2. **Call `signals-scout-project-profile-get`.** It returns products in use, connected integrations, warehouse sources, and the signal source configs split enabled/disabled — one call instead of four. **Tolerate failure**: it can 404 or error on a team without a profile yet. If it fails, fall back to the step-1 source list and the report; do not retry more than once and do not abort. **Note "profile unavailable" in your checklist** — a profile 404 is expected on a first-run team, so any later decision that relies only on the profile must record "unknown", not a confident negative.
+2. **Call `signals-scout-project-profile-get`.** It returns products in use, connected integrations, warehouse sources, and the signal source configs split enabled/disabled — one call instead of four. It also carries **relative usage magnitude**: `top_events` (per-event count + distinct users), `recent_activity` (edits per scope), and per-entity active counts (feature flags, experiments, surveys, dashboards). Capture a rough sense of **which products this project uses most** — step 6 enables a scout only for the one or two most-used products, so a usage ranking matters, not just a binary in/out. **Tolerate failure**: it can 404 or error on a team without a profile yet. If it fails, fall back to the step-1 source list and the report; do not retry more than once and do not abort. **Note "profile unavailable" in your checklist** — a profile 404 is expected on a first-run team, so any later decision that relies only on the profile must record "unknown", not a confident negative.

3. **Server-side product usage.** The run prompt's "Project state" block is authoritative for the opt-ins it lists (session replay recording, exception autocapture, surveys): **opt-in ON = product enabled**, even if no data has arrived yet. Where the block says OFF/unknown and the repo gave no signal, spend ONE cheap probe each for usage evidence (tolerate 403/404 → record "unknown"):
- `query-session-recordings-list` — any recording → replay in use
@@ -38,6 +38,6 @@ Load via `ToolSearch select:Read,Glob,Grep,mcp__posthog-wizard__signals-scout-pr
- **Support**: does the team use PostHog support/conversations (per the profile)?
- **Issue trackers**: any hints of Linear, Zendesk, or pganalyze (you will still ask in step 5 — hints only shape the question, they never authorize enabling).

-Do NOT crawl the whole source tree. If a question can't be answered cheaply, record "unknown" and move on — unknowns default to asking the user about sources; for surface-specific scouts, an unconfirmed surface is not justification to keep them on (step 6 disables them without evidence).
+Do NOT crawl the whole source tree. If a question can't be answered cheaply, record "unknown" and move on — unknowns default to asking the user about sources; for scouts, an unconfirmed surface won't rank among the most-used products, so step 6 won't enable its scout.

-5. **Write down your working checklist** (in your own notes, not a file): candidate native sources, candidate connected tools, candidate scout disables, GitHub status if the profile revealed it. Steps 4–6 consume this.
+5. **Write down your working checklist** (in your own notes, not a file): candidate native sources, candidate connected tools, which products this project uses most (drives step 6's pick: `general` + the 1–2 most-used specialists), GitHub status if the profile revealed it. Steps 4–6 consume this.
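Items 2 and 3 of the revised `2-read-context.md` amount to a tolerant-lookup protocol: one retry at most for the profile, one cheap probe per product, and "unknown" rather than a negative when a call fails. The sketch below assumes each tool call is wrapped in a Python callable that raises on failure; `ToolError`, `fetch_profile`, `resolve_usage`, and the three result labels are hypothetical, not names from the wizard or the PostHog MCP server.

```python
from typing import Callable, Optional


class ToolError(Exception):
    """Hypothetical stand-in for a failed tool call (e.g. HTTP 403 or 404)."""

    def __init__(self, status: int):
        super().__init__(f"tool call failed with status {status}")
        self.status = status


def fetch_profile(call: Callable[[], dict], checklist: list) -> Optional[dict]:
    """Fetch the project profile: at most one retry, and never abort the run."""
    for _ in range(2):  # the first attempt plus a single retry
        try:
            return call()
        except ToolError:
            continue
    # A 404 is expected on a first-run team; later decisions that rely only
    # on the profile must then say "unknown", not a confident negative.
    checklist.append("profile unavailable")
    return None


def resolve_usage(opt_in_on: bool, repo_signal: bool, probe: Callable[[], list]) -> str:
    """Classify one product as "in use", "no evidence", or "unknown"."""
    if opt_in_on or repo_signal:
        # An opt-in that is ON counts as enabled even before any data arrives.
        return "in use"
    try:
        rows = probe()  # exactly ONE cheap probe, e.g. a session-recordings list
    except ToolError:
        return "unknown"  # tolerate 403/404 (this sketch tolerates any failure)
    return "in use" if rows else "no evidence"
```

`resolve_usage` never calls the probe when the opt-in or the repo has already settled the question, which keeps the budget at one cheap probe per product.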
context/skills/self-driving/references/6-scouts.md (24 additions, 23 deletions)
@@ -4,7 +4,7 @@ next_step: 6b-tailor-scouts.md

# Step 6 — Configure the scout fleet

-Scouts are the pull side of Signals: scheduled agents that scan the project on an interval and emit findings as `signals_scout` / `cross_source_issue` signals (which step 4's scout gate lets into the inbox). Materialize the fleet, then switch off the scouts whose product surface this project doesn't have.
+Scouts are the pull side of Signals: scheduled agents that scan the project on an interval and emit findings as `signals_scout` / `cross_source_issue` signals (which step 4's scout gate lets into the inbox). Every enabled scout is a recurring LLM spend — it costs a full run every tick even when it finds nothing — so the fleet is kept **deliberately small**: the `general` scout, plus the **one or two specialists** for the products this project uses most. Everything else is disabled.

## Status

@@ -24,47 +24,48 @@ Load via `ToolSearch select:mcp__posthog-wizard__signals-scout-config-sync,mcp__

**Soft-degrade if the tool is missing or fails**: fall back to `signals-scout-config-list`. If that returns rows, tune those. If it returns nothing, the fleet hasn't been materialized yet — record a follow-up ("the scout fleet materializes automatically within ~30 minutes; tune it later in PostHog or re-run this setup") and continue to step 7. **Not an abort.**

-2. **Tune — classify every scout the sync returned; don't assume a fixed list.** The fleet is seeded from posthog and grows over time (it's ~19 scouts today), so always work from the rows `signals-scout-config-sync` actually returned, not a hardcoded set. For each scout, read its name/description and ask **"does this project have the surface this scout watches?"** — that sorts it into one of two buckets:
+2. **Decide the enabled set — the whole point of this step is to enable FEW scouts, not many.** Work from the rows `signals-scout-config-sync` actually returned (the fleet grows over time — ~19 scouts today — so never hardcode a list). The enabled set has exactly three parts:

-**Always-on (cross-product).** Its surface is "any project with data," so it self-closes cheaply when there's nothing to say. Keep enabled. Examples (illustrative, not exhaustive):
+**(a) `general` — always enabled.** `signals-scout-general` watches cross-product correlations and the surfaces no specialist covers; it self-closes cheaply when there's nothing to say. Keep it on for every project.

-- `signals-scout-general` — cross-product correlations and uncovered surfaces
-- `signals-scout-anomaly-detection` — anomalies in whatever time series exist
-- `signals-scout-observability-gaps` — events with no insight coverage
-- `signals-scout-health-checks` — PostHog setup health
-- `signals-scout-inbox-validation` — whether shipped fixes actually held
+**(b) Never enable the `error-tracking` or `session-replay` scouts.** Step 4 already enables error tracking and session replay as native **sources** — their findings reach the inbox through that pipeline, so a scout on the same surface only duplicates it. Disable `signals-scout-error-tracking` and `signals-scout-session-replay` unconditionally, regardless of evidence. This is an **intentional** exclusion, not an evidence gap, so do **not** record a re-enable follow-up for them — note them as "covered by the native source".

-**Surface-specific (conditional).** Tied to a product or surface a project may not have. **Enable ONLY when step 2 found positive evidence the surface is in use** — evidence on EITHER side counts: the repo scan OR the server-side state (project-state opt-ins and usage probes). A product enabled at the project level is evidence even when this repo shows nothing. No evidence → disable. Examples of surface → evidence (illustrative, not exhaustive):
+**(c) One or two specialists — for the products this project uses MOST.** This is a judgment call, not a checklist: weigh ALL the step-2 evidence together — the profile's `top_events` (volume + distinct users), recent activity, the active counts for feature flags / experiments / surveys / dashboards, plus any repo signals — and pick the **one or two** product surfaces that are most actually used, then enable each one's scout. The candidate pool is the entire fleet **except** `general` and the two excluded in (b); it includes both the surface-specific scouts and the remaining cross-product ones:

-| Scout | Enable only with evidence of |
+| Scout | Specialist for |
|---|---|
-| `signals-scout-error-tracking` | error tracking in use — exception autocapture ON, error issues exist, or the repo instruments it (the same evidence step 4 uses for the error-tracking source) |
-| `signals-scout-session-replay` | session recording enabled (opt-in ON or recordings exist) |
-| `signals-scout-product-analytics` | funnels / retention / lifecycle insights or product events in use |
+| `signals-scout-product-analytics` | funnels / retention / lifecycle insights or heavy product-event usage |
| `signals-scout-web-analytics` | web traffic / pageviews with referrer or UTM tracking |
-| `signals-scout-feature-flags` | feature flags in use (frontend or backend) |
-| `signals-scout-surveys` | surveys opt-in ON or surveys found (step 2) |
+| `signals-scout-feature-flags` | feature flags in active use (frontend or backend) |
+| `signals-scout-surveys` | surveys in use |
| `signals-scout-revenue-analytics` | a payment SDK / revenue data |
| `signals-scout-ai-observability` | `$ai_*` events / LLM usage |
| `signals-scout-logs` | the PostHog logs product in use |
| `signals-scout-csp-violations` | CSP reporting configured |
| `signals-scout-experiments` | active A/B experiments |
-| `signals-scout-customer-analytics` | group / accounts analytics (B2B), not a pure B2C app |
+| `signals-scout-customer-analytics` | group / accounts analytics (B2B) |
| `signals-scout-data-pipelines` | CDP destinations, batch exports, or hog flows |
| `signals-scout-replay-vision` | Replay Vision scanners configured |
+| `signals-scout-anomaly-detection` | (cross-product) anomalies in whatever time series exist |
+| `signals-scout-observability-gaps` | (cross-product) events with no insight coverage |
+| `signals-scout-health-checks` | (cross-product) PostHog setup health |
+| `signals-scout-inbox-validation` | (cross-product) whether shipped fixes actually held |

-**A scout neither list names** (posthog keeps adding them): classify it by the same question — read its description and decide whether its surface is product-agnostic (→ always-on) or tied to a surface you must confirm (→ conditional, evidence required). When unsure whether a surface-specific scout's surface exists, treat that as no evidence.
+Rules for the pick:
+- **At most two.** Even if three or more surfaces look used, keep only the two most-used. Enabling more re-creates the cost problem this step exists to prevent.
+- **At least one.** Always end with a specialist enabled. If no product surface clearly stands out — e.g. the only products in use are error tracking / session replay (excluded in (b)), or the profile was unavailable and nothing is rankable — **fall back to one universal cross-product scout** (`signals-scout-anomaly-detection` or `signals-scout-health-checks`) as the stand-in. Avoid `signals-scout-inbox-validation` as the fallback on a fresh setup — there are no shipped fixes for it to validate yet.
+- **A scout the table doesn't name** (posthog keeps adding them): treat it as a specialist candidate — read its description, judge whether its surface is among this project's most-used, and enable it only if it earns one of the ≤2 slots.

-**"Unknown" is not evidence → disable the scout.** Unlike a dormant warehouse responder (gated on a sync, so it never fires for free), a scout runs on its schedule and costs a full LLM run every tick even when it finds nothing — so never pay for a surface you can't confirm exists. For every conditional scout you disable, record a re-enable follow-up so the user can switch it on if they do use that surface (e.g. "enable `signals-scout-logs` in PostHog if you use the logs product").
+3. **Disable every scout you did NOT enable** in (a)–(c) — this is now most of the fleet. Disable via `signals-scout-config-update` with the config `id` and `{ enabled: false }` — **nothing else**. Don't touch `emit` (dry-run posture) or `run_interval_minutes`; the defaults are correct. A failed update is a follow-up, not an abort.

-3. Disable via `signals-scout-config-update` with the config `id` and `{ enabled: false }` — **nothing else**. Don't touch `emit` (dry-run posture) or `run_interval_minutes`; defaults are correct for a fresh fleet. A failed update is a follow-up, not an abort.
+For each **surface-specific** scout you disabled, record a re-enable follow-up so the user can switch it on if they do use that surface later (e.g. "enable `signals-scout-logs` in PostHog if you use the logs product"). The error-tracking / session-replay disables are intentional (see (b)) — note them as "covered by the native source", not as a re-enable follow-up.

-4. **Show the result.** This step asks the user nothing, so the only in-run visibility is the status line — after tuning, emit one with the outcome (short scout names, no `signals-scout-` prefix):
+4. **Show the result.** This step asks the user nothing, so the only in-run visibility is the status line — after tuning, emit one naming the enabled set (short names, no `signals-scout-` prefix):

```
-[STATUS] Scout fleet: 12 active, disabled: ai-observability, revenue-analytics, logs, csp-violations, customer-analytics, data-pipelines, experiments, replay-vision
+[STATUS] Scout fleet: 3 active (general, product-analytics, feature-flags); 16 disabled
```

-(Adjust counts and names to the actual fleet the sync returned and the decisions you made — fleet size varies as posthog adds scouts. If nothing was disabled, say "N active, none disabled".)
+(Adjust counts and names to the actual fleet and your decisions — the enabled set is always `general` + the 1–2 specialists, so "2 active" or "3 active" is expected; error-tracking and session-replay are deliberately among the disabled.)

-Fresh configs have never run, so they're due immediately — the first scans fire on the next coordinator tick, within ~30 minutes. Record per-scout decisions (kept / disabled + why) for the report.
+Fresh configs have never run, so they're due immediately — the first scans fire on the next coordinator tick, within ~30 minutes. Record per-scout decisions (enabled / disabled + why) for the report.
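The revised enabled-set rules (items 2 through 4 of step 6) can be sketched as a pure function over the scout names the sync returned. It assumes the agent has already made the usage judgment and passes its ranking of specialists in as `most_used`; `plan_fleet`, `disable_updates`, `status_line`, and the name-to-id mapping (standing in for each config row's `id`) are hypothetical.

```python
GENERAL = "signals-scout-general"
# Covered by native sources (step 4), so never enabled as scouts.
EXCLUDED = {"signals-scout-error-tracking", "signals-scout-session-replay"}
# Universal stand-ins when no specialist stands out; inbox-validation is
# deliberately absent because a fresh setup has no shipped fixes to check.
FALLBACKS = ["signals-scout-anomaly-detection", "signals-scout-health-checks"]
CROSS_PRODUCT = {*FALLBACKS, "signals-scout-observability-gaps", "signals-scout-inbox-validation"}


def plan_fleet(fleet: list, most_used: list) -> dict:
    """Pick general + 1-2 specialists; everything else is disabled.

    `most_used` is the agent's ranking of specialist scouts, best first.
    """
    candidates = [
        s for s in dict.fromkeys(most_used)  # dedupe, keep rank order
        if s in fleet and s != GENERAL and s not in EXCLUDED
    ]
    specialists = candidates[:2]  # at most two
    if not specialists:  # at least one: fall back to a cross-product scout
        specialists = [s for s in FALLBACKS if s in fleet][:1]
    enabled = ([GENERAL] if GENERAL in fleet else []) + specialists
    disabled = [s for s in fleet if s not in enabled]
    follow_ups = []
    for s in disabled:
        if s in EXCLUDED:
            follow_ups.append(f"{s}: covered by the native source")
        elif s not in CROSS_PRODUCT:  # surface-specific: offer a re-enable path
            follow_ups.append(f"enable {s} in PostHog if you use that surface")
    return {"enabled": enabled, "disabled": disabled, "follow_ups": follow_ups}


def disable_updates(plan: dict, config_ids: dict) -> list:
    """One update per disabled scout that flips `enabled` and nothing else."""
    return [(config_ids[s], {"enabled": False}) for s in plan["disabled"]]


def status_line(plan: dict) -> str:
    short = [s.removeprefix("signals-scout-") for s in plan["enabled"]]
    return (f"[STATUS] Scout fleet: {len(short)} active ({', '.join(short)}); "
            f"{len(plan['disabled'])} disabled")


fleet = [GENERAL, "signals-scout-error-tracking", "signals-scout-session-replay",
         "signals-scout-product-analytics", "signals-scout-feature-flags",
         "signals-scout-logs", "signals-scout-anomaly-detection"]
plan = plan_fleet(fleet, ["signals-scout-error-tracking", "signals-scout-product-analytics",
                          "signals-scout-feature-flags", "signals-scout-logs"])
print(status_line(plan))
# [STATUS] Scout fleet: 3 active (general, product-analytics, feature-flags); 4 disabled
```

Keeping the ranking outside the function mirrors the text: choosing the most-used products is a judgment call, while the caps, the exclusions, and the fallback are fixed rules.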
context/skills/self-driving/references/6b-tailor-scouts.md (1 addition, 1 deletion)
@@ -28,7 +28,7 @@ Load via `ToolSearch select:mcp__posthog-wizard__llma-skill-get,mcp__posthog-wiz

2. **Do the gap analysis — this is the thinking step, take it seriously.** Lay the project evidence (the setup report's event taxonomy above all, plus the step-2 checklist: funnel structure, payment/LLM/survey surfaces, warehouse sources, integrations) against what the canonical fleet already watches. For each candidate surface ask, in order:
- **Is it watchable?** Concrete events with names you can list, a funnel with ordered steps, a domain loop with a success/failure pair. "It's a web app" is not a surface.
-- **Is it uncovered?** A canonical scout that step 6 kept enabled may already own it — error bursts belong to `signals-scout-error-tracking`, generic anomalies to `signals-scout-anomaly-detection`. A custom scout that duplicates an enabled canonical adds noise, not coverage.
+- **Is it uncovered?** Three things can already own a surface, and a custom scout that duplicates any of them adds noise, not coverage: (1) a canonical scout step 6 kept enabled — the 1–2 specialists or `signals-scout-general`, which sweeps cross-product surfaces every run (e.g. generic anomalies belong to `signals-scout-anomaly-detection` if it was picked); (2) a **native source** — error tracking and session replay are consumed as sources in step 4, so never propose a custom scout for error bursts or replay analysis even though their canonical scouts are now disabled. If the surface is only watched by a canonical scout step 6 *disabled* (not `general`, not a native source), it is genuinely uncovered and fair game.
- **Would its scout pass the quality bar?** You must be able to name its signal-vs-noise discriminator and 2–4 concrete explore patterns *before* proposing it. If you can't, the surface isn't ready for a scout — record it as a report note instead.

Typical shapes that survive all three filters: the product's core funnel (creation → completion → conversion), a domain job pipeline with success/failure events, a critical third-party dependency the events expose (e.g. an external API search that can silently degrade). **Propose at most two custom scouts — never more, even if more surfaces look watchable.** Zero is a perfectly good outcome and one or two is the norm; if three or more look worthwhile, the filters were too loose — keep only the two highest-value ones and record the rest as report notes. Every scout is a recurring scheduled LLM spend — every tick costs a full run even when it's quiet — so each must earn its keep, and the hard cap also keeps the proposal readable in the terminal, where each scout needs room for its explanation.
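The 6b gap analysis filters candidates in a fixed order and then applies the hard cap. A sketch under stated assumptions: `Candidate`, its numeric `value` score, and `propose_custom_scouts` are hypothetical, and `watched_by` stands in for the agent's own reading of which scout or native source owns a surface.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A surface the gap analysis is considering (hypothetical shape)."""
    surface: str
    value: int                  # hypothetical priority score, higher is better
    watchable: bool             # listable events, an ordered funnel, or a success/failure pair
    watched_by: tuple = ()      # canonical scouts or native sources that watch it
    discriminator: str = ""     # the signal-vs-noise test
    explore_patterns: tuple = ()


def propose_custom_scouts(candidates, enabled_scouts, native_sources, cap=2):
    """Apply the three filters in order, then keep at most `cap` proposals."""
    passing, report_notes = [], []
    for c in candidates:
        if not c.watchable:
            continue  # "it's a web app" is not a surface
        # Only enabled scouts and native sources count as owners; a surface
        # watched solely by a scout step 6 disabled is still uncovered.
        if any(w in enabled_scouts or w in native_sources for w in c.watched_by):
            continue
        if not c.discriminator or not 2 <= len(c.explore_patterns) <= 4:
            report_notes.append(c.surface)  # not ready for a scout yet
            continue
        passing.append(c)
    passing.sort(key=lambda c: c.value, reverse=True)
    report_notes += [c.surface for c in passing[cap:]]  # overflow becomes notes
    return passing[:cap], report_notes


enabled = {"signals-scout-general", "signals-scout-product-analytics"}
native = {"error-tracking source", "session-replay source"}
candidates = [
    Candidate("core funnel", 9, True, (), "completion drop vs. last week",
              ("conversion by step", "drop-off by platform")),
    Candidate("error bursts", 8, True, ("error-tracking source",), "spike vs. baseline",
              ("issues by release", "issues by page")),
    Candidate("export jobs", 7, True, ("signals-scout-data-pipelines",), "failure ratio",
              ("failures by destination", "latency trend")),
    Candidate("search API", 6, True, (), "empty results vs. errors",
              ("error rate by query type", "p95 latency")),
    Candidate("it's a web app", 10, False),
]
proposals, notes = propose_custom_scouts(candidates, enabled, native)
print([p.surface for p in proposals])  # ['core funnel', 'export jobs']
print(notes)                           # ['search API']
```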