diff --git a/transformation-config/skills/omnibus/instrument-for-experiments/config.yaml b/transformation-config/skills/omnibus/instrument-for-experiments/config.yaml new file mode 100644 index 00000000..08afcb92 --- /dev/null +++ b/transformation-config/skills/omnibus/instrument-for-experiments/config.yaml @@ -0,0 +1,24 @@ +# Experiment-driven instrumentation skill. +# +# Unlike `instrument-product-analytics`, which casts a wide net (10–15 events of +# "business value"), this skill scopes instrumentation to the events one specific +# experiment needs to be measurable: hypothesis → primary metric → secondary +# metrics → capture only the gaps. Hands off to experiment creation once events +# are flowing. +type: docs-only +template: description.md +category: experiments +description: Instrument PostHog with one specific experiment in mind. Use when the user wants to A/B test a change but does not yet have the events needed to measure it. Forces a hypothesis, one primary metric, and a short list of secondary metrics, then captures only the gaps. Routes to instrument-product-analytics for general coverage and to experiment creation once events flow. +tags: [experiments, instrumentation] +shared_docs: + - https://posthog.com/docs/experiments/metrics.md + - https://posthog.com/docs/experiments/best-practices.md + - https://posthog.com/docs/experiments/exposures.md + - https://posthog.com/docs/experiments/installation.md + - https://posthog.com/docs/product-analytics/capture-events.md + - https://posthog.com/docs/getting-started/identify-users.md +variants: + - id: all + display_name: all supported frameworks + tags: [] + docs_urls: [] diff --git a/transformation-config/skills/omnibus/instrument-for-experiments/description.md b/transformation-config/skills/omnibus/instrument-for-experiments/description.md new file mode 100644 index 00000000..c2e0a4cf --- /dev/null +++ b/transformation-config/skills/omnibus/instrument-for-experiments/description.md @@ -0,0 +1,120 @@ +# Instrument PostHog for a specific experiment + +Use this skill when the user wants to run an A/B test but has not yet captured the events needed to measure it. The goal is not blanket coverage — it is the minimum instrumentation that lets one experiment answer one question. + +This skill is intentionally opinionated: + +- **One hypothesis at a time.** Refuse to instrument "in general" — anchor every event to the experiment being scoped. +- **Exactly one primary metric.** Picking two is the most common mistake. If the user names two, force a choice. +- **Few secondary metrics.** 1–3 supporting signals that explain *why* the primary moved. They are not conclusions on their own. +- **Capture what you'll measure, nothing else.** Do not propose the standard 10–15-event sweep from `instrument-product-analytics`. Stay scoped. + +If the user wants generic product analytics coverage, route them to `instrument-product-analytics` instead. If they already have events and just want to create the experiment, route them to the experiment creation flow in the PostHog MCP. + +## Instructions + +Follow these steps IN ORDER. The order is the entire point — do not skip ahead. + +STEP 1: Detect platform and existing PostHog setup. + - Inspect dependency files (package.json, requirements.txt, pyproject.toml, Gemfile, composer.json, go.mod, etc.) to identify framework and language. + - Check for existing PostHog SDK installation and initialization. + - If multiple platforms are present (e.g. a Python backend AND a JS frontend), **ask the user which side to instrument first** rather than guessing. The experiment will usually be measured primarily on one side. + +STEP 2: State the hypothesis. (BLOCKING — do not proceed without it.) + - Before any code or any event is named, get the user to write the hypothesis in a single testable sentence. A good hypothesis has three parts: + > "If we **[change]**, then **[primary metric]** will **[direction + rough size]**, because **[why]**." + - Examples: + - "If we replace the empty-state CTA with a guided template picker, then `workflow_created` within 7 days of signup will increase by ~10%, because templates remove the cold-start problem." + - "If we add a 'duplicate node' shortcut, then `workflow_published` per active user will increase, because faster iteration produces more shippable workflows." + - If the user gives you something vaguer ("we want to test the new onboarding"), push back once: ask what specific user behavior should change and how you would know. Then proceed with their refined answer. + - Capture the hypothesis verbatim. It is the load-bearing artifact for everything below. + +STEP 3: Pick exactly ONE primary metric. + - The primary metric is the single number the user will let decide ship / don't ship. Most teams pick two and end up with neither having enough statistical power. + - Rules: + - **One event, or one ratio.** Either "% of exposed users who do X", "X count per user", or "X / Y per user". Not all of them. + - **Measurable in a reasonable window.** If the natural answer needs 30+ days, the experiment is not viable as-scoped. Push for an earlier proxy. + - **Actually moveable by the change.** If the change touches signup but the metric is 90-day retention, the loop is too long for one experiment. + - Map the picked metric to one of the four PostHog experiment metric types (this matters for STEP 5 verification and STEP 9 handoff): + + | Natural-language phrasing | Metric type | Notes | + | ------------------------------------------------------------ | ----------- | ---------------------------------------------------------- | + | "% of exposed users who did X" | `funnel` | Exposure → X. Most common shape for activation-style tests.| + | "X count per user" / "revenue per user" / "minutes per user" | `mean` | Average of an event count or numeric property per user. | + | "X per Y" (e.g. revenue per pageview) | `ratio` | Two events, both needed. | + | "Do users come back and do Y after exposure?" | `retention` | Use sparingly — needs a long enough window. | + + - Write the chosen metric as `[metric_type]: [event] [+ property if mean/ratio]`. This becomes the spec for STEPs 5 and 7. + +STEP 4: Pick 1–3 secondary metrics. + - Secondary metrics explain the primary's movement. They are NOT used to declare victory on their own. + - Two flavors to look for: + 1. **Counterbalancing / guardrail metrics.** Things that should NOT regress (e.g. "time-to-first-workflow", "support tickets opened"). If the primary goes up but a guardrail tanks, the change did not actually win. + 2. **Mechanism metrics.** The step between exposure and primary that explains *how* the change worked. If the primary is `workflow_published`, a mechanism might be `template_selected` — it tells the user whether the change worked the way they expected. + - Cap at 3. More than 3 secondary metrics is a sign the user is hedging on the primary — push them back to STEP 3 instead. + +STEP 5: Inventory what is already captured. + - Use the PostHog MCP `read-data-schema` tool with `kind: events` to list events already in the project. For each metric named in STEPs 3 and 4, check: + - Does the exact event already exist? + - Does an event with the same intent exist under a different name? + - Are the required properties already on that event (for `mean` and `ratio` metrics)? + - Report back to the user, per metric: + - ✅ Already captured — no instrumentation needed. + - 🟡 Similar event exists — confirm whether to reuse or add a new one. + - ❌ Missing — instrument it in STEP 7. + - If everything is already captured, skip to STEP 8. + +STEP 6: Install and initialize PostHog. (Skip if PostHog is already set up.) + - Follow the framework-specific install and init pattern from `instrument-product-analytics`. The setup itself is the same; only the choice of *what* to capture is different in this skill. + - For server-side experiments, the server SDK must also be installed — the exposure event (`$feature_flag_called`) needs to be capturable wherever the variant is evaluated. + +STEP 7: Capture only the planned events. + - For each missing event from STEP 5: + - Locate the single code path where the event would fire. (For a `workflow_published` event, find the publish handler, not every place that touches workflows.) + - Add one `posthog.capture()` call with the event name and the properties required by the metric (e.g. `amount` for revenue, `template_id` for breakdowns). + - Pass through `distinct_id` from whatever identity mechanism the project uses. If users are logged in, identify them on login (see `identify-users.md` in the reference list). + - Do **not** add capture calls for events outside the planned metric list. If a tempting "while we're here" event surfaces, log it as a follow-up — not in this change. + - You must read a file immediately before attempting to write it. Do not alter the fundamental architecture of existing files. Make additions minimal and targeted. + +STEP 8: Build the verification insight per metric. + - Before the user creates the experiment, they need to see events arriving. Use the PostHog MCP to create one insight per metric so they can watch the live data: + - **Funnel metric** → `query-funnel` with the exposure event followed by the conversion step. + - **Mean metric** → `query-trends` of the event count or summed property, per user. + - **Ratio metric** → `query-trends` with a formula for numerator / denominator. + - **Retention metric** → `query-retention` with the start and return events. + - Save the insights. Optionally roll them into a small dashboard named after the hypothesis (e.g. "Template picker experiment — readiness"). + - Tell the user: "Once you see real numbers on these tiles for at least a day or two, you're ready to create the experiment." + +STEP 9: Hand off to experiment creation. + - Once events flow, the user is ready. Point them at the PostHog MCP's experiment creation flow with the following pre-filled context to pass forward: + - Hypothesis (from STEP 2 — copy verbatim into the experiment `description`) + - Primary metric (event + metric_type from STEP 3) + - Secondary metrics (from STEP 4) + - Feature flag key (kebab-case, derived from the hypothesis) + - Do NOT call `experiment-create` directly from this skill — the dedicated experiment creation flow covers rollout, variant split, and the draft-first pattern, and is the right surface for it. + +STEP 10: Verify and clean up. + - Run the project's type-check / build / lint scripts (look in package.json or framework equivalents). + - Confirm that the capture calls you added are reachable from the code paths you expect to instrument the experiment on. + +## What this skill explicitly does NOT do + +- Blanket-instrument 10–15 files of "business value" — that's `instrument-product-analytics`. +- Decide rollout percentages or variant splits — that's part of the experiment creation flow. +- Create the experiment object itself — that's the experiment creation flow. +- Configure exposure criteria or `allow_unknown_events` — that's the analytics-side experiment configuration. + +If the user's actual question is one of those, route them. + +## Reference files + +{references} + +## Key principles + +- **Anchor every event to a metric.** If you cannot say which metric an event powers, do not add it. +- **One primary, always.** Two primaries means no decision rule. +- **Verify before you experiment.** A day of live data on the insight beats a week of debugging an empty experiment. +- **Stay scoped.** The customer asked for an experiment, not for a product-analytics audit. +- **Environment variables**: Always use environment variables for PostHog keys. Never hardcode them. +- **Minimal changes**: Add PostHog code alongside existing integrations. Don't replace or restructure existing code.