feat(ingestion): verify events captured with the secret API key#62888
feat(ingestion): verify events captured with the secret API key#62888Gilbert09 wants to merge 9 commits into
Conversation
👀 Auto-assigned reviewersThese soft owners were skipped because they only have minor changes here. Nothing blocks merge, so self-assign if you'd like a look:
Soft owners come from |
|
Size Change: +391 kB (+0.61%) Total Size: 64.4 MB 📦 View Changed
ℹ️ View Unchanged
|
|
Reviews (1): Last reviewed commit: "feat(flags): authenticate /flags request..." | Re-trigger Greptile |
PR overviewAll previously flagged issues have been addressed. No open security concerns remain on this pull request. Security reviewNo open security issues remain on this pull request. Fixed/addressed: 3 · PR risk: 0/10 |
|
Sorry, just noticed this. Will take a look from the feature flags side. |
haacked
left a comment
There was a problem hiding this comment.
Looks good from the feature flags side. Just some non-blocking suggestions.
🔒 Security review notes (via Claude)I reviewed this PR against its core invariant — 1.
|
|
👋 hey folks
Looks like this isn't an immediate blocker to this PR (no capture change yet) but FYI: Just to confirm - at the moment no If we opt to move quota checks entirely downstream, we will ship those events (and future ones...sometimes this can really spike) into the topic just to drop them downstream. Not a hard blocker, but something to consider |
992e917 to
c877c2d
Compare
🕸️ Eager graphHow much code each root forces the browser to download and decode through static imports — the regression class total bundle size can't see.
✅ Largest files eagerly reachable from
|
| Size | File |
|---|---|
| 899.9 KiB | src/styles/global.scss |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 355.9 KiB | ../node_modules/.pnpm/@posthog+icons@0.37.3_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 343.5 KiB | src/taxonomy/core-filter-definitions-by-group.json |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
| 298.6 KiB | src/lib/api.ts |
Largest files eagerly reachable from src/scenes/AuthenticatedShell.tsx
| Size | File |
|---|---|
| 899.9 KiB | src/styles/global.scss |
| 764.5 KiB | src/queries/validators.js |
| 609.0 KiB | public/hedgehog/burning-money-hog.png |
| 541.9 KiB | public/hedgehog/waving-hog.png |
| 448.2 KiB | public/hedgehog/stop-sign-hog.png |
| 362.0 KiB | public/hedgehog/phone-pair-hogs.png |
| 355.9 KiB | ../node_modules/.pnpm/@posthog+icons@0.37.3_react-dom@18.3.1_react@18.3.1__react@18.3.1/node_modules/@posthog/icons/dist/posthog-icons.es.js |
| 343.5 KiB | src/taxonomy/core-filter-definitions-by-group.json |
| 335.6 KiB | public/hedgehog/desk-hog.png |
| 323.2 KiB | public/hedgehog/3-bears-hogs.png |
Posted automatically by check-eager-graph · sizes are input-source bytes from the esbuild metafile · part of #32479
|
👋 Visual changes detected for this PR. Review and approve in PostHog Visual Review If these changes are unexpected, they may be caused by a flaky test or a broken snapshot on master. Don't approve — rerun the job or wait for a fix. |
|
Addressed in 49816d9: |
CI statusThe branch is up to date with The only remaining red check is Product tests (data-warehouse, workflows) (and the Django Tests Pass gate that depends on it). This is a pre-existing, master-wide failure unrelated to this PR — this PR touches no Evidence:
This needs the master CI Docker Hub auth issue to be resolved (infra-side). Once master is green again, re-running / re-merging this branch will clear the check. Everything within this PR's control is green. |
Capture enforces quota limits by the raw ingestion token, so a team sending events with its phs_ secret API token would bypass billing limits keyed only on the public api_token. Write every ingestion token (api_token, secret_api_token, secret_api_token_backup) to the quota-limit and quota-limiting-suspended zsets; change detection stays keyed on the always-present public token.
Events captured with a team's secret API token (phs_, primary or backup) now resolve to the team during ingestion instead of being dropped as invalid_token, and get a server-set $verified: true property. $verified is server-controlled: any client-supplied value on public-token events is stripped, so it cannot be forged. Applied in the shared resolve-team step so both the analytics and error-tracking pipelines (and the heatmap subpipeline) are covered. Also registers $verified in the taxonomy so it renders in the UI.
… token An SDK configured with the phs_ secret API token as its api_key (for verified event capture) would otherwise lose feature flags: the team metadata HyperCache and PG lookup are keyed by the public token only, so phs_ tokens landed in the negative cache and 401'd. Branch on the phs_ prefix and reuse the cached secret-token validation already used by /flag_definitions, resolving the team via its public token (or by id for legacy cache entries).
/flags now accepts a phs_ secret token as api_key, so the decoded request body can carry one. The body logger recorded the decoded body verbatim for opted-in teams, which would let a log reader recover the secret. Redact phs_-prefixed tokens before logging; public phc_ tokens are world-readable by design and left intact. Generated-By: PostHog Code Task-Id: c79724ba-e6b2-4100-ab76-9e05351b46bf
Generated-By: PostHog Code Task-Id: c79724ba-e6b2-4100-ab76-9e05351b46bf
The scheduled quota refresh already writes every ingestion token (public + secret) to Redis, but the manual limit/unlimit and token-change paths did not, so they could drift from the quota contract this PR introduces. - Route `Organization.limit_product_until_end_of_billing_cycle` / `unlimit_product` through `get_team_ingestion_tokens` so secret tokens are limited/unlimited alongside the public token. - Add `sync_team_quota_limited_tokens`, a shared helper that re-points a team's active quota limits onto its current ingestion tokens, and call it from the public token reset and the secret token rotation/backup-deletion paths. This closes the gap where a newly rotated token could ingest freely until the next scheduled refresh. Generated-By: PostHog Code Task-Id: a7bc377f-4621-42b8-846e-fdbec703e2b7
- quota_limiting: re-chunk the flat zscore pipeline results into a per-zset map in one place so the slice windows can't drift out of sync with the build loop. - team model: resolve the secret-token backup slot on the Django auth path too, mirroring the ingestion (TeamManager) and /flags resolvers, so a rotated-out token keeps authenticating during the grace period; add a backup-token auth test. - feature-flags body_logger: add a log_response test that feeds a phs_ body and asserts the emitted line is redacted, guarding the call-site redaction. - team-manager: add a test asserting a rotated-out secret token stops resolving once the cache refreshes. Generated-By: PostHog Code Task-Id: 68621483-6e60-4d42-8f26-88533fd62ad4
A `/flags` caller can send a valid secret token as JSON unicode escapes (`"api_key":"phs_..."`). serde decodes it for authentication, but the raw-text regex never sees the literal `phs_` prefix, so the body log stored a recoverable representation. Redact structurally when the body parses as JSON: walk the decoded values, redact any string carrying a `phs_` token, and re-serialize, then run the regex over the result as a backstop. Non-JSON bodies fall back to the raw-text regex as before. Adds a test feeding a `\u`-escaped token. Generated-By: PostHog Code Task-Id: 68621483-6e60-4d42-8f26-88533fd62ad4
A sender using the public project token could smuggle a forged "$verified" person property through $set/$set_once/$unset, since the previous code only stripped the top-level properties.$verified on non-secret-token events. Scrub $verified from those person-property operations too, so the server-controlled "Verified" marker cannot be forged without the secret API token. Generated-By: PostHog Code Task-Id: d0e82832-8e45-4e3a-b495-f393aee5649e
059b901 to
543b294
Compare
|
🎭 Playwright report
These issues are not necessarily caused by your changes. |
Problem
Server-side SDKs authenticate capture with the public project API key (
phc_), which is world-readable by design — so nothing distinguishes an event genuinely sent by trusted server infrastructure from one forged by anyone holding the public key. For workflows that act automatically on ingested events (e.g. error-tracking automations), we need a way to know an event came from a holder of a server-side secret.An earlier approach signed individual
$exceptionevents with per-team Ed25519 keys (#62750, #62751, posthog-python#657). This PR supersedes that with a much simpler transport-level mechanism: send events with the project's secret API token (phs_, already used for flags local evaluation and the conversations API), and every event ingested that way is marked verified.Changes
TeamManagernow also resolves teams bysecret_api_tokenandsecret_api_token_backup(so token rotation keeps working), and the shared resolve-team step sets a server-controlled$verified: trueproperty on events captured with a secret token. Any client-supplied$verifiedon public-token events is stripped, so the property cannot be forged. Covers the analytics, error-tracking, and heatmap pipelines; a counter metric tracks verified/stripped events.phs_capture would bypass billing limits./flagsrequests now accept aphs_token as theapi_key(reusing the existing cached secret-token validation used by/flag_definitions), so an SDK configured with the secret key keeps evaluating flags. Unknownphs_tokens 401 without polluting the public-token negative cache.$verifiedregistered as a boolean event property.No SDK changes are required — server SDKs send
api_keyverbatim, so settingapi_key="phs_..."works as-is. Capture already acceptsphs_tokens at the edge (format-only validation) and the token is not persisted to ClickHouse.Everything is inert until a team generates a secret token (existing rotate endpoint) and points an SDK at it; teams with
NULLsecret tokens see byte-identical behaviour. No migrations — unique indexes on both secret-token columns already exist.Notes for reviewers:
phc_andphs_as separate keys — accepted.phs_token too to fully block a team.phs_tokens (its team service is public-token-only; server SDKs don't do replay).How did you test this code?
nodejs: new parameterized unit tests forapplyVerifiedPropertyand the resolve-team step (secret/backup/forged/no-properties/null-secret cases), team-manager integration tests for secret-token resolution and cache warming, and two end-to-end ingestion tests (phs_capture lands in ClickHouse with$verified: true; forged$verifiedviaphc_is stripped). Fullevent-preprocessingand error-tracking pipeline suites pass.ee/billing: parameterized pytest coverage over no-secret / primary-only / primary+backup token sets forupdate_org_billing_quotas,update_all_orgs_billing_quotas, and the token helpers; fulltest_quota_limiting.pysuite passes (74 tests).rust/feature-flags: integration tests for/flagsauthenticated with the primary and backup secret token (200 + flags evaluated) and an unknownphs_token (401 with Django-compatible error body).Automatic notifications
Docs update