feat: add two robust PostHog PII rules#33
Conversation
Extends PII coverage beyond the single posthog_pii_in_capture_call rule
(which matches PII by key name in capture/identify):
- posthog_pii_in_person_properties: sensitive PII passed to person-property
call-sites (register / register_once / setPersonProperties /
setPersonPropertiesForFlags). Mirrors the identify() split — email and
names are standard person properties and are NOT flagged; only regulated
PII (SSN, DOB, financial, gov ID, medical) is.
- posthog_pii_value_in_tracking_call: PII-shaped literal VALUES (email, US
SSN, separator-grouped card number) under any property key — catches PII
hidden behind innocuous keys like { referrer: 'jane@example.com' }.
Both follow house conventions: one rule per file, all six meta fields,
no regex lookarounds, same first-object FP-avoidance scoping as the sibling
rule. Email values are flagged in capture() only (email is valid in
identify/person-properties). Card detection requires separators since
YARA-X can't run a Luhn check. 19 tests; full suite green (535).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
sarahxsanders
left a comment
There was a problem hiding this comment.
one missing test then you're a warlock harry >:)
| // - email: real address shape, and only in capture() — an email value in | ||
| // identify()/person-properties is the standard, correct pattern. |
There was a problem hiding this comment.
tiny nit. this rule only catches emails written as literal strings, not when passed as a variable. it's fine as a limit if that's intended, but might be worth adding a comment here for context:
// Known limit: only catches quoted email values, not variables
triage will catch most of this at runtime but it could potentially be a gap and having the context for robots and humans is helpful!
There was a problem hiding this comment.
Added in 7ffc998 — a 'Known limit' note in the rule header: only catches quoted/literal values, not variables (e.g. { email: userEmail }); runtime triage covers most of those, this rule is the static-analysis backstop for hardcoded leaks.
| }); | ||
| }); | ||
|
|
||
| describe('negative cases – should NOT match', () => { |
There was a problem hiding this comment.
rule 1 has a test that confirms it correctly doesn't fire on { $set: { ssn: '...' } }
rule 2 behaves the same way but there's no test for it. could we add a matching negative test? something like
it('does NOT match an SSN value nested inside a $set object', async () => {
await expectRuleDidNotMatch(
`posthog.setPersonProperties({ $set: { ssn: '123-45-6789' } })`,
RULE,
);
});
There was a problem hiding this comment.
Added your suggested test verbatim in 7ffc998 — confirms the value rule stays silent on setPersonProperties({ $set: { ssn: '123-45-6789' } }). The nested object isn't matched because the [^{}]* scope stops at the inner brace, same FP-avoidance as rule 1. Suite green (536).
Per review (#510): warlock is being wired to the wizard directly, so the inline copy of the rules is throwaway. The two rules live in warlock (PostHog/warlock#33) — the source of truth — and reach the wizard through that wiring, not an inline mirror. Removes the mirrored rules and their tests; rule count back to 15. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Update: the wizard-side inline mirror referenced in the PR description was dropped (PostHog/wizard#510) at review — the wizard will consume these rules through the warlock→wizard scanner wiring instead of a hand-maintained copy. These two rules remain the source of truth here. |
Review feedback (#33): - Document that posthog_pii_value_in_tracking_call only catches quoted/ literal values, not variables (runtime triage covers the rest). - Add a negative test confirming the rule does not fire on an SSN nested inside a $set object — matching the coverage the person-properties rule already has. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks @sarahxsanders — both addressed in
Full suite green (536). Heads up: the wizard-side inline mirror these rules referenced was dropped at review (PostHog/wizard#510) — the wizard will consume them through the warlock→wizard wiring instead. |
Summary
Hardens PII detection beyond the single
posthog_pii_in_capture_callrule. Sarah flagged it could be more robust — these fill the two biggest gaps: more call-sites and value-shaped detection.posthog_pii_in_person_propertiesSensitive PII passed to person-property call-sites:
register,register_once,setPersonProperties,setPersonPropertiesForFlags. Mirrors theidentify()half of the existing rule — email and names are standard, expected person properties (NOT flagged); only regulated PII is (SSN, DOB, financial, government ID, medical).posthog_pii_value_in_tracking_callPII-shaped literal values, regardless of key name — catches PII hidden under innocuous keys like
posthog.capture('signup', { referrer: 'jane@example.com' }), which the key-name rule misses. Matches:capture()only (an email value inidentify()/person-properties is the standard, correct pattern)123-45-6789) in any tracking call4111 1111 1111 1111) in any tracking callConventions followed
metafields (description,remediation,severity,category,action,scan_context)category = posthog_pii,severity = high,action = remediate,scan_context = output(matches the existing PII rule){}-object FP-avoidance scoping as the sibling ruleTest plan
__tests__/rules/pnpm test— full suite green (535 tests, 32 files)Note
Independent of the in-flight
roin-orca-rulesbranch (it doesn't touch PII rules). The PostHog wizard ships an inline mirror of these patterns (PostHog/wizard#510) until it consumes warlock directly.🤖 Generated with Claude Code