Skip to content

feat: add two robust PostHog PII rules#33

Draft
joethreepwood wants to merge 2 commits into
mainfrom
posthog-pii-rules
Draft

feat: add two robust PostHog PII rules#33
joethreepwood wants to merge 2 commits into
mainfrom
posthog-pii-rules

Conversation

@joethreepwood

Copy link
Copy Markdown

Summary

Hardens PII detection beyond the single posthog_pii_in_capture_call rule. Sarah flagged it could be more robust — these fill the two biggest gaps: more call-sites and value-shaped detection.

posthog_pii_in_person_properties

Sensitive PII passed to person-property call-sites: register, register_once, setPersonProperties, setPersonPropertiesForFlags. Mirrors the identify() half of the existing rule — email and names are standard, expected person properties (NOT flagged); only regulated PII is (SSN, DOB, financial, government ID, medical).

posthog_pii_value_in_tracking_call

PII-shaped literal values, regardless of key name — catches PII hidden under innocuous keys like posthog.capture('signup', { referrer: 'jane@example.com' }), which the key-name rule misses. Matches:

  • email literal — capture() only (an email value in identify()/person-properties is the standard, correct pattern)
  • US SSN (123-45-6789) in any tracking call
  • separator-grouped card number (4111 1111 1111 1111) in any tracking call

Conventions followed

  • One rule per file, filename === rule name; all six meta fields (description, remediation, severity, category, action, scan_context)
  • category = posthog_pii, severity = high, action = remediate, scan_context = output (matches the existing PII rule)
  • No regex lookarounds (YARA-X rejects them); same first-{}-object FP-avoidance scoping as the sibling rule
  • Deliberate precision over recall to limit false positives: card detection requires separators (YARA-X can't run a Luhn check, so an unseparated 16-digit run is intentionally not matched)

Test plan

  • 19 new tests (positive / negative / metadata per rule) under __tests__/rules/
  • pnpm test — full suite green (535 tests, 32 files)
  • Negative coverage locks in the FP guards: email-in-identify, name in person properties, unseparated 16-digit runs, and capture-owned PII not double-firing the person-property rule

Note

Independent of the in-flight roin-orca-rules branch (it doesn't touch PII rules). The PostHog wizard ships an inline mirror of these patterns (PostHog/wizard#510) until it consumes warlock directly.

🤖 Generated with Claude Code

Extends PII coverage beyond the single posthog_pii_in_capture_call rule
(which matches PII by key name in capture/identify):

- posthog_pii_in_person_properties: sensitive PII passed to person-property
  call-sites (register / register_once / setPersonProperties /
  setPersonPropertiesForFlags). Mirrors the identify() split — email and
  names are standard person properties and are NOT flagged; only regulated
  PII (SSN, DOB, financial, gov ID, medical) is.

- posthog_pii_value_in_tracking_call: PII-shaped literal VALUES (email, US
  SSN, separator-grouped card number) under any property key — catches PII
  hidden behind innocuous keys like { referrer: 'jane@example.com' }.

Both follow house conventions: one rule per file, all six meta fields,
no regex lookarounds, same first-object FP-avoidance scoping as the sibling
rule. Email values are flagged in capture() only (email is valid in
identify/person-properties). Card detection requires separators since
YARA-X can't run a Luhn check. 19 tests; full suite green (535).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@sarahxsanders sarahxsanders left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one missing test then you're a warlock harry >:)

Comment on lines +9 to +10
// - email: real address shape, and only in capture() — an email value in
// identify()/person-properties is the standard, correct pattern.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit. this rule only catches emails written as literal strings, not when passed as a variable. it's fine as a limit if that's intended, but might be worth adding a comment here for context:

// Known limit: only catches quoted email values, not variables

triage will catch most of this at runtime but it could potentially be a gap and having the context for robots and humans is helpful!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 7ffc998 — a 'Known limit' note in the rule header: only catches quoted/literal values, not variables (e.g. { email: userEmail }); runtime triage covers most of those, this rule is the static-analysis backstop for hardcoded leaks.

});
});

describe('negative cases – should NOT match', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rule 1 has a test that confirms it correctly doesn't fire on { $set: { ssn: '...' } }

rule 2 behaves the same way but there's no test for it. could we add a matching negative test? something like

it('does NOT match an SSN value nested inside a $set object', async () => {
  await expectRuleDidNotMatch(
    `posthog.setPersonProperties({ $set: { ssn: '123-45-6789' } })`,
    RULE,
  );
});

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added your suggested test verbatim in 7ffc998 — confirms the value rule stays silent on setPersonProperties({ $set: { ssn: '123-45-6789' } }). The nested object isn't matched because the [^{}]* scope stops at the inner brace, same FP-avoidance as rule 1. Suite green (536).

joethreepwood added a commit to PostHog/wizard that referenced this pull request Jun 8, 2026
Per review (#510): warlock is being wired to the wizard
directly, so the inline copy of the rules is throwaway. The two rules
live in warlock (PostHog/warlock#33) — the source of truth — and reach
the wizard through that wiring, not an inline mirror. Removes the mirrored
rules and their tests; rule count back to 15.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@joethreepwood

Copy link
Copy Markdown
Author

Update: the wizard-side inline mirror referenced in the PR description was dropped (PostHog/wizard#510) at review — the wizard will consume these rules through the warlock→wizard scanner wiring instead of a hand-maintained copy. These two rules remain the source of truth here.

Review feedback (#33):

- Document that posthog_pii_value_in_tracking_call only catches quoted/
  literal values, not variables (runtime triage covers the rest).
- Add a negative test confirming the rule does not fire on an SSN nested
  inside a $set object — matching the coverage the person-properties rule
  already has.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@joethreepwood

Copy link
Copy Markdown
Author

Thanks @sarahxsanders — both addressed in 7ffc998:

  • Literal-only limit documented — added a // Known limit: note to posthog_pii_value_in_tracking_call: it only catches quoted/literal values, not variables (e.g. { email: userEmail }); runtime triage covers most of those, this rule is the static-analysis backstop for hardcoded leaks.
  • Nested $set negative test — added your suggested test verbatim; confirms the value rule stays silent on setPersonProperties({ $set: { ssn: '…' } }), matching the coverage the person-properties rule already had.

Full suite green (536). Heads up: the wizard-side inline mirror these rules referenced was dropped at review (PostHog/wizard#510) — the wizard will consume them through the warlock→wizard wiring instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants