pk-doctor: sensitive_data._credit_cards false-positives on numeric IDs inside social-media URLs #75

## Summary

`pk-doctor`'s `sensitive_data` check runs every 13–19-digit numeric run through a Luhn checksum and reports each Luhn-valid hit as an ERROR-class credit-card incident. Public content IDs from social media, notably 19-digit Twitter/X status IDs, often pass Luhn, so a plain link to a tweet, video, or post produces an ERROR-class "privacy incident" finding.

## Affected file

`context/skills/processkit/pk-doctor/scripts/checks/sensitive_data.py`: `_credit_cards` (line ~389), called from `run` (line ~265).

## Repro

Any artifact or document that links to a Twitter/X status URL with a Luhn-valid status ID, for example:

```markdown
- [Riley Goodside ASCII-Smuggling thread](https://x.com/goodside/status/1745511940351287394)
```

`1745511940351287394` is a 19-digit, Luhn-valid public tweet ID. Doctor reports:

```
ERROR sensitive-data.credit-card | credit-card-like number with valid checksum in path/to/file.md:NN: 174551...7394
  fix: Remove the card-like value and treat the commit as a privacy incident.
```

In this derived project, the bug produced three ERROR findings in security-research artifacts, all citing the same Riley Goodside tweet.
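The Luhn claim is easy to check by hand. Below is a minimal stand-alone sketch of the standard Luhn (mod 10) check; `luhn_valid` is a hypothetical name, not the `_luhn` helper in `sensitive_data.py`. Because the checksum is a single mod-10 digit, roughly one in ten arbitrary digit strings passes it, so long numeric IDs will trip a Luhn-only check regularly.

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn (mod 10) check on a string of ASCII digits."""
    total = 0
    # Walk right-to-left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of 10..18
        total += d
    return total % 10 == 0


tweet_id = "1745511940351287394"
print(len(tweet_id), luhn_valid(tweet_id))  # → 19 True
```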

## Suggested fix

Skip matches whose immediate left context is a URL path on a known content-ID host (or a path segment such as `status/`, `posts/`, or `videos/`). Minimal patch:

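```python
import re

# Hosts and path segments whose URLs routinely carry long public numeric IDs
# (tweet/status IDs, post and video IDs).
_URL_NUMERIC_ID_HOST_RE = re.compile(
    r"(?:twitter\.com|x\.com|t\.co|youtu\.be|youtube\.com|facebook\.com|"
    r"linkedin\.com|instagram\.com|tiktok\.com|reddit\.com|status|"
    r"posts?|videos?|watch|tweet|toot)/[^\s)\"']*$",
    re.IGNORECASE,
)


def _credit_cards(text: str):
    # _CREDIT_CARD_RE and _luhn are the existing helpers in sensitive_data.py.
    for match in _CREDIT_CARD_RE.finditer(text):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if 13 <= len(digits) <= 19 and _luhn(digits):
            # Skip numbers sitting directly in the path of a content-ID URL:
            # the regex must reach from the host/segment to the match start
            # without crossing whitespace, ")" or a quote.
            left = text[max(0, match.start() - 120) : match.start()]
            if _URL_NUMERIC_ID_HOST_RE.search(left):
                continue
            line = text.count("\n", 0, match.start()) + 1
            yield line, raw
```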

A more principled fix would skip any numeric run that falls inside a URL, regardless of host, for example by anchoring on `https?://` within N characters of the match start.
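That alternative can be sketched as follows. Rather than a fixed N-character window, this version collects the span of every `http(s)://` URL once and skips any Luhn-valid run that starts inside one, whatever the host. All names here (`CARD_RE`, `URL_RE`, `luhn_valid`, `credit_cards_outside_urls`) are hypothetical, and `CARD_RE` is only a stand-in for the module's real `_CREDIT_CARD_RE`, which the issue does not show.

```python
import re
from typing import Iterator, Tuple

# Hypothetical stand-in for _CREDIT_CARD_RE: 13-19 digits, optionally
# separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Any http(s) URL, up to whitespace, a quote, "<", ">" or a closing ")".
URL_RE = re.compile(r"https?://[^\s)<>\"']+", re.IGNORECASE)


def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            # Doubled digit, reduced to a single digit (2*d - 9 when >= 10).
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def credit_cards_outside_urls(text: str) -> Iterator[Tuple[int, str]]:
    # Collect URL character spans once, up front.
    url_spans = [m.span() for m in URL_RE.finditer(text)]
    for match in CARD_RE.finditer(text):
        start = match.start()
        # Host-agnostic rule: ignore any candidate that starts inside a URL.
        if any(s <= start < e for s, e in url_spans):
            continue
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            line = text.count("\n", 0, start) + 1
            yield line, raw
```

The trade-off cuts both ways: a card number pasted into a URL query string (`...?card=4111111111111111`) is now skipped too, while a bare tweet ID written outside any URL is still flagged.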

## Severity

ERROR-class false positive on benign content (public social-media links). Doctor's suggested fix, "Remove the card-like value and treat the commit as a privacy incident", is actively misleading remediation advice.

🤖 Reported via [Claude Code](https://claude.com/claude-code)
