pk-doctor: sensitive_data._credit_cards false-positives on numeric IDs inside social-media URLs #75

@projectious

Summary

`pk-doctor`'s `sensitive_data` check passes every 13–19-digit numeric run through a Luhn checksum and flags Luhn-valid hits as ERROR-class credit-card incidents. Luhn uses a single check digit, so roughly one in ten arbitrary digit strings of that length passes by chance. Public content IDs from social media, notably 19-digit Twitter/X status IDs, therefore regularly pass Luhn and produce ERROR-class "privacy incident" findings on content that is just a tweet/video/post link.
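To make the failure mode concrete, here is a minimal sketch of a standard Luhn check applied to the tweet ID quoted in the Repro section. The function name `luhn_valid` is hypothetical; the issue does not show the body of the checker's own `_luhn` helper, so this is only assumed to behave the same way.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("1745511940351287394"))  # True: the public tweet ID from the repro
print(luhn_valid("4111111111111111"))     # True: a widely used Visa test number
print(luhn_valid("1745511940351287395"))  # False: last digit changed
```

A checksum alone cannot tell a card number from an unrelated ID of the same length, which is why the surrounding context has to be taken into account.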

Affected file

`context/skills/processkit/pk-doctor/scripts/checks/sensitive_data.py` — `_credit_cards` (line ~389) called from `run` (line ~265).

Repro

Any artifact/document linking a Twitter/X status URL with a Luhn-valid status ID. For example, `1745511940351287394` is a 19-digit Luhn-valid public tweet ID. Doctor reports:

```
ERROR sensitive-data.credit-card | credit-card-like number with valid checksum in path/to/file.md:NN: 174551...7394
fix: Remove the card-like value and treat the commit as a privacy incident.
```

In this derived project the bug produced 3 ERRORs in security-research artifacts that all cite the same Riley Goodside tweet.

Suggested fix

Skip matches whose immediate left-context is a URL path on a known content-ID host. Minimal patch:

```python
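import re

# Matches when the text immediately left of a candidate ends inside a URL path
# on a known content-ID host (or right after a typical content-ID path segment).
# The trailing class stops at whitespace, ")" and quotes, so the URL must run
# unbroken up to the candidate digits. (In a raw string the class must be
# written [^\s)\"'], not [^\\s)\\\"'], which would exclude "s" and backslash
# instead of whitespace.)
_URL_NUMERIC_ID_HOST_RE = re.compile(
    r"(?:twitter\.com|x\.com|t\.co|youtu\.be|youtube\.com|facebook\.com|"
    r"linkedin\.com|instagram\.com|tiktok\.com|reddit\.com|status|"
    r"posts?|videos?|watch|tweet|toot)/[^\s)\"']*$",
    re.IGNORECASE,
)


def _credit_cards(text: str):
    # _CREDIT_CARD_RE and _luhn are assumed to be defined elsewhere in
    # sensitive_data.py; they are not part of this patch.
    for match in _CREDIT_CARD_RE.finditer(text):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if 13 <= len(digits) <= 19 and _luhn(digits):
            # Skip numeric IDs that sit in a URL path on a content-ID host.
            left = text[max(0, match.start() - 120) : match.start()]
            if _URL_NUMERIC_ID_HOST_RE.search(left):
                continue
            line = text.count("\n", 0, match.start()) + 1
            yield line, raw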
```

A more principled fix would scope the check to runs whose surrounding character class isn't a URL path (anchor on `https?://` within N chars of the match start).
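As a rough illustration of that alternative, the sketch below treats a match as "inside a URL" when an `https?://` prefix runs unbroken (no whitespace, quotes, or closing parenthesis) up to the match start. The names `in_url`, `_URL_PREFIX_RE`, and the `window` default are hypothetical and not part of `pk-doctor`; this is one possible reading of the idea, not a tested patch.

```python
import re

# An http(s) URL that continues, without a break, to the end of the left context.
_URL_PREFIX_RE = re.compile(r"https?://[^\s<>\"')]*$", re.IGNORECASE)


def in_url(text: str, start: int, window: int = 200) -> bool:
    """Return True if position `start` lies inside an http(s) URL."""
    left = text[max(0, start - window):start]
    return _URL_PREFIX_RE.search(left) is not None


tweet = "see https://x.com/goodside/status/1745511940351287394 for details"
card = "docs at https://example.com and card 4111111111111111 on file"

print(in_url(tweet, tweet.index("1745")))  # True: digits are part of the URL path
print(in_url(card, card.index("4111")))    # False: whitespace separates URL and digits
```

This variant needs no host allowlist, at the cost of also suppressing a genuine card number that appears inside a URL (for example in a query string), so it may be worth downgrading such hits to a warning rather than dropping them entirely.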

Severity

ERROR-class false positive on benign content (public social-media links). Doctor's suggested fix is "Remove the card-like value and treat the commit as a privacy incident" — actively misleading remediation advice.

🤖 Reported via Claude Code
