Summary
`pk-doctor`'s `sensitive_data` check passes every 13–19-digit numeric run through a Luhn checksum and flags each Luhn-valid hit as an ERROR-class credit-card incident. Public content IDs from social media, notably 19-digit Twitter/X status IDs, pass Luhn by chance (about one in ten arbitrary digit strings does) and then produce ERROR-class "privacy incident" findings on content that is just a tweet/video/post link.
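To see why a public tweet ID trips the check, here is a minimal sketch of the Luhn checksum. `luhn_valid` is an illustrative name, not the module's own `_luhn`, whose implementation is not shown in this report; the sketch only assumes the standard algorithm.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if a string of decimal digits passes the Luhn checksum."""
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# The tweet ID from this report passes, exactly like a real card number would.
print(luhn_valid("1745511940351287394"))  # True
# Changing only the last digit breaks the checksum.
print(luhn_valid("1745511940351287395"))  # False
```

Because the checksum only constrains the final digit, any source of long numeric IDs will keep producing hits at a steady rate unless the surrounding context is taken into account.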
Affected file
`context/skills/processkit/pk-doctor/scripts/checks/sensitive_data.py` — `_credit_cards` (line ~389) called from `run` (line ~265).
Repro
Any artifact/document linking a Twitter/X status URL with a Luhn-valid status ID, e.g.
```markdown
```
`1745511940351287394` is a 19-digit Luhn-valid public tweet ID. Doctor reports:
```
ERROR sensitive-data.credit-card | credit-card-like number with valid checksum in path/to/file.md:NN: 174551...7394
fix: Remove the card-like value and treat the commit as a privacy incident.
```
In this derived project the bug produced 3 ERRORs in security-research artifacts that all cite the same Riley Goodside tweet.
Suggested fix
Skip matches whose immediate left-context is a URL path on a known content-ID host. Minimal patch:
```python
import re

# Left context that marks a numeric ID in a URL path: a known content-ID
# host or path segment, then URL characters running up to the match start.
_URL_NUMERIC_ID_HOST_RE = re.compile(
    r"(?:twitter\.com|x\.com|t\.co|youtu\.be|youtube\.com|facebook\.com|"
    r"linkedin\.com|instagram\.com|tiktok\.com|reddit\.com|status|"
    r"posts?|videos?|watch|tweet|toot)/[^\s)\"']*$",
    re.IGNORECASE,
)


def _credit_cards(text: str):
    # _CREDIT_CARD_RE and _luhn are the module's existing definitions.
    for match in _CREDIT_CARD_RE.finditer(text):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if 13 <= len(digits) <= 19 and _luhn(digits):
            # Skip IDs that sit in a URL path on a content-ID host.
            left = text[max(0, match.start() - 120) : match.start()]
            if _URL_NUMERIC_ID_HOST_RE.search(left):
                continue
            line = text.count("\n", 0, match.start()) + 1
            yield line, raw
```
A more principled fix would restrict the check to runs that are not part of a URL at all, for example by anchoring on `https?://` within N chars to the left of the match start, with no whitespace in between.
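The URL-anchored variant could look roughly like the sketch below. The helper name `inside_url`, the pattern `_URL_TAIL_RE`, the 200-character window, and the `example` path in the sample link are all assumptions made for illustration; none of them exist in `pk-doctor`. `4111111111111111` is a commonly used test card number.

```python
import re

# Hypothetical helper, not part of pk-doctor: matches an http(s) URL that
# runs unbroken (no whitespace, quotes, or brackets) to the end of the
# left-context string.
_URL_TAIL_RE = re.compile(r"https?://[^\s<>()\"']*$", re.IGNORECASE)


def inside_url(text: str, start: int, window: int = 200) -> bool:
    """Return True if offset `start` lies inside an http(s) URL token."""
    left = text[max(0, start - window):start]
    return _URL_TAIL_RE.search(left) is not None


doc = "See [thread](https://x.com/example/status/1745511940351287394) for context."
print(inside_url(doc, doc.index("1745511940351287394")))  # True

plain = "Card on file: 4111111111111111"
print(inside_url(plain, plain.index("4111")))  # False
```

Compared with the host list in the minimal patch, this needs no maintenance as new platforms appear, at the cost of also skipping a real card number that happens to be embedded in a URL. Whether that trade-off is acceptable is a policy decision for the check's maintainers.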
Severity
ERROR-class false positive on benign content (public social-media links). Doctor's suggested fix is "Remove the card-like value and treat the commit as a privacy incident" — actively misleading remediation advice.
🤖 Reported via Claude Code