feat: entropy-based secret detection for exception code variables by ablaszkiewicz · Pull Request #692 · PostHog/posthog-python

ablaszkiewicz · 2026-06-23T11:29:16Z

Currently we detect secrets based on keys and values having common secret phrases like api-key, password, etc...

@hpouillot got an amazing idea to do an entropy based secrets detection. This PR does that and extends our detection with popular secrets format like sk-ant, ey..., gh_..., glpat_....

There is a problem however - sometimes high entropy strings are classified as secrets. We do our best to detect genuine high entropy non-secrets and stop the entropy detection process if it's:

UUID,
file or URL path,
something [in these brackets] or in {these brackets}.

There are much more cases and rules - it's in the PR.

I also tightened our search limits based on real world exceptions.

Did a benchmark on 3 real exceptions. This computes how long does it take to capture variables from frames

Case	Before (master) (ms)	After (ms)	Δ (ms)
eventA	0.062	0.083	+0.021 (+34%)
eventB	0.040	0.053	+0.013 (+31%)
eventC	0.040	0.053	+0.012 (+31%)

greptile-apps · 2026-06-23T11:35:32Z

Comments Outside Diff (3)

posthog/test/test_code_variables.py, line 684-688 (link)

_is_high_entropy_secret called outside its documented pre-condition

_is_high_entropy_secret is documented as "Assumes len(value) >= _SECRET_MIN_LENGTH" and the only call site in production code respects this (the length guard in _looks_like_secret runs first). The test calls _is_high_entropy_secret("xK9#mP2$") — an 8-character string — directly, bypassing that guard. The function happens to return False because the entropy of 8 unique characters is ≈ 3.0 bits/char (below the 3.8 threshold), but the test is exercising behaviour that lies outside the function's contract. A future maintainer who tightens the generic gate could inadvertently shift the function's short-string behaviour without any failing tests.

Consider testing the cheap-exit via the public _looks_like_secret instead, which is the only intended caller and enforces the length precondition explicitly.
posthog/test/test_code_variables.py, line 726-744 (link)

No test for the per-context override of detect_secrets

The per-context toggle set_code_variables_detect_secrets_context is a new public API entry point wired through contexts.py and client.py, but TestSecretDetection only exercises the client-level option (code_variables_detect_secrets=False in make_client). A test that wraps a capture in a with posthog.new_context(): set_code_variables_detect_secrets_context(False) block and asserts the secret leaks through would close the remaining gap and mirror the coverage pattern the other context-level toggles already have.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
posthog/test/test_code_variables.py, line 679-692 (link)

Duplicate coverage of already-parameterised values

test_uuids_and_object_ids_are_never_flagged and test_pure_hex_is_treated_as_an_id_not_a_secret assert on exact string values (the UUID, Mongo ObjectId, and md5 hex digest) that are already present in NON_SECRETS and exercised by the parameterised test_non_secrets_are_not_flagged. This violates the OnceAndOnlyOnce rule and the team preference for parameterised tests over dedicated single-case tests. Either remove these methods and let the parameterised suite carry the coverage, or pull these "must-never-flag" entries into a named sub-list with their own parameterised test and remove the duplicates from NON_SECRETS.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "feat: entropy-based secret detection for..." | Re-trigger Greptile}

github-actions · 2026-06-23T11:38:42Z

posthog-python Compliance Report

Date: 2026-06-23 12:16:28 UTC
Duration: 530095ms

✅ All Tests Passed!

45/45 tests passed

Capture Tests

✅ 29/29 tests passed

View Details

Test	Status	Duration
Format Validation.Event Has Required Fields	✅	517ms
Format Validation.Event Has Uuid	✅	10007ms
Format Validation.Event Has Lib Properties	✅	10008ms
Format Validation.Distinct Id Is String	✅	10007ms
Format Validation.Token Is Present	✅	10007ms
Format Validation.Custom Properties Preserved	✅	10007ms
Format Validation.Event Has Timestamp	✅	10007ms
Retry Behavior.Retries On 503	✅	18019ms
Retry Behavior.Does Not Retry On 400	✅	12005ms
Retry Behavior.Does Not Retry On 401	✅	10005ms
Retry Behavior.Respects Retry After Header	✅	16015ms
Retry Behavior.Implements Backoff	✅	30018ms
Retry Behavior.Retries On 500	✅	13011ms
Retry Behavior.Retries On 502	✅	16011ms
Retry Behavior.Retries On 504	✅	16011ms
Retry Behavior.Max Retries Respected	✅	30018ms
Deduplication.Generates Unique Uuids	✅	7002ms
Deduplication.Preserves Uuid On Retry	✅	16015ms
Deduplication.Preserves Uuid And Timestamp On Retry	✅	23011ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry	✅	16014ms
Deduplication.No Duplicate Events In Batch	✅	10002ms
Deduplication.Different Events Have Different Uuids	✅	10007ms
Compression.Sends Gzip When Enabled	✅	10007ms
Batch Format.Uses Proper Batch Structure	✅	10007ms
Batch Format.Flush With No Events Sends Nothing	✅	5005ms
Batch Format.Multiple Events Batched Together	✅	10005ms
Error Handling.Does Not Retry On 403	✅	12008ms
Error Handling.Does Not Retry On 413	✅	10008ms
Error Handling.Retries On 408	✅	14013ms

Feature_Flags Tests

✅ 16/16 tests passed

View Details

Test	Status	Duration
Request Payload.Request With Person Properties Device Id	✅	9502ms
Request Payload.Flags Request Uses V2 Query Param	✅	10008ms
Request Payload.Flags Request Hits Flags Path Not Decide	✅	10007ms
Request Payload.Flags Request Omits Authorization Header	✅	10007ms
Request Payload.Token In Flags Body Matches Init	✅	10007ms
Request Payload.Groups Round Trip	✅	10007ms
Request Payload.Groups Default To Empty Object	✅	10007ms
Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It	✅	10007ms
Request Payload.Disable Geoip False Propagates As Geoip Disable False	✅	10006ms
Request Payload.Disable Geoip Omitted Defaults To False	✅	10007ms
Request Payload.Flag Keys To Evaluate Contains Only Requested Key	✅	10008ms
Request Lifecycle.No Flags Request On Init Alone	✅	5003ms
Request Lifecycle.No Flags Request On Normal Capture	✅	10508ms
Request Lifecycle.Two Flag Calls Produce Two Remote Requests	✅	9511ms
Request Lifecycle.Mock Response Value Is Returned To Caller	✅	10002ms
Side Effect Events.Get Feature Flag Captures Feature Flag Called Event	✅	10509ms

Add a last-resort entropy-based detector that redacts high-entropy, secret-looking values (API keys, tokens, strong passwords) sitting in innocuously-named code variables, after the existing name-pattern and URL-credential checks. - Known vendor key formats (OpenAI, Anthropic, AWS, Stripe, GitHub, GitLab, Slack, Google, JWT, PEM private keys) are matched directly. - Structured identifiers (UUIDs, Mongo ObjectIds, hashes), object reprs, file paths and URLs are never flagged. - Exposed as the `code_variables_detect_secrets` option (default True) with a per-context override, threaded through client/contexts. - Tighten the masking size caps to keep capture cost bounded. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ablaszkiewicz · 2026-06-23T12:07:30Z

+
+# Synthetic, format-correct fakes (no real credentials). Vendor keys are assembled from
+# prefix + body so no complete secret literal lives in source (which trips secret scanners).
+def _key(prefix, body):


I had to split secrets into two parts. Otherwise GitHub wasn't happy

greptile-apps · 2026-06-23T12:16:04Z

_{Reviews (2): Last reviewed commit: "feat: entropy-based secret detection for..." | Re-trigger Greptile}

cat-ph

love this!! 🚀🚀

cat-ph · 2026-06-23T12:53:27Z

    ``repr(value)`` but fails closed: redact entirely on any mask match, over-length
    repr, or a raising ``__repr__``."""
    try:
        rendered = repr(value)


do we want to run _looks_like_secret(rendered) here too maybe? 🤔 otherwise something like this bypasses it:

from posthog.exception_utils import _MaskingConfig, _encode_variable, VariableSizeLimiter class OpaqueToken: __slots__ = () def __repr__(self): return "n8fK2pQ9vX7mL4wR8tY3uZ6bC1dE5gH" # high-entropy fake token config = _MaskingConfig.build( [], [], mask_url_credentials=False, detect_secrets=True, ) print(_encode_variable(OpaqueToken(), config, VariableSizeLimiter()))

ablaszkiewicz force-pushed the code-variables-entropy-detection branch from a245e6e to d206ea9 Compare June 23, 2026 11:51

ablaszkiewicz force-pushed the code-variables-entropy-detection branch from d206ea9 to 615244d Compare June 23, 2026 12:07

ablaszkiewicz commented Jun 23, 2026

View reviewed changes

ablaszkiewicz marked this pull request as ready for review June 23, 2026 12:08

ablaszkiewicz requested a review from a team as a code owner June 23, 2026 12:08

ablaszkiewicz requested review from a team, cat-ph and hpouillot June 23, 2026 12:08

ablaszkiewicz mentioned this pull request Jun 23, 2026

docs: automatic secret detection for Python code variables PostHog/posthog.com#17848

Draft

cat-ph approved these changes Jun 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: entropy-based secret detection for exception code variables#692

feat: entropy-based secret detection for exception code variables#692
ablaszkiewicz wants to merge 1 commit into
mainfrom
code-variables-entropy-detection

ablaszkiewicz commented Jun 23, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading

Comments Outside Diff (3)

Uh oh!

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

ablaszkiewicz Jun 23, 2026

Uh oh!

greptile-apps Bot commented Jun 23, 2026

Uh oh!

cat-ph left a comment

Uh oh!

cat-ph Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ablaszkiewicz commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (3)

Uh oh!

github-actions Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

posthog-python Compliance Report

✅ All Tests Passed!

Capture Tests

Feature_Flags Tests

Uh oh!

ablaszkiewicz Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 23, 2026

Uh oh!

cat-ph left a comment

Choose a reason for hiding this comment

Uh oh!

cat-ph Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ablaszkiewicz commented Jun 23, 2026 •

edited

Loading

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading

github-actions Bot commented Jun 23, 2026 •

edited

Loading