Skip to content

Add shared PostHog telemetry reporter#10

Open
mariusvniekerk wants to merge 8 commits into
mainfrom
shared-posthog-telemetry
Open

Add shared PostHog telemetry reporter#10
mariusvniekerk wants to merge 8 commits into
mainfrom
shared-posthog-telemetry

Conversation

@mariusvniekerk
Copy link
Copy Markdown
Contributor

Summary

  • add a shared PostHog telemetry reporter with caller-owned API key, application, env prefix, anonymous distinct ID, and option-based event allowlists
  • enforce privacy defaults and property sanitization for daemon telemetry captures
  • add build-tag/process-level telemetry disabling plus a daemon_active usage example

Validation

  • go test ./telemetry
  • go test -tags kit_posthog_disabled ./telemetry
  • golangci-lint run
  • go run ./cmd/testify-helper-check ./...
  • go tool nilaway -include-pkgs=go.kenn.io/kit ./...
  • go tool gotestsum --format pkgname-and-test-fails -- ./...

Several Kenn daemons need the same anonymous PostHog telemetry behavior, but keeping separate internal implementations makes it easy for privacy defaults, opt-out handling, and event allowlists to drift between apps.

This adds a shared reporter that requires callers to provide their own PostHog project key, anonymous distinct ID, env prefix, and allowed events. The reusable package owns the sanitization and privacy defaults while callers keep app-specific keys and event semantics outside kit.

The option-based event allowlist keeps the public API usable without exposing the internal map representation, and the build-tag/process disable hook preserves the existing test-binary exclusion pattern for daemon integrations.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (ee83fbf)

Medium issue found: the telemetry allowlist handling drops valid events with no custom properties.

Medium

  • telemetry/posthog.go:389 - cloneAllowedTelemetryEvents drops allowlisted events that have no custom properties, even though WithAllowedEvent(event string, properties ...AllowedTelemetryProperty) permits callers to define such events. A caller trying to capture a default-properties-only event may fail reporter construction or receive ErrUnsupportedTelemetryEvent.
    • Fix: preserve events with an empty property map and add a unit test for WithAllowedEvent("event_without_properties").

Panel: ci_default_security | Synthesis: codex, 11s | Members: codex_default (codex/default, done, 2m38s), codex_security (codex/security, done, 1m22s) | Total: 4m11s

The option-based PostHog API allows callers to register events whose payload is only the shared default metadata. Dropping those events during allowlist cloning made that API contract inconsistent and caused construction or capture to fail for valid default-properties-only daemon events.

Preserving empty property maps keeps event authorization separate from property authorization, while the regression test proves unallowlisted submitted properties are still stripped from such events.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (885d12c)

Telemetry is mostly sound, but one medium bug allows already-created reporters to keep sending after global disable.

Medium

  • telemetry/posthog.go:288 - DisablePostHogTelemetry() is documented as disabling telemetry for the process, but existing enabled reporters can continue to enqueue events because Capture only checks the reporter’s local enabled/client state. Have Enabled() or Capture() also check PostHogTelemetryDisabled(), and add a test that disables telemetry after reporter creation.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 2m22s), codex_security (codex/security, done, 1m2s) | Total: 3m30s

DisablePostHogTelemetry is a process-wide switch, so callers should not need to recreate already-initialized reporters for it to take effect. The previous implementation only consulted that switch while constructing reporters, leaving long-running daemons able to enqueue after a build-tag or runtime disable path fired.

This makes submission eligibility check the process disable state on every capture while keeping Close tied to the underlying client state so existing clients can still flush or release resources.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (ac8ae18)

PR has one medium-severity issue to address before merge.

Medium

  • telemetry/posthog.go:311: Close ignores the process-level telemetry disable and still calls the PostHog client's Close, which can flush pending telemetry. If DisablePostHogTelemetry() is called after events are queued, those events may still be sent even though Enabled() is false.
    • Fix: Gate Close on r.Enabled() or use a non-flushing shutdown/discard path when PostHogTelemetryDisabled() is true.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 3m34s), codex_security (codex/security, done, 14s) | Total: 3m54s

The process-level telemetry disable path should stop outbound telemetry altogether, including buffered events that might otherwise be flushed during reporter shutdown.

This keeps Close aligned with Enabled so disabling telemetry after reporter creation prevents both new captures and shutdown flushes.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (6c88dcd)

Verdict: One medium issue needs attention before merge.

Medium

  • [telemetry/posthog.go:312] Close() can skip closing an already-created PostHog client after DisablePostHogTelemetry() is called. Since Close is the only shutdown hook in postHogEnqueueCloser, this may leak client resources and allow queued async events to continue sending.
    • Fix: Separate submit gating from shutdown behavior, or otherwise ensure active clients are released even when telemetry has been disabled.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 4m24s), codex_security (codex/security, done, 2m29s) | Total: 7m1s

PostHog Close is the only available client shutdown path, but it drains queued messages. Skipping Close avoided a flush but left the async client alive, while calling Close without a guard could send after process-level telemetry disable.

This installs a transport guard in the shared reporter so queued flush attempts after DisablePostHogTelemetry fail before network I/O, allowing Close to release the client without submitting disabled telemetry.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

🤖 Generated with [OpenAI Codex](https://openai.com/codex)
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (a9e8291)

Telemetry shutdown has one medium issue; no high or critical findings were reported.

Medium

  • [telemetry/posthog.go:317] Close() still calls client.Close() after DisablePostHogTelemetry() makes Enabled() false. An active reporter with queued events may try to flush through PostHog and surface ErrPostHogTelemetryDisabled during shutdown even though telemetry is disabled.
    • Fix: Gate Close() on r.Enabled() or return nil when PostHogTelemetryDisabled() is true, and add a test where an event is queued before the process-level disable.

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 1m57s), codex_security (codex/security, done, 14s) | Total: 2m19s

Process-level telemetry disable is a stronger guarantee than graceful delivery. Once DisablePostHogTelemetry has been called, reporter shutdown should not ask the PostHog SDK to drain a queue that may contain pre-disable events.

Close now deactivates the shared wrapper locally when telemetry is disabled, so callers can still use Close as their shutdown hook without triggering a final flush. The regression test covers the queued-before-disable case called out by review.

Validation: go test ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

Generated with OpenAI Codex
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (f0a1ec1)

Summary verdict: One medium concurrency issue should be fixed before merge.

Medium

  • telemetry/posthog.go:308: Capture checks Enabled() and then dereferences r.client without synchronization, while Close can concurrently set r.enabled false and r.client nil. A shutdown racing with a telemetry capture can cause a data race, enqueue on a closing client, or panic after the enabled check has passed.
    • Fix: Add lifecycle synchronization around enabled and client, such as a mutex that serializes Capture with Close and makes Close idempotent.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 3m26s), codex_security (codex/security, done, 1m12s) | Total: 4m45s

The reporter lifecycle can be touched by daemon activity and shutdown paths at the same time. After Close started deactivating the wrapper, Capture needed to share the same synchronization boundary so it cannot enqueue through a client that Close is concurrently clearing.

The mutex keeps Enabled, Capture, and Close consistent while preserving the process-disable guarantee that Close does not flush queued telemetry after DisablePostHogTelemetry. The regression test forces Capture and Close to overlap and verifies Close waits for the in-flight enqueue.

Validation: go test ./telemetry; go test -race ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

Generated with OpenAI Codex
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (eebba58)

Medium issues remain in the telemetry implementation; no critical or high findings were reported.

Medium

  • telemetry/posthog.go:341
    When telemetry is disabled before Close, the reporter drops its client reference without closing the underlying PostHog client. Since events are enqueued through a closeable client, this can leave client goroutines, timers, and queued messages alive after Close returns.
    Fix: Add a shutdown path that stops the PostHog client while still preventing network sends, such as closing through a disabled/no-op transport and then deactivating the reporter.

  • telemetry/posthog.go:411
    AllowTelemetryToken accepts arbitrary short identifier-like strings, which can still include sensitive slugs, usernames, branch names, hostnames, or project names. As a built-in “safe” telemetry filter, this can lead callers to send user-controlled identifiers that are not anonymous.
    Fix: Replace this with an explicit enum/set allowlist helper, or require callers to provide a predicate for known-safe values instead of accepting any token-shaped string.


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 3m42s), codex_security (codex/security, done, 1m55s) | Total: 5m47s

Process-level telemetry disable should not choose between leaking the SDK client and sending queued telemetry during shutdown. The reporter can close the SDK safely by routing disabled HTTP attempts to a local 204 response, which lets the SDK drain and stop without network I/O.

The string property helper also now requires an explicit allowed-value set. That keeps the shared API from treating arbitrary token-shaped user data, such as slugs or names, as anonymous telemetry by default.

Validation: go test ./telemetry; go test -race ./telemetry; go test -tags kit_posthog_disabled ./telemetry; golangci-lint run; go run ./cmd/testify-helper-check ./...; go tool nilaway -include-pkgs=go.kenn.io/kit ./...; go tool gotestsum --format pkgname-and-test-fails -- ./...

Generated with OpenAI Codex
Co-authored-by: OpenAI Codex <noreply@openai.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented Jun 4, 2026

roborev: Combined Review (64ab6d5)

Summary verdict: No Medium, High, or Critical findings to report.

Both reviews found no reportable issues at Medium severity or above.


Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 3m25s), codex_security (codex/security, done, 1m47s) | Total: 5m19s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant