feat: panic autocapture#157
Conversation
Install a process-wide panic hook with `install_panic_hook(Arc<Client>)` that captures panics as personless `$exception` events: the panic payload as the exception value, panic-site `$exception_panic_file` / `_line` / `_column` from `PanicInfo::location()`, and a call-site stacktrace honoring the client's `capture_stacktrace`. Unlike the original draft (#130), the hook routes through the client's transport — a non-blocking enqueue plus a synchronous flush — rather than a dedicated synchronous HTTP path with its own timeout. The actual send and `before_send` run on the transport's background worker (a std::thread), where the panic count is zero, so a panic there is a joinable thread panic instead of the process abort a panic on the hook thread would cause. This lets `before_send` run (no longer bypassed) and drops the separate `panic_capture_timeout_seconds` knob; teardown is bounded by the worker's flush. Installing twice returns `Error::PanicHookAlreadyInstalled`. Also strips v0 crate disambiguators (`std[hex]::...`) from demangled frame names so internal-frame matching and server-side grouping see stable symbols.
A user `before_send` hook that panics fires the panic hook on the transport worker thread itself (the hook runs before `catch_unwind` catches the `before_send` panic). There, `flush_blocking()` enqueued a `Flush` to the worker and blocked on `recv` — but the worker can't process its own `Flush` mid-hook, so it deadlocked, stalling delivery and hanging later `flush`/`shutdown`. `flush_blocking` now returns early when called from the worker thread (matched by its name): the panic event is already enqueued and ships on the worker's next cycle. Adds a watchdog-bounded regression test.
The round-1 fix detected the worker thread by its name and only skipped the synchronous self-flush. That was both imprecise and incomplete: - Imprecise: the thread name is a shared string, so any unrelated thread named `posthog-transport` would also be treated as the worker. - Incomplete: it stopped the self-flush deadlock but not recursion. When a `before_send` hook panics, the panic hook fires on the worker; still building and enqueuing the `$exception` re-enters `before_send`, which panics again, forever. Capture the worker's `ThreadId` at spawn and compare against it, then skip the panic capture entirely (not just the flush) when running on this client's worker. That fixes both the deadlock and the recursion at the source. Strengthen the regression test to panic unconditionally in `before_send` so it exercises the recursion path, not just the flush.
The panic hook runs on the panicking thread before unwinding releases any locks held at the panic site, then blocks in flush_blocking() waiting for the worker. The worker runs before_send for the queued events. If a before_send hook needs a lock the panic site still holds, the worker blocks on that lock, the panicking thread blocks on the worker, and the process hangs forever instead of crashing and reporting. Bound flush_blocking with rx.recv_timeout(shutdown_timeout) instead of an unbounded recv. The handle captures the (clamped) shutdown_timeout at spawn, reusing the caller's existing delivery budget rather than adding a knob. On the deadlock the hook now returns once the budget elapses and the panic proceeds; the worker unblocks on its own once unwinding frees the lock. Factor the clamp ceiling into MAX_SHUTDOWN_TIMEOUT, shared with the worker drain. Add a regression test that panics while holding a lock a before_send hook also takes.
capture_panic routed the $exception through Client::capture, which is once on a full queue. Both run arbitrary tracing-subscriber code on the already-panicking thread: a subscriber that panics aborts the process (nested panic), and one that waits on a lock the panic site holds hangs it before the previous hook runs. That contradicts the hook's own invariant of doing only panic-safe work. Add a tracing-free enqueue path for the panic hook: TransportHandle gains enqueue_panic (no full-queue warn!), and each Client gains a non-instrumented enqueue_panic_event. capture_panic now uses it instead of capture. Factor the pure slot reservation into reserve_slot, shared by the warning try_reserve and the silent panic path, and pull the shared stamp-and-send into send_reserved so enqueue and enqueue_panic stay DRY. The panicking thread now only builds the event, reserves a slot, sends on the channel, and waits on the bounded flush — no subscriber code.
The previous commit bounded the panic flush by reusing shutdown_timeout_ms (default 30s). A cross-engine review flagged that this is the wrong budget for a crash path: the hook flushes on the panicking thread before the panic message prints, so when PostHog is slow or unreachable - or a backlog is queued at panic time - every panic freezes the process and delays its own crash message for up to 30s. The rare before_send-lock deadlock is only one way to hit that bound; an unreachable host hits it on every panic. Add a dedicated panic_flush_timeout_ms to ErrorTrackingOptions (default 2000ms), decoupled from shutdown_timeout_ms, so the dying process stays responsive and operators can tune it. The transport gains a flush_blocking_timeout(timeout) (clamped to MAX_SHUTDOWN_TIMEOUT); the no-argument flush_blocking becomes test-only. capture_panic now flushes with the configured budget. Regenerate the public-api snapshot for the new builder setter.
Per review feedback, panic autocapture shouldn't require finding and calling a standalone function. init_global now installs the panic hook automatically — crash reporting, on by default, like Sentry. This is the clean path because the global client lives in a process `static`, whose lifetime matches the `'static` panic hook exactly: the hook borrows it directly, no Arc needed. - Add `ErrorTrackingOptions::capture_panics` (default `true`). When set, `init_global` installs a hook that captures panics through the global client. Set it `false` to opt out. - Keep `install_panic_hook(Arc<Client>)` as the explicit path for a standalone (non-global) client — a non-`'static` client can only feed a process-global hook via an `Arc` the caller keeps alive, so this stays a separate, opt-in entry point. The flag does not affect standalone clients. - Factor the latch + take/set/chain logic into a shared `install_hook`; both entry points are thin wrappers over it. The global hook reads the client from the `OnceLock` at panic time, so it owns no handle. - Teach the stacktrace stripper about the new `install_hook` and `maybe_install_global_panic_hook` closure frames so the first application frame is still surfaced. Tests: the flag default/builder, and an end-to-end init_global panic capture (the global client is a set-once OnceLock, so that is the only test that initializes it). Regenerate the public-api snapshot for the new builder setter.
…pture Two autoreview findings on the global panic-capture path: - The panic hook enqueued the `$exception` then sent a normal `Flush`, which services older held retries before the live buffer. With a retry backlog behind a slow endpoint, the hook's short bounded wait could be spent on stale retries and the crash event never attempted before the process exits. Add a dedicated `Control::FlushCaptures` that drains the live buffer (the `$exception`) first, then older retries best-effort; the panic flush uses it. - `maybe_install_global_panic_hook` only checked `capture_panics`, so a disabled global client (blank key / `disabled(true)`) still latched the single process-wide hook — which then no-ops — blocking a later `install_panic_hook` on an enabled standalone client. Gate install on a new `should_capture_global_panics` predicate (`!is_disabled && flag`), unit-tested across enabled / disabled / opted-out clients.
…t best-effort - `install_panic_hook` now returns `Ok(())` without installing when the client is disabled, mirroring the global path's guard. A disabled client can't send, so latching the single process-wide hook would only block a later enabled install (and silently skip `init_global`'s best-effort auto-install). Regression-tested. - Document that panic delivery is best-effort: the just-captured `$exception` is flushed ahead of queued retries, but under sustained backpressure (slow/unreachable endpoint or a large capture backlog) it may not be sent within `panic_flush_timeout_ms` before the process exits. Prioritizing past a saturated channel would need a transport-level priority path, which belongs with the sender-pool rework rather than bolted onto the std-mpsc worker.
posthog-rs-v0 Compliance ReportDate: 2026-06-29 00:08:50 UTC
|
| Test | Status | Duration |
|---|---|---|
| Format Validation.Event Has Required Fields | ✅ | 128ms |
| Format Validation.Event Has Uuid | ✅ | 138ms |
| Format Validation.Event Has Lib Properties | ✅ | 141ms |
| Format Validation.Distinct Id Is String | ✅ | 140ms |
| Format Validation.Token Is Present | ✅ | 136ms |
| Format Validation.Custom Properties Preserved | ✅ | 142ms |
| Format Validation.Event Has Timestamp | ✅ | 139ms |
| Retry Behavior.Retries On 503 | ✅ | 5134ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 2140ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 2134ms |
| Retry Behavior.Respects Retry After Header | ✅ | 5094ms |
| Retry Behavior.Implements Backoff | ✅ | 15088ms |
| Retry Behavior.Retries On 500 | ✅ | 5092ms |
| Retry Behavior.Retries On 502 | ✅ | 5091ms |
| Retry Behavior.Retries On 504 | ✅ | 5091ms |
| Retry Behavior.Max Retries Respected | ✅ | 15086ms |
| Deduplication.Generates Unique Uuids | ✅ | 92ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 5021ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | ✅ | 10033ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 5036ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 23ms |
| Deduplication.Different Events Have Different Uuids | ✅ | 22ms |
| Compression.Sends Gzip When Enabled | ✅ | 23ms |
| Batch Format.Uses Proper Batch Structure | ✅ | 23ms |
| Batch Format.Flush With No Events Sends Nothing | ✅ | 24ms |
| Batch Format.Multiple Events Batched Together | ✅ | 101ms |
| Error Handling.Does Not Retry On 403 | ✅ | 2082ms |
| Error Handling.Does Not Retry On 413 | ✅ | 2067ms |
| Error Handling.Retries On 408 | ✅ | 5071ms |
Feature_Flags Tests
View Details
| Test | Status | Duration |
|---|---|---|
| Request Payload.Request With Person Properties Device Id | ❌ | 67ms |
| Request Payload.Flags Request Uses V2 Query Param | ❌ | 41ms |
| Request Payload.Flags Request Hits Flags Path Not Decide | ❌ | 26ms |
| Request Payload.Flags Request Omits Authorization Header | ❌ | 26ms |
| Request Payload.Token In Flags Body Matches Init | ❌ | 24ms |
| Request Payload.Groups Round Trip | ❌ | 31ms |
| Request Payload.Groups Default To Empty Object | ❌ | 44ms |
| Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | ❌ | 21ms |
| Request Payload.Disable Geoip False Propagates As Geoip Disable False | ❌ | 45ms |
| Request Payload.Disable Geoip Omitted Defaults To False | ❌ | 23ms |
| Request Payload.Flag Keys To Evaluate Contains Only Requested Key | ❌ | 53ms |
| Request Lifecycle.No Flags Request On Init Alone | ✅ | 20ms |
| Request Lifecycle.No Flags Request On Normal Capture | ✅ | 68ms |
| Request Lifecycle.Two Flag Calls Produce Two Remote Requests | ❌ | 24ms |
| Request Lifecycle.Mock Response Value Is Returned To Caller | ❌ | 26ms |
| Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | ❌ | 35ms |
Failures
request_payload.request_with_person_properties_device_id
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-6cfddce4-c1e8-4866-a1d6-eb536e73cd04'
request_payload.flags_request_uses_v2_query_param
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-5515de72-511d-41c8-8efd-f6f6aa0bb15c'
request_payload.flags_request_hits_flags_path_not_decide
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-eef1202f-e786-43a9-a4b8-dccfe765d568'
request_payload.flags_request_omits_authorization_header
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-b368f3a6-95c7-4cf5-a148-486bd363f4b6'
request_payload.token_in_flags_body_matches_init
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-d5a53c77-4867-4e9d-a4c8-51b5de61e113'
request_payload.groups_round_trip
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-2e4023e2-68d4-4ad0-bb45-dbdadac8c713'
request_payload.groups_default_to_empty_object
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-b0d8b39b-a257-49e2-8503-95bbbd0bf622'
request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-eed3c43a-b4d6-4f01-bcc8-d9b9a13d5b84'
request_payload.disable_geoip_false_propagates_as_geoip_disable_false
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-610506a2-65d2-4b29-a02a-cf77d0e2a796'
request_payload.disable_geoip_omitted_defaults_to_false
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-603cb943-5166-46ac-8de1-6504a8c12920'
request_payload.flag_keys_to_evaluate_contains_only_requested_key
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-dabec705-8e7a-47cf-b973-2c9a27e3a5dc'
request_lifecycle.two_flag_calls_produce_two_remote_requests
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-24d3dc19-f4a9-4054-b50b-005edeac7396'
request_lifecycle.mock_response_value_is_returned_to_caller
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-77635b63-28b2-40a9-809e-1e3e34c9eea7'
side_effect_events.get_feature_flag_captures_feature_flag_called_event
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-99bdf335-edd6-4d51-b917-0e79fe710df7'
posthog-rs-v1 Compliance ReportDate: 2026-06-29 00:09:37 UTC
|
| Test | Status | Duration |
|---|---|---|
| Endpoint And Method.Targets V1 Endpoint | ✅ | 148ms |
| Endpoint And Method.Does Not Use Legacy Endpoints | ✅ | 147ms |
| Required Headers.Has Authorization Bearer Header | ✅ | 147ms |
| Required Headers.Has Content Type Json | ✅ | 148ms |
| Required Headers.Has Posthog Sdk Info Format | ✅ | 147ms |
| Required Headers.Has Posthog Attempt Header | ✅ | 147ms |
| Required Headers.Has Posthog Request Id | ✅ | 146ms |
| Required Headers.Has Posthog Request Timestamp | ✅ | 147ms |
| Required Headers.Has User Agent | ✅ | 146ms |
| Body Format.Body Has Created At And Batch | ✅ | 149ms |
| Body Format.No Api Key In Body | ✅ | 133ms |
| Body Format.No Sent At In Body | ✅ | 132ms |
| Event Format.Event Has Required Root Fields | ✅ | 134ms |
| Event Format.Event Uuid Is Valid | ✅ | 131ms |
| Event Format.Event Timestamp Is Rfc3339 | ✅ | 142ms |
| Event Format.Distinct Id Is String | ✅ | 133ms |
| Event Format.Distinct Id At Root Not Properties | ✅ | 134ms |
| Event Format.Custom Properties Preserved | ✅ | 133ms |
| Event Format.Set Properties Preserved | ✅ | 134ms |
| Event Format.Set Once Properties Preserved | ✅ | 130ms |
| Event Format.Groups Properties Preserved | ✅ | 123ms |
| Event Format.Sdk Generates Uuid If Not Provided | ✅ | 122ms |
| Event Format.Event Has Required Root Fields Batch | ✅ | 169ms |
| Event Format.Event Uuid Is Valid Batch | ✅ | 169ms |
| Event Format.Event Timestamp Is Rfc3339 Batch | ✅ | 168ms |
| Event Format.Distinct Id Is String Batch | ✅ | 169ms |
| Event Format.Distinct Id At Root Not Properties Batch | ✅ | 169ms |
| Event Format.Custom Properties Preserved Batch | ✅ | 168ms |
| Event Format.Set Properties Preserved Batch | ✅ | 166ms |
| Event Format.Set Once Properties Preserved Batch | ✅ | 161ms |
| Event Format.Groups Properties Preserved Batch | ✅ | 147ms |
| Event Format.Sdk Generates Uuid If Not Provided Batch | ✅ | 139ms |
| Batch Behavior.Multiple Events In Single Batch | ✅ | 182ms |
| Batch Behavior.Batch Envelope Smoke | ✅ | 168ms |
| Batch Behavior.Flush With No Events Sends Nothing | ✅ | 91ms |
| Batch Behavior.Flush At Triggers Batch | ✅ | 1113ms |
| Batch Behavior.Created At Reflects Batch Creation Time | ✅ | 101ms |
| Deduplication.Generates Unique Uuids | ✅ | 174ms |
| Deduplication.Different Events Same Content Different Uuids | ✅ | 112ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 5115ms |
| Deduplication.Preserves Timestamp On Retry | ✅ | 5070ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 5106ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 139ms |
| Header Behavior On Retry.Attempt Header Starts At One | ✅ | 74ms |
| Header Behavior On Retry.Attempt Header Increments On Retry | ✅ | 10074ms |
| Header Behavior On Retry.Request Id Preserved On Retry | ✅ | 5067ms |
| Header Behavior On Retry.Different Requests Have Different Request Ids | ✅ | 2069ms |
| Header Behavior On Retry.Request Timestamp Changes On Retry | ✅ | 5065ms |
| Response Format Validation.Success Response Has Uuid Keyed Results | ✅ | 60ms |
| Response Format Validation.Success Response Has Ok For Each Event | ✅ | 30ms |
| Response Format Validation.Success No Retry After When All Ok | ✅ | 34ms |
| Response Format Validation.Success Retry After Present When Retry Events | ✅ | 31ms |
| Response Format Validation.Success No Retry After When Drop Only | ✅ | 33ms |
| Response Format Validation.Response Echoes Request Id | ✅ | 28ms |
| Retry Behavior.Retries On 408 | ✅ | 5038ms |
| Retry Behavior.Retries On 500 | ✅ | 5028ms |
| Retry Behavior.Retries On 503 | ✅ | 5024ms |
| Retry Behavior.Retries On 504 | ✅ | 5023ms |
| Retry Behavior.Retryable Errors Have Retry After | ✅ | 2025ms |
| Retry Behavior.Respects Retry After On Retryable Error | ✅ | 8024ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 2027ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 2031ms |
| Retry Behavior.Does Not Retry On 402 | ✅ | 2033ms |
| Retry Behavior.Does Not Retry On 413 | ✅ | 2027ms |
| Retry Behavior.Does Not Retry On 415 | ✅ | 2027ms |
| Retry Behavior.Non Retryable Errors Have No Retry After | ✅ | 2026ms |
| Retry Behavior.Implements Backoff | ✅ | 15030ms |
| Retry Behavior.Max Retries Respected | ✅ | 15024ms |
| Partial Batch Handling.Handles 200 Full Success | ✅ | 2042ms |
| Partial Batch Handling.Handles 200 With All Ok | ✅ | 3048ms |
| Partial Batch Handling.Does Not Retry Dropped Events | ✅ | 3046ms |
| Partial Batch Handling.Does Not Retry Limited Events | ✅ | 3028ms |
| Partial Batch Handling.Prunes Ok Events On Partial Retry | ✅ | 5030ms |
| Partial Batch Handling.Prunes Dropped Events On Partial Retry | ✅ | 5025ms |
| Partial Batch Handling.Retries Only Retry Events From Partial | ✅ | 5026ms |
| Partial Batch Handling.Partial Retry Preserves Uuids | ✅ | 5024ms |
| Partial Batch Handling.Partial Retry Attempt Header Increments | ✅ | 5032ms |
| Partial Batch Handling.Partial Retry Request Id Preserved | ✅ | 5031ms |
| Partial Batch Handling.Respects Retry After On Partial | ✅ | 5024ms |
| Partial Batch Handling.Unknown Result Treated As Terminal | ✅ | 3023ms |
| Partial Batch Handling.Mixed Ok Drop Limited No Retry | ✅ | 3038ms |
| Compression.Sends Gzip Content Encoding | ✅ | 39ms |
| Compression.No Content Encoding When Disabled | ✅ | 21ms |
| Compression.Compressed Body Is Decompressible | ✅ | 22ms |
| Error Handling.Does Not Retry On Unknown 4Xx | ✅ | 2022ms |
| Event Options.Cookieless Mode Override | ✅ | 23ms |
| Event Options.Disable Skew Correction Override | ✅ | 22ms |
| Event Options.Process Person Profile Override | ✅ | 22ms |
| Event Options.Product Tour Id Override | ✅ | 21ms |
| Event Options.Unset Options Omitted | ✅ | 23ms |
| Event Options.Options Override In Batch | ✅ | 22ms |
| Geoip And Historical Migration.Geoip Disable Injected Into Properties | ✅ | 20ms |
| Geoip And Historical Migration.Historical Migration Set In Body | ✅ | 22ms |
| Geoip And Historical Migration.Historical Migration Absent By Default | ✅ | 22ms |
Feature_Flags Tests
View Details
| Test | Status | Duration |
|---|---|---|
| Request Payload.Request With Person Properties Device Id | ❌ | 12ms |
| Request Payload.Flags Request Uses V2 Query Param | ❌ | 13ms |
| Request Payload.Flags Request Hits Flags Path Not Decide | ❌ | 12ms |
| Request Payload.Flags Request Omits Authorization Header | ❌ | 15ms |
| Request Payload.Token In Flags Body Matches Init | ❌ | 15ms |
| Request Payload.Groups Round Trip | ❌ | 15ms |
| Request Payload.Groups Default To Empty Object | ❌ | 13ms |
| Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | ❌ | 12ms |
| Request Payload.Disable Geoip False Propagates As Geoip Disable False | ❌ | 13ms |
| Request Payload.Disable Geoip Omitted Defaults To False | ❌ | 12ms |
| Request Payload.Flag Keys To Evaluate Contains Only Requested Key | ❌ | 14ms |
| Request Lifecycle.No Flags Request On Init Alone | ✅ | 11ms |
| Request Lifecycle.No Flags Request On Normal Capture | ✅ | 24ms |
| Request Lifecycle.Two Flag Calls Produce Two Remote Requests | ❌ | 12ms |
| Request Lifecycle.Mock Response Value Is Returned To Caller | ❌ | 15ms |
| Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | ❌ | 12ms |
Failures
request_payload.request_with_person_properties_device_id
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-7d154a82-5052-498d-9371-1ae9f47b991a'
request_payload.flags_request_uses_v2_query_param
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-76a70551-8047-4113-b3f3-f76a6d4c4ea8'
request_payload.flags_request_hits_flags_path_not_decide
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-6091aff5-9b11-468d-91a7-93ec11f040d4'
request_payload.flags_request_omits_authorization_header
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-c7397306-d943-4631-b558-1a4ce08d77a4'
request_payload.token_in_flags_body_matches_init
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-fa8064ee-f8b9-4d67-8832-03f26a321830'
request_payload.groups_round_trip
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-50f9d928-5a88-41c4-8aca-448833ba4665'
request_payload.groups_default_to_empty_object
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-bc8d7f42-9918-4b7e-88f1-47e157df6996'
request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-58c903c2-2cdc-437d-bd76-7ef0024125b1'
request_payload.disable_geoip_false_propagates_as_geoip_disable_false
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-5b7cf57c-01f7-4d92-acd4-a2901d38fba3'
request_payload.disable_geoip_omitted_defaults_to_false
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-318c9720-51df-4590-be0e-b8c9b4ca5bbc'
request_payload.flag_keys_to_evaluate_contains_only_requested_key
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-43102c95-8951-4029-ae62-2d26966f18ad'
request_lifecycle.two_flag_calls_produce_two_remote_requests
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-bb878c68-dc2c-48dc-83e0-25d0a0be6ba3'
request_lifecycle.mock_response_value_is_returned_to_caller
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-ad68e721-bfa0-418c-93b3-6365e25ca61e'
side_effect_events.get_feature_flag_captures_feature_flag_called_event
404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-c3a04cbc-8266-4280-ab9e-133cb3b432b6'
|
|
|
|
@cat-ph left a bunch of comments/suggestions mainly to keep a cleaner public API and some 'better' behaviors imo, feel free to push back if you dont think we need that |
|
something my agent told: Global panic-hook tests look fragile The test initializes the real process-wide global client: GLOBAL_CLIENT: OnceLock<Client>That cannot be reset. The comment says it is the only test doing that, but this is easy to break later and can be Better options:
and: Maybe avoid “first frame must be app frame” tests Tests currently assert that frame 0 is the user panic site. That encourages brittle stripping logic. A more robust test would assert:
and: Consider whether standalone clients should have clearer docs capture_panics only affects init_global. Standalone clients require: install_panic_hook(Arc<Client>)That is reasonable, but easy to miss. The docs/changelog should be very explicit:
|
Panics are unrecoverable (the process is unwinding/aborting), so the `$exception_level` of the captured `$exception` is now `fatal` instead of `error`, matching how the other SDKs classify crashes. The shared panic matcher now asserts it.
Per review: the panic stacktrace dropped its leading frames using a hand-maintained function-name list (the panic hook + capture machinery). That list was brittle — it drifted whenever internals were renamed or inlined, and it threw away frames the error-tracking UI can collapse on its own. Stop dropping. `capture_raw_panic_frames` now keeps the full backtrace, and in-app classification marks the machinery as `in_app = false`: extend `default_in_app_function` to cover bare unwind symbols (`rust_begin_unwind`, the `__rust`/`___rust` shims) and the async runtime crates (`futures_core`/`futures_util`), on top of the existing `std`/`core`/`alloc`/`backtrace`/`posthog_rs`/`reqwest`/`tokio`. The user's panic site stays in-app; everything above it is kept but flagged out of app. Drop the now-dead `is_internal_panic_frame` and reframe its test around classification; the panic matcher now asserts the machinery is present and out-of-app rather than expecting frame 0 to be the panic site. The manual `capture_exception` path still trims its own capture frames (separate behavior, left as-is for now).
The `init_global_installs_panic_capture_by_default` unit test initialized the real process-wide `GLOBAL_CLIENT` (a set-once `OnceLock` that can't be reset), which made it order-dependent: any future test that touched the global, or a reordering, could break the suite. It can't be fixed with dependency injection either — the panic hook needs a `'static` client, so only the global `OnceLock` qualifies and there's nothing to mock. Move that end-to-end check to its own integration-test binary (`tests/test_panic_global.rs`), where the `OnceLock` is naturally isolated, and drop the in-process test plus its now-unused `init_global_test` helper. The gate (`should_capture_global_panics`) and the hook mechanism (`install_panic_hook`) stay covered by the in-process unit tests; the integration test only exercises the `init_global` -> hook wiring, and runs across all error-tracking feature configs (V0/V1 x async/blocking).
|
I resolved some previous thread pre-emptively, but some updates:
and them some remaining discussions/thoughts:
|
cool stuff, lets keep that way then |
|
Add a test that the actual panic-site frame is in_app = true The current test checks:
But it does not verify that the user panic-site frame is marked in_app = true. I’d ask for this, ideally in tests/test_panic_global.rs, because that panic originates from an integration-test crate, Something like: frames.iter().any(|frame| {
frame["function"].as_str().is_some_and(|f| f.contains("...panic site..."))
&& frame["in_app"] == true
}) |
|
@cat-ph great work and nice addition |
Addresses review feedback on the panic-autocapture API surface: - Global-only: a panic hook is process-global (std::panic::set_hook), so it pairs with the process-global client. Dropped the public install_panic_hook(Arc<Client>) — it exposed an Arc/'static lifetime wart and muddy multi-client semantics for a niche case. The hook now installs solely via the global client; install_panic_hook is kept #[cfg(test)] to exercise the shared hook path against a standalone client. (Re-exposable later without a breaking change if a real need appears.) - Opt-in: ErrorTrackingOptions::capture_panics now defaults to false. Panic autocapture is enabled explicitly, matching PostHog's other SDKs (e.g. Android exception autocapture is opt-in) and avoiding seizing the process panic hook unsolicited. - Fixed flush timeout: removed the public panic_flush_timeout_ms option in favor of a private 2s PANIC_FLUSH_TIMEOUT const. The dying-thread flush budget isn't something callers should tune for now; easy to expose later. Regenerated the public-api snapshot (drops install_panic_hook and the panic_flush_timeout_ms builder setter), updated the changeset and the init_global / option docs, and adjusted the affected tests (the integration test now opts in via capture_panics(true); the bounded-flush test relies on the fixed 2s timeout).
…ite is in-app - Extend the default in-app classifier so framework/error-plumbing frames (`tracing`, `tracing_core`, `log`, `anyhow`) are marked `in_app = false`, alongside the existing std/core/alloc/backtrace/futures/posthog_rs/reqwest/ tokio set, per review. - Add the missing positive assertion: the integration test now panics from a named site in the test crate (not `posthog_rs`) and verifies that frame is kept and marked `in_app = true` — the in-process unit tests can't show this because a panic there originates inside `posthog_rs` (classified out-of-app).
Removing the `panic_flush_timeout_ms` option meant the bounded-flush tests (which deadlock `before_send` on purpose) waited the full production budget, adding ~2s of wall-clock each. Split `PANIC_FLUSH_TIMEOUT` by `cfg(test)`: 200ms under test, 2s in production. The tests only need to prove the flush is bounded, not the production duration, so this restores their speed without weakening coverage.
|
aight last few small changes to simplify actually, I think it's better:
will test again thoroughly tomorrow (today? 😆) & merge unless any objections 🚀 |
Panic autocapture (#157) was squash-merged and released as v0.15.0, and its final form diverged from the snapshot this branch was built on. Reconcile so main's panic implementation is authoritative while keeping this branch's native instruction-address + debug-image capture: - Panic capture keeps every frame (in_app marks SDK/runtime), per main, but now also collects the $debug_images those frames reference so the server can symbolicate native panic frames. - Manual capture keeps the address+name SDK-frame stripping plus native capture. - Drop this branch's stale panic variants superseded by main: opt-out default, configurable panic_flush_timeout_ms, public install_panic_hook (now #[cfg(test)]), and is_internal_panic_frame (panics no longer strip). - Drop the already-released panic-autocapture changeset; keep the unreleased native-symbolication changeset. - Tests: union of main's panic/in-app tests and this branch's native tests; public-api.txt matches main (native adds no public symbols).
|
|
||
| let mut event = Event::new_anon("$exception"); | ||
| if let Some(location) = panic_info.location() { | ||
| event.insert_prop("$exception_panic_file", location.file())?; |
There was a problem hiding this comment.
Just to understand, why adding those properties ? They won't be displayed on the UI and they are not processed, are they ?
What
Adds panic autocapture to the Rust SDK, rebuilt on top of the now-merged runtime-independent transport.
When the global client is initialized with
init_global, a process-widestd::panichook is installed by default — crash reporting, on by default. It captures each panic as a personless$exceptionevent (the panic payload, panic-site$exception_panic_file/_line/_column, and a call-site stack trace honoringcapture_stacktrace), then chains the previously installed hook.Capture routes through the transport's background worker, so:
before_sendhooks still apply; andbefore_sendis a joinable worker-thread panic, not a double-panic process abort.Public API
ErrorTrackingOptions::capture_panics(defaulttrue) — setfalseto opt out of the automatic global install.ErrorTrackingOptions::panic_flush_timeout_ms(default2000) — how long the hook blocks the panicking thread waiting for the flush before letting the panic proceed. Kept short and separate fromshutdown_timeout_msso a slow/unreachable endpoint can't freeze the crash or delay its panic message.install_panic_hook(Arc<Client>)— install the hook for a standalone (non-global) client. It takes anArcbecause the hook is'static; a disabled client is a no-op and never latches the single process-wide hook.Error::PanicHookAlreadyInstalled.Notes
$exceptionis flushed ahead of queued retries, but under sustained transport backpressure (slow/unreachable endpoint or a large capture backlog) it may not be sent within the bound before the process exits. A true queue-jumping priority path is better suited to the sender-pool rework than this std-mpsc worker.before_sendrun on the worker.Testing
cargo testacross the four feature configs (default,capture-v1,--no-default-features --features error-tracking±capture-v1);clippy -D warnings;cargo doc; public-api snapshot regenerated.tested on a devbox