Skip to content

feat: add on_error client hook for terminal batch delivery failures#159

Open
eli-r-ph wants to merge 2 commits into
mainfrom
feat/on-error-hook
Open

feat: add on_error client hook for terminal batch delivery failures#159
eli-r-ph wants to merge 2 commits into
mainfrom
feat/on-error-hook

Conversation

@eli-r-ph

@eli-r-ph eli-r-ph commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

💡 Motivation and Context

This PR adds an additive, observability-only on_error hook so callers can react to terminal delivery failures without reverting to a blocking capture(). Purely additive change. It fires only if a handler is registered by the caller, and only when a response is non-2xx or all retries are exhausted. The v1 variant will expose the rich detail that endpoint returns so the caller can debug event delivery issues quickly and at the granularity they want.

The background transport (introduced in 0.14.0) made capture()/capture_batch() fire-and-forget. That decoupled the API call from network I/O, but it also removed the only signal callers had that a batch was permanently lost — a permanent reject (e.g. 400/402) or a transient failure that exhausts the retry budget. Today the SDK only emits a WARN and drops the events; there is no programmatic hook to log, count, or alert on dropped data.

  • New ClientOptionsBuilder::on_error(...), plus public OnErrorHook and CaptureFailure types.
  • CaptureFailure exposes error() (the batch cause), event_count(), attempt(), and historical_migration().
  • The hook fires once per terminal batch on the background worker:
    • permanent reject, or exhausted retry budget (error() is Some);
    • on the V1 capture pipeline, also when a request was 2xx but per-event verdicts (retry/drop) leave events unpersisted after the retry budget (error() is None). In that case the feature-gated CaptureFailure::event_results() exposes the per-event server verdicts.
  • Registering at least one hook silences only the default drop WARN it replaces; all other drop warnings (shutdown, serialization, queue-full) are unchanged.
  • Hook panics are caught so a misbehaving hook cannot wedge the worker thread; multiple hooks fire in registration order.

This is purely additive — no existing signature changes. V1-specific behavior is implemented and tested but its public rustdoc is cfg-gated (invisible on the default docs.rs build); full v1 documentation is deferred to the v1 cutover.

💚 How did you test it?

New unit tests in src/client/transport.rs (driven by the existing ManualClock/httpmock harness) and src/client/common.rs, plus a public-API integration test in tests/test_transport.rs. Coverage:

  • fires exactly once on a permanent reject and on retry exhaustion (correct attempt/error/event_count);
  • does not fire on success or on a retry that eventually succeeds;
  • V1: fires on a drop and on an exhausted retry within a 2xx (error() is None, event_results() populated), and does not fire for ok/warning verdicts;
  • hooks run in registration order and a panicking hook is isolated; the worker survives and keeps delivering.

Ran the full CI matrix locally (default, capture-v1, the --no-default-features combos, and the e2e-test config) — all green — plus cargo fmt --check, cargo clippy -- -D warnings, cargo doc, and regenerated api/public-api.txt via scripts/check-public-api.sh --update.

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

If releasing new changes

  • Ran sampo add to generate a changeset file

The background transport is fire-and-forget, so capture() callers have no
way to observe when a batch is permanently rejected or exhausts its retry
budget. Add an additive on_error hook on ClientOptionsBuilder, invoked
once per terminal batch with a CaptureFailure (cause, event count,
attempt). On the V1 capture pipeline it also fires for a 2xx whose
per-event verdicts leave events unpersisted after retries, exposing the
per-event results. Registering a hook silences only the default drop WARN
that it replaces; all other drop warnings remain. Hook panics are caught
so a misbehaving hook cannot wedge the worker thread.
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

posthog-rs-v0 Compliance Report

Date: 2026-06-25 22:09:27 UTC
Duration: 15562ms

⚠️ Some Tests Failed

31/45 tests passed, 14 failed


Capture Tests

29/29 tests passed

View Details
Test Status Duration
Format Validation.Event Has Required Fields 149ms
Format Validation.Event Has Uuid 149ms
Format Validation.Event Has Lib Properties 158ms
Format Validation.Distinct Id Is String 158ms
Format Validation.Token Is Present 158ms
Format Validation.Custom Properties Preserved 158ms
Format Validation.Event Has Timestamp 159ms
Retry Behavior.Retries On 503 5165ms
Retry Behavior.Does Not Retry On 400 2161ms
Retry Behavior.Does Not Retry On 401 2149ms
Retry Behavior.Respects Retry After Header 5094ms
Retry Behavior.Implements Backoff 15112ms
Retry Behavior.Retries On 500 5098ms
Retry Behavior.Retries On 502 5098ms
Retry Behavior.Retries On 504 5097ms
Retry Behavior.Max Retries Respected 15100ms
Deduplication.Generates Unique Uuids 103ms
Deduplication.Preserves Uuid On Retry 5024ms
Deduplication.Preserves Uuid And Timestamp On Retry 10035ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry 5036ms
Deduplication.No Duplicate Events In Batch 26ms
Deduplication.Different Events Have Different Uuids 26ms
Compression.Sends Gzip When Enabled 24ms
Batch Format.Uses Proper Batch Structure 71ms
Batch Format.Flush With No Events Sends Nothing 24ms
Batch Format.Multiple Events Batched Together 130ms
Error Handling.Does Not Retry On 403 2091ms
Error Handling.Does Not Retry On 413 2089ms
Error Handling.Retries On 408 5095ms

Feature_Flags Tests

⚠️ 2/16 tests passed, 14 failed

View Details
Test Status Duration
Request Payload.Request With Person Properties Device Id 54ms
Request Payload.Flags Request Uses V2 Query Param 45ms
Request Payload.Flags Request Hits Flags Path Not Decide 31ms
Request Payload.Flags Request Omits Authorization Header 30ms
Request Payload.Token In Flags Body Matches Init 38ms
Request Payload.Groups Round Trip 37ms
Request Payload.Groups Default To Empty Object 26ms
Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It 52ms
Request Payload.Disable Geoip False Propagates As Geoip Disable False 25ms
Request Payload.Disable Geoip Omitted Defaults To False 43ms
Request Payload.Flag Keys To Evaluate Contains Only Requested Key 50ms
Request Lifecycle.No Flags Request On Init Alone 24ms
Request Lifecycle.No Flags Request On Normal Capture 63ms
Request Lifecycle.Two Flag Calls Produce Two Remote Requests 26ms
Request Lifecycle.Mock Response Value Is Returned To Caller 38ms
Side Effect Events.Get Feature Flag Captures Feature Flag Called Event 22ms

Failures

request_payload.request_with_person_properties_device_id

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-6e61ff87-75e2-456d-b74e-737e77c0744e'

request_payload.flags_request_uses_v2_query_param

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-c20db351-cbf9-467e-ae17-0748575c0ecf'

request_payload.flags_request_hits_flags_path_not_decide

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-8f7957ce-426d-4891-8ca9-986f9b4f723e'

request_payload.flags_request_omits_authorization_header

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-98e4af7a-e8b1-4530-83d4-5fc2c031a008'

request_payload.token_in_flags_body_matches_init

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-4b949964-06ff-4bdc-a533-ad3ceeba9c5c'

request_payload.groups_round_trip

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-cea226c3-3d8c-4e51-ad46-81584e3e2f73'

request_payload.groups_default_to_empty_object

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-9b4af4a8-98c8-4484-b3ec-265ffbc24d83'

request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-dbb22101-a114-4727-9e62-d218ad7c2937'

request_payload.disable_geoip_false_propagates_as_geoip_disable_false

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-123f795d-3d3b-49e6-b2ae-0048cb256309'

request_payload.disable_geoip_omitted_defaults_to_false

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-8b48fabb-c1d1-4d64-9b44-afaa3da7ed14'

request_payload.flag_keys_to_evaluate_contains_only_requested_key

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-9b8cd05e-b450-4219-901a-b42e9723a338'

request_lifecycle.two_flag_calls_produce_two_remote_requests

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-8ac3f49a-72bc-4e0e-a63a-1b4492cdd79f'

request_lifecycle.mock_response_value_is_returned_to_caller

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-71a1bf5b-79f3-4ebf-bd44-8625cf77f0fe'

side_effect_events.get_feature_flag_captures_feature_flag_called_event

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-95593813-19fc-4a4e-a37e-d08d2f0158f2'

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown

Comments Outside Diff (1)

  1. src/client/mod.rs, line 280-289 (link)

    P2 event_results() may include ok/warning verdicts when error is None

    event_count() counts only undelivered events (Retry | Drop), but results is built from batch.final_results, which accumulates all verdicts including ok and warning. In a mixed batch (some events succeed, some drop), failure.event_results().len() will be greater than failure.event_count(), and a hook iterating event_results() to process failures will also process successful events unexpectedly.

    The doc says "Complete when error is None" but doesn't note that it includes non-failure verdicts. No test covers the mixed-verdict case either. Consider filtering final_results to only Retry | Drop entries before passing to CaptureFailure, or clarifying in the doc that callers must filter by verdict status themselves.

Reviews (1): Last reviewed commit: "feat: add on_error client hook for termi..." | Re-trigger Greptile

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

posthog-rs-v1 Compliance Report

Date: 2026-06-25 22:09:25 UTC
Duration: 23000ms

⚠️ Some Tests Failed

96/110 tests passed, 14 failed


Capture_V1 Tests

94/94 tests passed

View Details
Test Status Duration
Endpoint And Method.Targets V1 Endpoint 136ms
Endpoint And Method.Does Not Use Legacy Endpoints 134ms
Required Headers.Has Authorization Bearer Header 136ms
Required Headers.Has Content Type Json 135ms
Required Headers.Has Posthog Sdk Info Format 135ms
Required Headers.Has Posthog Attempt Header 134ms
Required Headers.Has Posthog Request Id 134ms
Required Headers.Has Posthog Request Timestamp 135ms
Required Headers.Has User Agent 133ms
Body Format.Body Has Created At And Batch 137ms
Body Format.No Api Key In Body 141ms
Body Format.No Sent At In Body 141ms
Event Format.Event Has Required Root Fields 140ms
Event Format.Event Uuid Is Valid 140ms
Event Format.Event Timestamp Is Rfc3339 140ms
Event Format.Distinct Id Is String 142ms
Event Format.Distinct Id At Root Not Properties 140ms
Event Format.Custom Properties Preserved 141ms
Event Format.Set Properties Preserved 140ms
Event Format.Set Once Properties Preserved 142ms
Event Format.Groups Properties Preserved 127ms
Event Format.Sdk Generates Uuid If Not Provided 129ms
Event Format.Event Has Required Root Fields Batch 165ms
Event Format.Event Uuid Is Valid Batch 164ms
Event Format.Event Timestamp Is Rfc3339 Batch 164ms
Event Format.Distinct Id Is String Batch 172ms
Event Format.Distinct Id At Root Not Properties Batch 164ms
Event Format.Custom Properties Preserved Batch 163ms
Event Format.Set Properties Preserved Batch 159ms
Event Format.Set Once Properties Preserved Batch 168ms
Event Format.Groups Properties Preserved Batch 141ms
Event Format.Sdk Generates Uuid If Not Provided Batch 147ms
Batch Behavior.Multiple Events In Single Batch 176ms
Batch Behavior.Batch Envelope Smoke 142ms
Batch Behavior.Flush With No Events Sends Nothing 91ms
Batch Behavior.Flush At Triggers Batch 1125ms
Batch Behavior.Created At Reflects Batch Creation Time 112ms
Deduplication.Generates Unique Uuids 199ms
Deduplication.Different Events Same Content Different Uuids 123ms
Deduplication.Preserves Uuid On Retry 5108ms
Deduplication.Preserves Timestamp On Retry 5077ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry 5139ms
Deduplication.No Duplicate Events In Batch 140ms
Header Behavior On Retry.Attempt Header Starts At One 87ms
Header Behavior On Retry.Attempt Header Increments On Retry 10088ms
Header Behavior On Retry.Request Id Preserved On Retry 5079ms
Header Behavior On Retry.Different Requests Have Different Request Ids 2076ms
Header Behavior On Retry.Request Timestamp Changes On Retry 5057ms
Response Format Validation.Success Response Has Uuid Keyed Results 46ms
Response Format Validation.Success Response Has Ok For Each Event 34ms
Response Format Validation.Success No Retry After When All Ok 34ms
Response Format Validation.Success Retry After Present When Retry Events 35ms
Response Format Validation.Success No Retry After When Drop Only 33ms
Response Format Validation.Response Echoes Request Id 27ms
Retry Behavior.Retries On 408 5035ms
Retry Behavior.Retries On 500 5027ms
Retry Behavior.Retries On 503 5024ms
Retry Behavior.Retries On 504 5025ms
Retry Behavior.Retryable Errors Have Retry After 2025ms
Retry Behavior.Respects Retry After On Retryable Error 8024ms
Retry Behavior.Does Not Retry On 400 2027ms
Retry Behavior.Does Not Retry On 401 2025ms
Retry Behavior.Does Not Retry On 402 2025ms
Retry Behavior.Does Not Retry On 413 2023ms
Retry Behavior.Does Not Retry On 415 2023ms
Retry Behavior.Non Retryable Errors Have No Retry After 2023ms
Retry Behavior.Implements Backoff 15030ms
Retry Behavior.Max Retries Respected 15024ms
Partial Batch Handling.Handles 200 Full Success 2036ms
Partial Batch Handling.Handles 200 With All Ok 3039ms
Partial Batch Handling.Does Not Retry Dropped Events 3036ms
Partial Batch Handling.Does Not Retry Limited Events 3033ms
Partial Batch Handling.Prunes Ok Events On Partial Retry 5030ms
Partial Batch Handling.Prunes Dropped Events On Partial Retry 5023ms
Partial Batch Handling.Retries Only Retry Events From Partial 5026ms
Partial Batch Handling.Partial Retry Preserves Uuids 5024ms
Partial Batch Handling.Partial Retry Attempt Header Increments 5030ms
Partial Batch Handling.Partial Retry Request Id Preserved 5031ms
Partial Batch Handling.Respects Retry After On Partial 5025ms
Partial Batch Handling.Unknown Result Treated As Terminal 3026ms
Partial Batch Handling.Mixed Ok Drop Limited No Retry 3033ms
Compression.Sends Gzip Content Encoding 34ms
Compression.No Content Encoding When Disabled 22ms
Compression.Compressed Body Is Decompressible 24ms
Error Handling.Does Not Retry On Unknown 4Xx 2023ms
Event Options.Cookieless Mode Override 23ms
Event Options.Disable Skew Correction Override 22ms
Event Options.Process Person Profile Override 24ms
Event Options.Product Tour Id Override 22ms
Event Options.Unset Options Omitted 22ms
Event Options.Options Override In Batch 22ms
Geoip And Historical Migration.Geoip Disable Injected Into Properties 21ms
Geoip And Historical Migration.Historical Migration Set In Body 22ms
Geoip And Historical Migration.Historical Migration Absent By Default 21ms

Feature_Flags Tests

⚠️ 2/16 tests passed, 14 failed

View Details
Test Status Duration
Request Payload.Request With Person Properties Device Id 13ms
Request Payload.Flags Request Uses V2 Query Param 12ms
Request Payload.Flags Request Hits Flags Path Not Decide 12ms
Request Payload.Flags Request Omits Authorization Header 14ms
Request Payload.Token In Flags Body Matches Init 13ms
Request Payload.Groups Round Trip 12ms
Request Payload.Groups Default To Empty Object 12ms
Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It 13ms
Request Payload.Disable Geoip False Propagates As Geoip Disable False 15ms
Request Payload.Disable Geoip Omitted Defaults To False 15ms
Request Payload.Flag Keys To Evaluate Contains Only Requested Key 13ms
Request Lifecycle.No Flags Request On Init Alone 13ms
Request Lifecycle.No Flags Request On Normal Capture 23ms
Request Lifecycle.Two Flag Calls Produce Two Remote Requests 12ms
Request Lifecycle.Mock Response Value Is Returned To Caller 12ms
Side Effect Events.Get Feature Flag Captures Feature Flag Called Event 15ms

Failures

request_payload.request_with_person_properties_device_id

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-802d8bf2-bdce-4055-bcb5-ff7a5feb483c'

request_payload.flags_request_uses_v2_query_param

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-d44a26b1-9b49-46a8-b956-43800ebf4acf'

request_payload.flags_request_hits_flags_path_not_decide

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-9f3cfb7d-7d0d-433e-a659-e88124abdca3'

request_payload.flags_request_omits_authorization_header

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-ce5ac4dd-2563-4419-b03c-4c18b7b0603c'

request_payload.token_in_flags_body_matches_init

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-10e846d5-6aea-4f17-90a9-174e17076aef'

request_payload.groups_round_trip

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-53d2ab22-5370-4963-b45b-ecd80971b1db'

request_payload.groups_default_to_empty_object

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-1bf61230-4aed-4e12-92a6-5b6cab99d233'

request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-9edd5342-3e31-481b-b3a3-1ec48f9298d8'

request_payload.disable_geoip_false_propagates_as_geoip_disable_false

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-a5b693b4-5f1d-47df-b714-8d3bd4a66fd0'

request_payload.disable_geoip_omitted_defaults_to_false

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-2da109df-daf4-4e7e-a04b-978f96ab4417'

request_payload.flag_keys_to_evaluate_contains_only_requested_key

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-8e822219-cfd7-4224-9269-599212427d76'

request_lifecycle.two_flag_calls_produce_two_remote_requests

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-01aeddb2-2229-46e2-8def-69638dda1e7e'

request_lifecycle.mock_response_value_is_returned_to_caller

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-5f22d857-338a-48e1-bfc3-c5ba6de4e36c'

side_effect_events.get_feature_flag_captures_feature_flag_called_event

404, message='Not Found', url='http://sdk-adapter:8080/get_feature_flag?test_id=t-4f6fbd29-2917-4203-9814-cb5de0ab2014'

Comment thread src/client/transport.rs
@eli-r-ph eli-r-ph self-assigned this Jun 25, 2026
…tests

Address review feedback on the on_error hook:
- Document that CaptureFailure::event_results() reports all per-event
  verdicts (persisted ok/warning included), so it can be larger than
  event_count(), which counts only lost (retry/drop) events. Callers
  must filter by status.
- Add a V1 mixed-verdict test (ok + warning + drop) locking that
  event_count counts only the loss while event_results carries every
  verdict.
- Collapse the two "fires once on terminal failure" transport tests into
  one table-driven test (permanent reject vs exhausted retries), matching
  the repo preference for parameterized tests.
@eli-r-ph

Copy link
Copy Markdown
Contributor Author

Thanks @greptile-apps — addressed both in 09a702d:

  1. event_results() includes ok/warning: intentional — the map is meant to expose the full per-event server response so callers can introspect every verdict, not just losses. Rather than filter it down (which would discard that signal), I clarified the docs: event_results() now documents that it reports all verdicts and can be larger than event_count() (which counts only retry/drop), and event_count() cross-references it. Added a V1 mixed-verdict test (on_error_v1_event_results_includes_persisted_verdicts: ok + warning + drop) asserting event_count == 1 while event_results() carries all three.

  2. Parameterize the transport tests: collapsed the two "fires once on terminal failure" tests into one table-driven on_error_fires_once_on_terminal_failure (permanent-reject vs exhausted-retries). Left the two "not fired" tests separate — they have genuinely different orchestration (one swaps the mock mid-flight to a 200 to prove a retry that succeeds does not fire), so a shared table would obscure rather than clarify.

Note: the failing Feature_Flags compliance checks are unrelated to this PR — they are 404s from the test adapter /get_feature_flag route; the v0/v1 capture suites are 29/29 and 94/94 green.

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown

Both items are properly addressed:

  1. event_results() docs — the clarification is solid. Documenting that event_results() carries all verdicts and can exceed event_count() (which only counts losses) is the right call, and the mixed-verdict test (ok + warning + drop, event_count == 1, all three in event_results()) locks in the contract precisely.

  2. Parameterized tests — collapsing the two "fires once" cases into a table-driven test makes sense, and the reasoning for keeping the "not fired" tests separate is sound. The mid-flight mock swap in on_error_not_fired_when_retry_eventually_succeeds really does have distinct orchestration that a shared table would obscure.

LGTM on both counts. 👍

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

@eli-r-ph eli-r-ph marked this pull request as ready for review June 25, 2026 22:24
@eli-r-ph eli-r-ph requested a review from a team as a code owner June 25, 2026 22:24
@eli-r-ph eli-r-ph requested review from a team and cat-ph and removed request for a team June 25, 2026 22:24
@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown

Reviews (2): Last reviewed commit: "chore: clarify event_results vs event_co..." | Re-trigger Greptile

@dustinbyrne dustinbyrne left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of small comments below, otherwise looks good

Comment thread src/client/mod.rs
Comment on lines +108 to +117
pub struct OnErrorHook(SharedOnErrorHook);

impl OnErrorHook {
/// Create a new on-error hook.
pub fn new<F>(hook: F) -> Self
where
F: FnMut(&CaptureFailure<'_>) + Send + 'static,
{
Self(Arc::new(Mutex::new(Box::new(hook))))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like callers need to reference or instantiate OnErrorHook. can the struct/new visibility be downgraded to pub(crate)?

Comment thread src/client/transport.rs
Step::Fail(e) => {
warn!("posthog-rs: dropping {} event(s): {e}", batch.pending.len());
dec_len(&self.len, batch.pending.len());
let count = batch.pending.len();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: a mixed retry + drop response followed by a Step::Fail result can make event_count() under-report total lost events, because earlier drop verdicts are removed from pending and only remain in event_results().

@marandaneto marandaneto left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should make this more generic since we have more APIs and more types of errors
@ioannisj since hes planning to work on SDK observability
otherwise we'd need one hook per API call, or error type
JS has on_request_error
RN has:

  const unsubscribe = posthog.on('error', (err) => {
    console.log('PostHog SDK error', err)
  })

so not sold on either of those but i think we'd need to think about that first
whats your goal here @eli-r-ph ?
what will you do with this data?
we should also not emit the errors and then capture the error ourselves, this would go into an endless loop if our SDK is buggy or having an incident

@ioannisj

Copy link
Copy Markdown

i think we should make this more generic since we have more APIs and more types of errors @ioannisj since hes planning to work on SDK observability otherwise we'd need one hook per API call, or error type JS has on_request_error RN has:

  const unsubscribe = posthog.on('error', (err) => {
    console.log('PostHog SDK error', err)
  })

so not sold on either of those but i think we'd need to think about that first whats your goal here @eli-r-ph ? what will you do with this data?

Agreed on making this hook more generic. I think it's already wide enough (we can call on_error from other places as well), we maybe just need to rename CaptureFailure to a more neutral type?

we should also not emit the errors and then capture the error ourselves, this would go into an endless loop if our SDK is buggy or having an incident

I don't see how this PR will suffer from this since this is just calling a non-blocking hook for SDK use observability. For now I think we are okay here

@eli-r-ph

Copy link
Copy Markdown
Contributor Author

👋 hey folks. I am just hoping to restore the error visibility we had prior to the transport change in the 0.14.x release. My e2e smoke tests broke for capture v1 on this release because of the loss of client-side error exposure.

I think the error hook pattern is nicer than what we had prior to the transport change anyway, well suited to the new queued async batch submissions in this SDK, and would allow us to expose more actionable/granular fail modes to the end user that capture v1 offers.

A more generic error handler for the whole SDK makes sense to me too. If we could at least surface a request-level error, it gets us back to parity with what we had before on 0.13.x release, and would solve my immediate testing gap.

I agree with @ioannisj about the feedback loop risk. posthog-go already has a similar mechanism with Callback.Failure and no one's run afoul of the loop risk yet AFAIK (?) but I agree it's worth documenting these risks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants