From 0edece42beac258733b04aca1f6893cc8a085d51 Mon Sep 17 00:00:00 2001
From: araf-statsig
Date: Tue, 24 Feb 2026 17:40:58 -0800
Subject: [PATCH 1/3] client event logger

---
 messages/clientEventLogger.mdx | 63 ++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 messages/clientEventLogger.mdx

diff --git a/messages/clientEventLogger.mdx b/messages/clientEventLogger.mdx
new file mode 100644
index 000000000..a401eb9cf
--- /dev/null
+++ b/messages/clientEventLogger.mdx
@@ -0,0 +1,63 @@
+---
+title: Client Event Logger Redesign
+---
+
+Version 3.31.3 of all client SDK's introduces a new event logger architecture focused on bounded queues, smart retry/backoff, and improved flush mechanisms.
+
+## What Changed
+
+- Logging now uses coordinated batching/scheduling instead of a single queue.
+- Retry behavior is split into:
+  - request-level retries (inside one send)
+  - batch-level retries (requeue + scheduled retry)
+- Queue growth is explicitly bounded. Under sustained pressure, events may be dropped by design to protect stability.
+
+## Architecture (High Level)
+
+- `PendingEvents`: in-memory collection of newly logged events.
+- `BatchQueue`: queue of batched events waiting to send.
+- `FlushCoordinator`: controls flush timing and modes.
+- `EventSender`: performs network sends and emits flush lifecycle events.
+- `FlushInterval`: manages cooldown/backoff timing.
+
+## Flush Mechanisms
+
+- `Limit flush`: flush when a full batch is reached. Performs opportunistic draining. Limit flush will keep flushing as long as each send over network is succesful. It will fall into backoff upon failure.
+- `Scheduled:full-batch flush`: scheduler flushes full batches when cooldown allows.
+- `Scheduled:max-time flush`: scheduler flushes partial batches when max interval (60s) is reached.
+- `Manual flush`: explicit `client.flush()`.
+- `Shutdown flush`: best-effort on shutdown on explicit `client.shutdown()`, with persistence for shutdown-failed events in local storage.
+- `Quick flush`: startup optimization for first-event latency.
+
+## Retry Nuances
+
+- A single send can retry at the network layer.
+- Failed batches are also requeued and retried later.
+  - Non-retryable errors are not requeued and are dropped
+  - Each batch gets 3 retries and are dropped after exceeding that threshold
+- Backoff adjusts with success/failure and affects scheduled flush timing.
+
+## Drop Scenarios (Important)
+
+- Batch queue overflow during batching/requeue. if there are more events in the batch queue than the capacity, the oldest batches are dropped.
+  - queue capacity is batch size (default: 100) * max number of batches (30)
+  - You can increase the queue capacity by increasing the batch size with the option `loggingBufferMaxSize`
+- Non-retryable network failure.
+- Max reties exceeded.
+- Storage persistence failure (disabled/shutdown paths).
+- Persisted-event cap exceeded (oldest events trimmed). Local storage has a hard maximum of 500 events.
+
+This is an intentional tradeoff: bounded memory and predictable behavior over unbounded queue growth.
+
+## Behavioral Impact for Upgrades
+
+- Under very high throughput or long outages, event loss is possible by design.
+- Logging-disabled behavior defers/stores events and loads them when logging is re-enabled.
+
+## Recommended Post-Upgrade Validation
+
+- Validate flush modes in your environment: `limit`, `scheduled (full + max-time)`, `manual`, `shutdown`.
+- Validate lifecycle/error observability:
+  - `pre_logs_flushed`
+  - `logs_flushed`
+  - error/dropped-event signals

From 134cabab7a1577e58b808aa6c354bc0a2c7c0d56 Mon Sep 17 00:00:00 2001
From: araf-statsig
Date: Tue, 24 Feb 2026 19:27:58 -0800
Subject: [PATCH 2/3] edits

---
 messages/clientEventLogger.mdx | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/messages/clientEventLogger.mdx b/messages/clientEventLogger.mdx
index a401eb9cf..ecffd8b42 100644
--- a/messages/clientEventLogger.mdx
+++ b/messages/clientEventLogger.mdx
@@ -2,15 +2,16 @@
 title: Client Event Logger Redesign
 ---
 
-Version 3.31.3 of all client SDK's introduces a new event logger architecture focused on bounded queues, smart retry/backoff, and improved flush mechanisms.
+Version 3.32.0 of all client SDKs introduces a new event logger architecture focused on smart retry/backoff, improved batching, bounded queues, and new flush mechanisms.
 
 ## What Changed
 
 - Logging now uses coordinated batching/scheduling instead of a single queue.
-- Retry behavior is split into:
-  - request-level retries (inside one send)
+- Retry behavior is now coordinated between batched events
   - batch-level retries (requeue + scheduled retry)
 - Queue growth is explicitly bounded. Under sustained pressure, events may be dropped by design to protect stability.
+- Limit Flushing and Scheduled Flushing due to size and time have been added
+- The `loggingIntervalMs` option has been deprecated.
 
 ## Architecture (High Level)
 
@@ -22,18 +23,17 @@ Version 3.31.3 of all client SDK's introduces a new event logger architecture fo
 
 ## Flush Mechanisms
 
-- `Limit flush`: flush when a full batch is reached. Performs opportunistic draining. Limit flush will keep flushing as long as each send over network is succesful. It will fall into backoff upon failure.
+- `Limit flush`: flushes when a full batch is reached and backoff is satisfied. Limit flush performs opportunistic draining. Limit flush will keep flushing as long as each send over network is successful. It will fall into backoff upon failure.
 - `Scheduled:full-batch flush`: scheduler flushes full batches when cooldown allows.
 - `Scheduled:max-time flush`: scheduler flushes partial batches when max interval (60s) is reached.
 - `Manual flush`: explicit `client.flush()`.
 - `Shutdown flush`: best-effort on shutdown on explicit `client.shutdown()`, with persistence for shutdown-failed events in local storage.
-- `Quick flush`: startup optimization for first-event latency.
+- `Quick flush`: startup optimization for first-event latency. Flushes within a 200ms window.
 
 ## Retry Nuances
 
-- A single send can retry at the network layer.
-- Failed batches are also requeued and retried later.
-  - Non-retryable errors are not requeued and are dropped
+- Failed batches are requeued and retried.
+  - Non-retryable errors are **not** requeued and are dropped.
   - Each batch gets 3 retries and are dropped after exceeding that threshold
 - Backoff adjusts with success/failure and affects scheduled flush timing.
 
@@ -42,22 +42,15 @@ Version 3.31.3 of all client SDK's introduces a new event logger architecture fo
 - Batch queue overflow during batching/requeue. if there are more events in the batch queue than the capacity, the oldest batches are dropped.
   - queue capacity is batch size (default: 100) * max number of batches (30)
   - You can increase the queue capacity by increasing the batch size with the option `loggingBufferMaxSize`
+- If the batch queue is full and we fail to requeue a failed batch, the entire batch is dropped.
 - Non-retryable network failure.
-- Max reties exceeded.
+- Max retries exceeded.
 - Storage persistence failure (disabled/shutdown paths).
 - Persisted-event cap exceeded (oldest events trimmed). Local storage has a hard maximum of 500 events.
 
-This is an intentional tradeoff: bounded memory and predictable behavior over unbounded queue growth.
-
 ## Behavioral Impact for Upgrades
 
 - Under very high throughput or long outages, event loss is possible by design.
-- Logging-disabled behavior defers/stores events and loads them when logging is re-enabled.
-
-## Recommended Post-Upgrade Validation
-
-- Validate flush modes in your environment: `limit`, `scheduled (full + max-time)`, `manual`, `shutdown`.
-- Validate lifecycle/error observability:
-  - `pre_logs_flushed`
-  - `logs_flushed`
-  - error/dropped-event signals
+  - According to your throughput and logging volume, adjust your batch size to avoid dropping events due to queue size limits. If you require higher throughput, contact Statsig Support.
+- Non-retryable errors will drop events.
+- Flushing cadence has changed. Instead of an adjustable background tick controlled flush, there are new flushing mechanisms which change the flushing cadence and pattern.
\ No newline at end of file

From f1c28ac4c1d316c65a62c6dd29133404211fd94d Mon Sep 17 00:00:00 2001
From: araf-statsig
Date: Wed, 25 Feb 2026 11:41:46 -0800
Subject: [PATCH 3/3] update call sites

---
 client/introduction.mdx   | 2 +-
 sdks/client-vs-server.mdx | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/client/introduction.mdx b/client/introduction.mdx
index 9c08bfaf0..8e894ae62 100644
--- a/client/introduction.mdx
+++ b/client/introduction.mdx
@@ -79,7 +79,7 @@ If you have critical fields which define a user uniquely and are not in the `use
 
 - The client SDK's `logEvent` call takes a custom event that you want to record to analyze the impact of the experiment on your end-user experience
 
-- The client SDK automatically flushes all accumulated logged custom events to the Statsig servers every 10 seconds
+- The client SDK automatically flushes all accumulated logged custom events to the Statsig servers every 60 seconds. [More Details](https://docs.statsig.com/messages/clientEventLogger)
 
 - Statsig uses these custom events to compute metrics as part of your experiment **Results**; Statsig automatically updates the **Metrics Lift** panel in the experiment **Results** tab daily around 9am PST
 
diff --git a/sdks/client-vs-server.mdx b/sdks/client-vs-server.mdx
index 8a3e21c70..e009421f8 100644
--- a/sdks/client-vs-server.mdx
+++ b/sdks/client-vs-server.mdx
@@ -42,7 +42,7 @@ After initialization, both Client and Server SDKs evaluate experiments/gates *wi
 |Checking an Experiment| Requires a user object which is evaluated locally (without a network request) against a ruleset persisted in memory | Does not require a user object, uses a dictionary lookup for values fetched during initialize() |
 |User Identifiers| Pass any and all useful user identifiers | Pass any useful identifiers, the SDK also generates a "StableID", Statsig's anonymous ID you can use to experiment on a user per-device|
 |Logging Events| Requires a user object | Does not require a user object. Note, there is some risk of adblocking log events on client SDKs, which can be minimized by setting up a [Custom Proxy](../custom_proxy)|
-|Flushing Events| Batched and flushed by the SDK every 60 seconds | Batched and flushed by the SDK every 10 seconds|
+|Flushing Events| Batched and flushed by the SDK every 60 seconds | Batched and flushed by the SDK every 60 seconds [More Details](https://docs.statsig.com/messages/clientEventLogger)|
 |Updating Configurations| Poll Statsig servers for updates every 10 seconds by default (configurable), Streaming possible with some Server SDKs and the [Statsig Forward Proxy](/server/concepts/forward_proxy) | Configuration persists until next `initialize` or `updateUser` call, recommended to call `initialize` at the start of each user session|
 
 ## Difference in initialize/update logic:
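
Review note: the drop rules documented in `messages/clientEventLogger.mdx` (oldest batches dropped on overflow, a failed batch dropped when the queue is full, 3 retries per batch, capacity equal to batch size times 30 batches) can be sketched roughly as below. This is only an illustration of the documented behavior, not the SDK's code: `BoundedBatchQueue`, `Batch`, `requeue`, and `queueCapacity` are hypothetical names, and reading "3 retries" as one initial send plus three re-sends is an assumption.

```typescript
// Hypothetical sketch; none of these names come from the Statsig SDK.
type Batch = { events: string[]; retries: number };

const MAX_BATCHES = 30; // "max number of batches (30)" from the doc
const MAX_RETRIES = 3; // "Each batch gets 3 retries"

class BoundedBatchQueue {
  private batches: Batch[] = [];
  private maxBatches: number;
  droppedEvents = 0;

  constructor(maxBatches: number = MAX_BATCHES) {
    this.maxBatches = maxBatches;
  }

  // New batches always enter; overflow evicts the oldest batches.
  enqueue(batch: Batch): void {
    this.batches.push(batch);
    while (this.batches.length > this.maxBatches) {
      const oldest = this.batches.shift();
      if (oldest) this.droppedEvents += oldest.events.length;
    }
  }

  // A failed batch is dropped if the error is non-retryable, its retries
  // are used up, or the queue has no room; otherwise it is requeued.
  requeue(batch: Batch, retryable: boolean): boolean {
    const queueFull = this.batches.length >= this.maxBatches;
    if (!retryable || batch.retries >= MAX_RETRIES || queueFull) {
      this.droppedEvents += batch.events.length;
      return false;
    }
    this.batches.push({ events: batch.events, retries: batch.retries + 1 });
    return true;
  }

  takeNext(): Batch | undefined {
    return this.batches.shift();
  }

  get size(): number {
    return this.batches.length;
  }
}

// Capacity in events: batch size (default 100) * max number of batches (30).
function queueCapacity(batchSize: number, maxBatches: number = MAX_BATCHES): number {
  return batchSize * maxBatches;
}
```

With the documented defaults, `queueCapacity(100)` is 3000 events, and raising the batch size through `loggingBufferMaxSize` scales that linearly, which is the lever the upgrade notes recommend for high-volume callers.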