feat(trace sampler): implement error tracking standalone mode by thieman · Pull Request #1314 · DataDog/saluki

thieman · 2026-04-06T17:21:28Z

Summary

Implements Error Tracking Standalone (ETS) sampling mode, matching datadog-agent/pkg/trace/agent/agent.go runSamplers.

Changes

- transforms/trace_sampler/mod.rs: ETS check runs at the top of run_samplers before all other samplers. Traces containing errors (including exception span events) are routed to the error sampler; non-error traces are dropped immediately. Dropped non-error ETS traces are forwarded to intake with DroppedTrace=true (suppressing SSS and analytics events), matching Go agent behavior.
  - For OTLP traces when the probabilistic sampler is disabled, _dd.p.dm and _sampling_priority_v1 are pre-assigned inside the ETS block via otlp_pre_sample(), mirroring OTLPReceiver.createChunks in the Go agent. User-set priorities receive dm="-4" (manual); probabilistically-sampled traces receive dm="-9".
- encoders/datadog/traces/mod.rs: emits _dd.error_tracking_standalone.error = "true" as a chunk tag (only for traces that actually contain an error span or exception span event) and sets X-Datadog-Error-Tracking-Standalone: true on outbound requests when ETS is enabled. The HTTP header is pre-built at construction time to avoid per-request allocation.
- common/datadog/mod.rs: adds DECISION_MAKER_MANUAL constant ("-4") for user/manual sampling decisions.
- common/datadog/apm.rs: ETS enabled via apm_config.error_tracking_standalone.enabled (YAML) or DD_APM_ERROR_TRACKING_STANDALONE_ENABLED (env var), following the same alias pattern as the rare sampler. Bool method docstrings standardized to "Returns true if ...".
- common/datadog/request_builder.rs: adds additional_headers() hook to EndpointEncoder trait for encoder-specific request headers.
- test/correctness/otlp-traces-ets/: correctness test comparing ETS behavior against a DDA baseline, with error_rate: 0.1 to ensure a meaningful number of error traces pass through.
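As a rough illustration of the additional_headers() hook and the pre-built header described above, here is a minimal sketch. The trait shape, the `TraceEncoder` type, and the use of plain string pairs instead of real HTTP header types are all assumptions made for the example; only the header name and value come from this PR description.

```rust
/// Hypothetical stand-in for the `EndpointEncoder` trait; the real trait has
/// other methods and may use different header types.
trait EndpointEncoder {
    /// Extra headers to attach to every outbound request. Defaults to none.
    fn additional_headers(&self) -> &[(String, String)] {
        &[]
    }
}

/// Hypothetical encoder that decides its extra headers once, at construction.
struct TraceEncoder {
    // Built in `new`, so the request path only borrows it and never allocates.
    extra_headers: Vec<(String, String)>,
}

impl TraceEncoder {
    fn new(ets_enabled: bool) -> Self {
        let extra_headers = if ets_enabled {
            vec![(
                "X-Datadog-Error-Tracking-Standalone".to_string(),
                "true".to_string(),
            )]
        } else {
            Vec::new()
        };
        Self { extra_headers }
    }
}

impl EndpointEncoder for TraceEncoder {
    fn additional_headers(&self) -> &[(String, String)] {
        &self.extra_headers
    }
}
```

A default method that returns an empty slice keeps encoders that need no extra headers unchanged, which is one plausible reason to model the hook this way.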

Behavioral notes

- ETS takes priority over all other samplers (rare, probabilistic, priority)
- Error detection includes both span.error != 0 and _dd.span_events.has_exception = "true"
- Error traces: routed through error sampler (TPS-limited); if kept, tagged with ETS chunk tag and forwarded with ETS request header
- Non-error traces: forwarded with DroppedTrace=true, no SSS or analytics event fallback; ETS chunk tag is not written (tag is only present on error traces)
- OTLP pre-sampling (dm + priority) is computed inside the ETS block via a dedicated otlp_pre_sample() method, keeping the computation co-located with its only consumer
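The error-detection rule in the notes above can be sketched as a small predicate. The `Span` struct, the `EtsOutcome` enum, and the helper names are hypothetical simplifications for illustration; only the two conditions (span.error != 0 and the _dd.span_events.has_exception meta tag) are taken from the notes.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical span model: only the fields the ETS check reads.
struct Span {
    error: i32,
    meta: HashMap<String, String>,
}

/// Hypothetical summary of what ETS does with a trace.
#[derive(Debug, PartialEq)]
enum EtsOutcome {
    /// At least one span carries an error: hand the trace to the error sampler.
    RouteToErrorSampler,
    /// No error anywhere: forward with DroppedTrace=true, skip SSS/analytics.
    ForwardAsDropped,
}

/// A span counts as an error if its error field is non-zero or it is tagged
/// as having an exception span event.
fn span_has_error(span: &Span) -> bool {
    span.error != 0
        || span.meta.get("_dd.span_events.has_exception").map(String::as_str) == Some("true")
}

fn ets_classify(spans: &[Span]) -> EtsOutcome {
    if spans.iter().any(span_has_error) {
        EtsOutcome::RouteToErrorSampler
    } else {
        EtsOutcome::ForwardAsDropped
    }
}

/// Small constructor used only to keep the example readable.
fn span(error: i32, meta: &[(&str, &str)]) -> Span {
    Span {
        error,
        meta: meta.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}
```

For instance, a trace whose only signal is `_dd.span_events.has_exception = "true"` on one span is classified as an error trace even though every `error` field is zero.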

Test plan

Stacked on #1311.

🤖 Generated with Claude Code

pr-commenter · 2026-04-06T17:21:57Z

Binary Size Analysis (Agent Data Plane)

Target: bdcdc6c (baseline) vs b22e492 (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.47 MiB
Comparison Size: 26.47 MiB
Size Change: -3.78 KiB (-0.01%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
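
The reported percentage can be reproduced from the two sizes above. This is only a sketch of the arithmetic, assuming the threshold applies to growth relative to the baseline; the function names are made up for the example and are not part of the analysis tool.

```rust
const KIB_PER_MIB: f64 = 1024.0;

/// Size change as a percentage of the baseline size.
fn size_change_percent(baseline_mib: f64, delta_kib: f64) -> f64 {
    delta_kib / (baseline_mib * KIB_PER_MIB) * 100.0
}

/// Assumed rule: the check passes as long as growth stays at or below the threshold.
fn passes_threshold(change_percent: f64, threshold_percent: f64) -> bool {
    change_percent <= threshold_percent
}
```

With a 26.47 MiB baseline and a -3.78 KiB delta, the change rounds to -0.01%, well under the +5% threshold.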

Changes by Module

Module	File Size	Symbols
`core`	-4.17 KiB	46
`[Unmapped]`	-1.70 KiB	1
`saluki_components::common::datadog`	+694 B	22
`saluki_components::encoders::datadog`	+578 B	5
`saluki_components::transforms::trace_sampler`	+568 B	1
`serde_core`	-153 B	11
`http`	+148 B	2
`agent_data_plane::cli::run`	+124 B	1
`saluki_core::data_model::event`	+83 B	2
`[sections]`	+46 B	5
`saluki_core::topology::shutdown`	+24 B	1
`unicode_segmentation`	+16 B	1
`tokio`	+8 B	28
`agent_data_plane::components::tag_filterlist`	+4 B	1
`figment`	+4 B	1
`saluki_common::task::instrument`	+0 B	2
`anyhow`	+0 B	17
`tracing_core`	+0 B	4

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +17.3Ki  [NEW] +17.2Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::he766a0eb48dc2ca7
  [NEW] +13.2Ki  [NEW] +13.0Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h521a21c05bafb1bd
  [NEW] +10.5Ki  [NEW] +10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::h4137df811c47120b
  [NEW] +9.71Ki  [NEW] +9.60Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hdd7d8c6f0bfa77f8
  [NEW] +5.71Ki  [NEW] +5.55Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::h9941a28ad84c7d85
  [NEW] +3.34Ki  [NEW] +3.17Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::h7112218fdd761b82
  [NEW] +2.62Ki  [NEW] +2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::h671ea09455371842
  [NEW] +2.26Ki  [NEW] +2.12Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::create_request::h3dcd64c14c05724e
  [NEW] +1.84Ki  [NEW] +1.76Ki    tokio::runtime::task::raw::poll::h5f0300b45ba2987f
  [NEW] +1.78Ki  [NEW] +1.71Ki    tokio::runtime::task::raw::poll::h44baad23952b6723
  [DEL] -1.78Ki  [DEL] -1.71Ki    tokio::runtime::task::raw::poll::h99c86f35617a7f38
  [DEL] -1.84Ki  [DEL] -1.76Ki    tokio::runtime::task::raw::poll::h22bad6a42a6d9e4c
  -0.2% -2.18Ki  -0.1%    -718    [131 Others]
  [DEL] -2.62Ki  [DEL] -2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::hc874e837eec984f6
  [DEL] -3.34Ki  [DEL] -3.18Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::hde2ee702a30f52ff
  [DEL] -3.93Ki  [DEL] -3.78Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h0d466dc6294304ee
  [DEL] -5.72Ki  [DEL] -5.57Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hdba5aa9ff3fada65
  [DEL] -9.63Ki  [DEL] -9.51Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hbd8a0046fecbdb97
  [DEL] -10.5Ki  [DEL] -10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::hac79458d260639d8
  [DEL] -13.2Ki  [DEL] -13.1Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h1199aefcd5f2ab2d
  [DEL] -17.3Ki  [DEL] -17.1Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::hfafcbd98d1699351
  -0.0% -3.78Ki  -0.0% -2.29Ki    TOTAL

Copilot

Pull request overview

This pull request implements Error Tracking Standalone (ETS) sampling mode for the trace sampler, matching the behavior of datadog-agent/pkg/trace/agent/agent.go runSamplers. ETS is a high-priority sampling mode that:

- Keeps all error traces by routing them through the error sampler (TPS-limited)
- Immediately drops all non-error traces without consulting other samplers
- Suppresses single-span sampling and analytics events for dropped traces
- Tags kept error traces with _dd.error_tracking_standalone.error = "true" chunk tag

Changes:

- Added ets_error: bool field to TraceSampling to track whether a trace was kept by ETS mode
- Added sampling_mut() accessor to Trace for mutable access to sampling metadata
- Integrated ETS as the highest-priority sampler in run_samplers that runs before all other samplers
- Updated encoder to emit the ETS error chunk tag when ets_error is set
- Added comprehensive test coverage for all ETS behavior scenarios
- ETS is disabled by default and can be enabled via configuration

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
lib/saluki-core/src/data_model/event/trace/mod.rs	Added `ets_error` field to `TraceSampling` and `sampling_mut()` accessor to `Trace`
lib/saluki-components/src/transforms/trace_sampler/mod.rs	Implemented ETS logic at the top of `run_samplers` with comprehensive test coverage
lib/saluki-components/src/encoders/datadog/traces/mod.rs	Emits `_dd.error_tracking_standalone.error` chunk tag for kept ETS traces

thieman · 2026-04-06T17:31:22Z

lib/saluki-components/src/config.rs

+    (
+        "apm_config.error_tracking_standalone.enabled",
+        "apm_error_tracking_standalone",
+    ),


Sourced from https://github.com/DataDog/datadog-agent/blob/main/comp/trace/config/impl/setup.go#L337 and public docs https://docs.datadoghq.com/error_tracking/backend/getting_started/single_step_instrumentation/?tab=linuxhostorvm

pr-commenter · 2026-04-06T17:38:10Z

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: df7f51dd-b5d2-4e7b-87ed-842b4013705a

Baseline: bdcdc6c
Comparison: b22e492
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.63	[-4.30, +5.55]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	-0.02	[-0.15, +0.11]	1	(metrics) (profiles) (logs)
✅	otlp_ingest_logs_5mb_memory	memory utilization	-8.05	[-8.53, -7.57]	1	(metrics) (profiles) (logs)

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	trials	links
➖	dsd_uds_500mb_3k_contexts_cpu	% cpu utilization	+2.17	[+0.82, +3.53]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_cpu	% cpu utilization	+1.74	[-56.17, +59.65]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_cpu	% cpu utilization	+1.03	[-1.45, +3.51]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_cpu	% cpu utilization	+0.99	[-1.22, +3.21]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_cpu	% cpu utilization	+0.63	[-4.30, +5.55]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_memory	memory utilization	+0.59	[+0.33, +0.84]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_medium	memory utilization	+0.44	[+0.24, +0.63]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_throughput	ingress throughput	+0.20	[+0.07, +0.32]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_memory	memory utilization	+0.17	[-0.16, +0.50]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_memory	memory utilization	+0.16	[-0.01, +0.33]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_ultraheavy	memory utilization	+0.16	[+0.03, +0.28]	1	(metrics) (profiles) (logs)
➖	dsd_uds_500mb_3k_contexts_memory	memory utilization	+0.15	[-0.01, +0.31]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_memory	memory utilization	+0.09	[-0.08, +0.26]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_memory	memory utilization	+0.07	[-0.11, +0.25]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_cpu	% cpu utilization	+0.04	[-54.81, +54.88]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_throughput	ingress throughput	+0.02	[-0.13, +0.17]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_throughput	ingress throughput	+0.01	[-0.12, +0.14]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_throughput	ingress throughput	+0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_filtering_5mb_throughput	ingress throughput	-0.00	[-0.02, +0.02]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.03, +0.03]	1	(metrics) (profiles) (logs)
➖	dsd_uds_1mb_3k_contexts_throughput	ingress throughput	-0.00	[-0.06, +0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_512kb_3k_contexts_throughput	ingress throughput	-0.01	[-0.06, +0.05]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_logs_5mb_throughput	ingress throughput	-0.02	[-0.15, +0.11]	1	(metrics) (profiles) (logs)
➖	dsd_uds_100mb_3k_contexts_cpu	% cpu utilization	-0.09	[-6.43, +6.25]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_idle	memory utilization	-0.10	[-0.13, -0.08]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_heavy	memory utilization	-0.15	[-0.28, -0.01]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_memory	memory utilization	-0.16	[-0.34, +0.02]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_5mb_memory	memory utilization	-0.19	[-0.44, +0.06]	1	(metrics) (profiles) (logs)
➖	quality_gates_rss_dsd_low	memory utilization	-0.25	[-0.44, -0.06]	1	(metrics) (profiles) (logs)
➖	dsd_uds_10mb_3k_contexts_cpu	% cpu utilization	-0.45	[-31.98, +31.07]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_traces_ottl_transform_5mb_cpu	% cpu utilization	-0.77	[-2.95, +1.41]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_cpu	% cpu utilization	-4.23	[-11.58, +3.11]	1	(metrics) (profiles) (logs)
➖	otlp_ingest_metrics_5mb_memory	memory utilization	-4.81	[-5.00, -4.63]	1	(metrics) (profiles) (logs)
✅	otlp_ingest_logs_5mb_memory	memory utilization	-8.05	[-8.53, -7.57]	1	(metrics) (profiles) (logs)

Bounds Checks: ✅ Passed

perf	experiment	bounds_check_name	replicates_passed	observed_value	links
✅	quality_gates_rss_dsd_heavy	memory_usage	10/10	114.75MiB ≤ 140MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_low	memory_usage	10/10	34.47MiB ≤ 50MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_medium	memory_usage	10/10	54.19MiB ≤ 75MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_dsd_ultraheavy	memory_usage	10/10	169.07MiB ≤ 200MiB	(metrics) (profiles) (logs)
✅	quality_gates_rss_idle	memory_usage	10/10	21.28MiB ≤ 40MiB	(metrics) (profiles) (logs)

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
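
The three criteria above translate directly into a predicate. The sketch below is not the detector's code; the `Experiment` struct and `is_flagged` name are hypothetical, and the check deliberately ignores the direction of the change, as the criteria do.

```rust
/// Hypothetical summary of one row from the tables above.
struct Experiment {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// True when all three criteria hold: effect size at or above the tolerance,
/// confidence interval excluding zero, and the experiment not marked erratic.
fn is_flagged(e: &Experiment, tolerance_pct: f64) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= tolerance_pct;
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

A row such as Δ mean % = -4.81 with CI [-5.00, -4.63] excludes zero but falls short of the 5.00% tolerance, so it is not flagged; a row at +1.74 with CI [-56.17, +59.65] fails because its interval contains zero.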

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

lib/saluki-components/src/encoders/datadog/traces/mod.rs

lib/saluki-components/src/transforms/trace_sampler/mod.rs

thieman · 2026-04-06T18:50:00Z

[Claude Sonnet 4.6] Related to #1134.

thieman · 2026-04-06T18:51:48Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

        }

+        // ETS: suppress single span sampling and analytics events for dropped traces.
+        // logic taken from: https://github.com/DataDog/datadog-agent/blob/be33ac1490c4a34602cbc65a211406b73ad6d00b/pkg/trace/agent/agent.go#L976


[Claude Sonnet 4.6] Note: the `if self.error_tracking_standalone { return false; }` immediately below was a pre-existing stub from main. The field was wired to config and this SSS/analytics suppression guard was in place, but the core ETS sampling logic in run_samplers was missing. This PR completes the implementation.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

andrewqian2001datadog · 2026-04-06T20:55:56Z

lib/saluki-components/src/common/datadog/apm.rs

+    /// can be remapped via ConfigurationLoader::with_key_aliases.
+    #[serde(
+        default = "default_error_tracking_standalone_enabled",
+        rename = "apm_error_tracking_standalone"


Should this be rename = "apm_error_tracking_standalone_enabled"?

agent config

Uuuuuuugh well I got this from our public docs so I'll go see what actually works https://docs.datadoghq.com/error_tracking/backend/getting_started/single_step_instrumentation/?tab=linuxhostorvm

Ok from this code search looks like it does need to have _enabled at the end, will fix https://github.com/search?q=repo%3ADataDog%2Fdatadog-agent%20DD_APM_ERROR_TRACKING_STANDALONE&type=code

I see my mistake now, the docs I linked are for the installer, not for actually running the agent

[Claude Sonnet 4.6] Fixed — changed rename to apm_error_tracking_standalone_enabled and updated the key alias in config.rs and env var reference in tests accordingly.
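
To make the naming issue in this thread concrete, here is a small sketch of how the two configuration paths might converge on one flat key. It assumes env vars map to flat keys by stripping the `DD_` prefix and lowercasing, and that a key alias rewrites the nested YAML path; `env_var_to_key`, `canonical_key`, and the `KEY_ALIASES` table are hypothetical names, not the actual ConfigurationLoader API.

```rust
/// Hypothetical alias table: nested YAML path -> flat key used by the struct field.
const KEY_ALIASES: &[(&str, &str)] = &[(
    "apm_config.error_tracking_standalone.enabled",
    "apm_error_tracking_standalone_enabled",
)];

/// Assumed env var convention: drop the `DD_` prefix and lowercase the rest.
fn env_var_to_key(name: &str) -> Option<String> {
    name.strip_prefix("DD_").map(|rest| rest.to_ascii_lowercase())
}

/// Resolve a key through the alias table, falling back to the key itself.
fn canonical_key(key: &str) -> &str {
    KEY_ALIASES
        .iter()
        .find(|(nested, _)| *nested == key)
        .map(|(_, flat)| *flat)
        .unwrap_or(key)
}
```

Under these assumptions, the env var without the `_ENABLED` suffix would resolve to a different flat key than the YAML path, which is the mismatch the review comment caught.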

thieman · 2026-04-08T18:27:00Z

[Claude Sonnet 4.6] Flagging a concern from review of the rare sampler PR (#1311) that may still apply once both PRs are merged — worth verifying.

In the Go agent (agent.go:1069-1077), the ErrorTrackingStandalone check is the first thing in runSamplers, before the rare sampler runs. Non-error traces are immediately dropped and the rare sampler never sees them.

In the current Rust implementation, run_samplers calls rare_sampler.sample() unconditionally before the ETS guard. The ETS check only appears in process_trace on the keep=false path — so a non-error trace caught by the rare sampler (keep=true) bypasses the ETS guard entirely and would be kept when Go would drop it. The rare sampler also wastefully consumes rate-limiter tokens on traces ETS will ultimately discard.

The fix would be to move the ETS guard to the top of run_samplers, before the rare sampler call.

Implements the ETS sampling path from datadog-agent/pkg/trace/agent/agent.go:
- In run_samplers: ETS check runs at the very top (before rare/probabilistic). Traces with errors (including exception span events) are routed exclusively to the error sampler; traces without errors are dropped immediately.
- In process_trace: dropped ETS traces suppress SSS and analytics events.
- When ETS keeps a trace, sets ets_error=true on TraceSampling so the encoder emits _dd.error_tracking_standalone.error="true" as a chunk tag.
- Adds ets_error field to TraceSampling and sampling_mut() accessor to Trace.
- Updates the DD traces encoder to emit the ETS chunk tag when set.

6 new tests: error kept, no-error dropped, SSS suppressed, ets_error flag set, exception span events treated as errors, ETS disabled uses normal path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Follows the same pattern as enable_rare_sampler:
- YAML: apm_config.error_tracking_standalone.enabled
- Env var: DD_APM_ERROR_TRACKING_STANDALONE

Adds enable_error_tracking_standalone to the ApmConfiguration wrapper (with rename = "apm_error_tracking_standalone"), a KEY_ALIAS mapping the nested YAML path to the flat key, and copies the value into ApmConfig in from_configuration. Removes the now-redundant ErrorTrackingStandaloneConfig struct. Adds 4 config tests mirroring the rare sampler config tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… per-request allocation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…hunk tag from config directly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… in source comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…s, keep only ETS-added links as permalinks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…to avoid per-trace method call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…one_enabled Fixes rename and key alias to match the actual Datadog Agent config key. Previous key was missing the `_enabled` suffix per pkg/config/setup/apm.go. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman · 2026-04-09T20:28:06Z

[Claude Sonnet 4.6] Verified — this is addressed in the current implementation. The ETS guard is at the top of run_samplers (before rare_sampler.sample()), so when ETS is enabled the function returns early on the first trace span check, and the rare sampler never runs. Non-error traces are dropped before any other sampler is consulted, matching the Go agent ordering.
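
The ordering concern discussed here can be illustrated with a toy model in which the rare sampler counts how often it is consulted. Everything in the sketch is hypothetical (the `Sampler` and `RareSampler` types, the always-keep stubs); it only demonstrates that an ETS guard placed first prevents the rare sampler from ever seeing the trace.

```rust
/// Hypothetical rare sampler stub that counts how often it is consulted.
struct RareSampler {
    calls: u32,
}

impl RareSampler {
    fn sample(&mut self) -> bool {
        self.calls += 1;
        true // stub: pretend every trace looks rare
    }
}

/// Hypothetical, heavily simplified trace sampler.
struct Sampler {
    error_tracking_standalone: bool,
    rare: RareSampler,
    error_sampler_calls: u32,
}

impl Sampler {
    fn new(error_tracking_standalone: bool) -> Self {
        Self { error_tracking_standalone, rare: RareSampler { calls: 0 }, error_sampler_calls: 0 }
    }

    /// Returns whether the samplers decided to keep the trace.
    fn run_samplers(&mut self, trace_has_error: bool) -> bool {
        // ETS guard comes first, so no other sampler runs while it is enabled.
        if self.error_tracking_standalone {
            if trace_has_error {
                self.error_sampler_calls += 1;
                return true; // stub: the real error sampler is TPS-limited
            }
            return false;
        }
        self.rare.sample()
    }
}
```

With ETS enabled, a non-error trace returns `false` and the rare sampler's call counter stays at zero, so no rate-limiter budget is spent on it.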

… var APM_ERROR_TRACKING_STANDALONE → APM_ERROR_TRACKING_STANDALONE_ENABLED to match the corrected config key rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…; gate chunk tag on error presence Non-error traces in ETS mode are now forwarded to intake with DroppedTrace=true instead of being silently discarded, matching Go agent behavior. The ETS chunk tag is now only written when the trace actually contains an error span or exception span event. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman · 2026-04-09T21:12:30Z

[Claude Sonnet 4.6]

ETS dropped trace forwarding

This PR changes how non-error traces are handled in ETS mode. Previously they were silently discarded. They're now forwarded to intake with DroppedTrace=true on the chunk metadata.

How the Go agent does it

In the Go agent, this is achieved as a side effect of the SSS (Single Span Sampling) guard. The relevant code in pkg/trace/agent/agent.go:

if !keep && !a.conf.ErrorTrackingStandalone {
    // Single span sampling: keep individually-marked spans from dropped traces
    ss := a.singleSpanSampler
    if ss != nil && ss.enabled() {
        ss.applySpanSamplingRules(pt, chunks)
    }
    // Analytics events fallback
    ...
}

The !a.conf.ErrorTrackingStandalone guard skips span-stripping for ETS. As a result, when ETS is enabled and a trace is dropped (keep=false), the spans remain in the chunk. Later in appendChunks, chunks are only removed if !keep && len(spans)==0. Since ETS traces still have their spans, they pass through and are forwarded with DroppedTrace=true in the protobuf.

The Rust implementation now explicitly replicates this: when ETS is enabled and the trace is dropped, apply_sampling_metadata is called with keep=false (which sets dropped_trace=true on the TraceSampling struct), and the trace is forwarded rather than discarded.

Please verify: Is this the correct read of the Go behavior? Specifically — is DroppedTrace=true actually set on these chunks in Go, and is the intent that dropped ETS traces are forwarded to the backend for stats/analytics purposes?
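
The reading of the Go behavior described in this comment can be modeled in a few lines. This sketch encodes that reading only and has not been checked against the Go agent; the `Chunk` type and `finalize_chunk` function are hypothetical, and clearing the span list stands in for the case where neither SSS nor analytics events retain any span.

```rust
/// Hypothetical chunk model: span IDs plus the dropped flag sent to intake.
#[derive(Debug, PartialEq)]
struct Chunk {
    spans: Vec<u64>,
    dropped_trace: bool,
}

/// Returns the chunk to forward, or `None` if it should be discarded.
fn finalize_chunk(mut chunk: Chunk, keep: bool, ets_enabled: bool) -> Option<Chunk> {
    if !keep && !ets_enabled {
        // Stand-in for the SSS/analytics path when no span qualifies:
        // the dropped trace ends up with no spans left.
        chunk.spans.clear();
    }
    // Chunks are only removed when they were dropped and have no spans left.
    if !keep && chunk.spans.is_empty() {
        return None;
    }
    chunk.dropped_trace = !keep;
    Some(chunk)
}
```

In this model an ETS non-error trace keeps its spans, survives the emptiness check, and is forwarded with `dropped_trace` set, while the same trace without ETS is discarded.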

thieman · 2026-04-09T21:27:39Z

@andrewqian2001datadog ready for another review here. I had some specific questions based on the Claude comment immediately above with the "ETS dropped trace forwarding" header. It seems like DDA currently forwards all traces (with DroppedTrace=true) when ETS is enabled; I wanted to see if that passed the sniff test for you.

tobz · 2026-04-10T13:01:20Z

lib/saluki-components/src/common/datadog/apm.rs

+    /// Enables Error Tracking Standalone mode. Lives here (rather than nested within `apm_config`)
+    /// so that the env var path (`DD_APM_ERROR_TRACKING_STANDALONE_ENABLED` → `apm_error_tracking_standalone_enabled`)
+    /// can be remapped via ConfigurationLoader::with_key_aliases.


I missed that we also did this pattern with the rare sampler enabled field... falls right into that very narrow space between what key aliases and env remappings give us. 😭

tobz · 2026-04-10T13:02:18Z

lib/saluki-components/src/common/datadog/apm.rs

    /// Returns if error tracking standalone mode is enabled.
    pub const fn error_tracking_standalone_enabled(&self) -> bool {
-        self.error_tracking_standalone.enabled
+        self.error_tracking_standalone


Can we actually update the doc comment for this method to say:

Returns true if error tracking standalone mode is enabled.

It's pattern matching here, I'll have it update this for the other bool-returning methods here as well

tobz · 2026-04-10T13:15:10Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

+        // ETS: forward dropped traces with DroppedTrace=true, suppressing SSS/analytics.
        if self.error_tracking_standalone {
-            return false;
+            if let Some(root_idx) = root_span_idx {
+                self.apply_sampling_metadata(trace, false, priority, decision_maker, root_idx);
+            }
+            return true;


My brain is a little mushy trying to think this one through...

On the Agent side, this method is equivalent to sample (here), where the boolean return value is whether or not to keep the trace.

Above this line, we check if keep is true and then return true if so... so if we're here, keep is false. Nowhere in sample is keep mutated after the call to a.traceSampling(now, ts, pt), so why do we return true even though we know keep is false? 🤔

If we return false here then the Trace is removed from the buffer and ultimately dropped. In ETS mode we need to forward all traces, and non-error traces get DroppedTrace=true metadata on them. Verified this behavior with the new correctness test added in this PR.

…cking Standalone mode Adds a correctness test that sends OTLP traces to both the baseline (DDA) and comparison (DDA+ADP) agents with ETS enabled, verifying that both forward the same set of spans (error traces kept, non-error traces forwarded with DroppedTrace=true). Uses a 10% error rate in the millstone corpus for meaningful error trace coverage, and disables TPS limits to prevent the error sampler rate from being a variable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mirror DDA's OTLPReceiver.createChunks behavior: when the probabilistic sampler is disabled, assign dm/priority based on trace ID sampling before the ETS early return so non-error OTLP traces still carry the correct `_dd.p.dm` and `_sampling_priority_v1` values. Add DECISION_MAKER_MANUAL constant (-4) to common/datadog for user-set sampling decisions, and unit test all five OTLP pre-sampling ETS paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e` if" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mirror DDA more accurately: OTLPReceiver.createChunks runs before runSamplersV1 entirely, so the dm/priority pre-assignment is not inside the ETS branch. Move the otlp_pre_sample computation above the ETS check; ETS consumes the result unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Moves the OTLP pre-sampling logic into a dedicated method and calls it from inside the ETS block, keeping the computation co-located with its only consumer and eliminating wasted work when ETS is disabled. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman · 2026-04-10T16:16:20Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

+    /// or `None` if pre-sampling does not apply.
+    ///
+    /// See: https://github.com/DataDog/datadog-agent/blob/be33ac1490c4a34602cbc65a211406b73ad6d00b/pkg/trace/api/otlp.go#L561-L585
+    fn otlp_pre_sample(&mut self, trace: &mut Trace, root_span_idx: usize) -> Option<(i32, &'static str)> {


This was a missing piece, in DDA some metadata is applied on incoming OTLP traces before the samplers are run. The ETS branch in runSamplers needs that metadata. In ADP, that same metadata is applied after run_samplers is called, so we don't have it available here. This block allows us to source the OTLP-specific information we need without changing up the order of sampling vs applying metadata, which would cause us to waste cycles on otherwise-dropped traces.
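
A standalone sketch of the pre-sampling decision may help here. It is not the method from this PR: the signature is flattened to plain arguments, the OTLP-only check is left out, the `DECISION_MAKER_PROBABILISTIC` name and priority constants are assumptions, and the multiplicative-hash threshold in `sample_by_rate` is an illustrative rule rather than a confirmed copy of the agent's. Only the dm values "-4" and "-9" and the "disabled probabilistic sampler" condition come from the PR text.

```rust
const DECISION_MAKER_MANUAL: &str = "-4";
// Hypothetical constant name; the "-9" value is from the PR description.
const DECISION_MAKER_PROBABILISTIC: &str = "-9";
// Assumed priority values for automatic drop/keep decisions.
const PRIORITY_AUTO_DROP: i32 = 0;
const PRIORITY_AUTO_KEEP: i32 = 1;
// Illustrative multiplicative-hash constant for spreading trace IDs.
const KNUTH_FACTOR: u64 = 1_111_111_111_111_111_111;

/// Deterministic keep/drop decision derived from the trace ID (illustrative).
fn sample_by_rate(trace_id: u64, rate: f64) -> bool {
    if rate >= 1.0 {
        return true;
    }
    trace_id.wrapping_mul(KNUTH_FACTOR) < (rate * u64::MAX as f64) as u64
}

/// Returns `(priority, decision_maker)`, or `None` if pre-sampling does not apply.
fn otlp_pre_sample(
    probabilistic_sampler_enabled: bool,
    user_priority: Option<i32>,
    trace_id: u64,
    rate: f64,
) -> Option<(i32, &'static str)> {
    if probabilistic_sampler_enabled {
        return None;
    }
    match user_priority {
        // A user-set priority is passed through and attributed to a manual decision.
        Some(priority) => Some((priority, DECISION_MAKER_MANUAL)),
        None => {
            let priority = if sample_by_rate(trace_id, rate) {
                PRIORITY_AUTO_KEEP
            } else {
                PRIORITY_AUTO_DROP
            };
            Some((priority, DECISION_MAKER_PROBABILISTIC))
        }
    }
}
```

Because the decision depends only on the trace ID and the rate, the same trace gets the same priority wherever the function is called, which is what lets ETS compute it locally instead of reordering the pipeline.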

Copilot AI review requested due to automatic review settings April 6, 2026 17:21

dd-octo-sts bot added area/core Core functionality, event model, etc. area/components Sources, transforms, and destinations. encoder/datadog-traces Datadog Traces encoder. transform/trace-sampler Trace Sampler synchronous transform. labels Apr 6, 2026

Copilot started reviewing on behalf of thieman April 6, 2026 17:22 View session

Copilot AI reviewed Apr 6, 2026

thieman commented Apr 6, 2026

Copilot AI review requested due to automatic review settings April 6, 2026 17:41

Copilot started reviewing on behalf of thieman April 6, 2026 17:42 View session

Copilot AI reviewed Apr 6, 2026

dd-octo-sts bot removed the area/core Core functionality, event model, etc. label Apr 6, 2026

Copilot AI review requested due to automatic review settings April 6, 2026 17:59

Copilot started reviewing on behalf of thieman April 6, 2026 18:00 View session

Copilot AI reviewed Apr 6, 2026

thieman mentioned this pull request Apr 6, 2026

Implement missing samplers in Trace Sampler transform #1134

Closed

thieman commented Apr 6, 2026

thieman marked this pull request as ready for review April 6, 2026 19:03

thieman requested a review from a team as a code owner April 6, 2026 19:03

Copilot AI review requested due to automatic review settings April 6, 2026 19:03

Copilot started reviewing on behalf of thieman April 6, 2026 19:03 View session

Copilot AI reviewed Apr 6, 2026

thieman force-pushed the thieman/rare-sampler branch from efd14f6 to af78b1a Compare April 6, 2026 20:44

andrewqian2001datadog reviewed Apr 6, 2026

thieman mentioned this pull request Apr 8, 2026

feat(trace sampler): implement rare sampler #1311

Merged

4 tasks

thieman and others added 8 commits April 9, 2026 09:57

perf(trace sampler): pre-build ETS header at construction time, avoid…

8f945b7

… per-request allocation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(trace sampler): remove ets_error from TraceSampling, drive c…

75eb2f2

…hunk tag from config directly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(trace sampler): replace branch refs with commit-hash permalinks…

9031068

… in source comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore(trace sampler): revert permalink change on pre-existing comment…

e998f23

…s, keep only ETS-added links as permalinks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

perf(trace sampler): cache error_tracking_standalone flag on encoder …

9b106e4

…to avoid per-trace method call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(trace sampler): add encoder tests for ETS chunk tag and HTTP header

4f0ec0c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman force-pushed the thieman/error-tracking-standalone branch from 99425a8 to 2d5cb62 Compare April 9, 2026 14:01

thieman changed the base branch from thieman/rare-sampler to main April 9, 2026 14:01

thieman and others added 2 commits April 9, 2026 16:30

fix(trace sampler): update encoder test helper to use renamed ETS env…

b936190

… var APM_ERROR_TRACKING_STANDALONE → APM_ERROR_TRACKING_STANDALONE_ENABLED to match the corrected config key rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman requested a review from andrewqian2001datadog April 9, 2026 21:26

tobz requested changes Apr 10, 2026

dd-octo-sts bot added area/ci CI/CD, automated testing, etc. area/test All things testing: unit/integration, correctness, SMP regression, etc. labels Apr 10, 2026

andrewqian2001datadog approved these changes Apr 10, 2026

thieman and others added 4 commits April 10, 2026 10:54

docs(apm-config): standardize bool method docstrings to "Returns `tru…

1443e4b

…e` if" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(trace-sampler): collapse keep/ETS forward into single branch

7201cd5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman commented Apr 10, 2026
