feat(trace sampler): implement error tracking standalone mode #1314
Binary Size Analysis (Agent Data Plane)
Target: bdcdc6c (baseline) vs b22e492 (comparison) (diff)

| Module | File Size | Symbols |
|---|---|---|
| core | -4.17 KiB | 46 |
| [Unmapped] | -1.70 KiB | 1 |
| saluki_components::common::datadog | +694 B | 22 |
| saluki_components::encoders::datadog | +578 B | 5 |
| saluki_components::transforms::trace_sampler | +568 B | 1 |
| serde_core | -153 B | 11 |
| http | +148 B | 2 |
| agent_data_plane::cli::run | +124 B | 1 |
| saluki_core::data_model::event | +83 B | 2 |
| [sections] | +46 B | 5 |
| saluki_core::topology::shutdown | +24 B | 1 |
| unicode_segmentation | +16 B | 1 |
| tokio | +8 B | 28 |
| agent_data_plane::components::tag_filterlist | +4 B | 1 |
| figment | +4 B | 1 |
| saluki_common::task::instrument | +0 B | 2 |
| anyhow | +0 B | 17 |
| tracing_core | +0 B | 4 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +17.3Ki [NEW] +17.2Ki saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::he766a0eb48dc2ca7
[NEW] +13.2Ki [NEW] +13.0Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h521a21c05bafb1bd
[NEW] +10.5Ki [NEW] +10.3Ki _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::h4137df811c47120b
[NEW] +9.71Ki [NEW] +9.60Ki saluki_components::common::datadog::apm::ApmConfig::from_configuration::hdd7d8c6f0bfa77f8
[NEW] +5.71Ki [NEW] +5.55Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::h9941a28ad84c7d85
[NEW] +3.34Ki [NEW] +3.17Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::h7112218fdd761b82
[NEW] +2.62Ki [NEW] +2.46Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::h671ea09455371842
[NEW] +2.26Ki [NEW] +2.12Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::create_request::h3dcd64c14c05724e
[NEW] +1.84Ki [NEW] +1.76Ki tokio::runtime::task::raw::poll::h5f0300b45ba2987f
[NEW] +1.78Ki [NEW] +1.71Ki tokio::runtime::task::raw::poll::h44baad23952b6723
[DEL] -1.78Ki [DEL] -1.71Ki tokio::runtime::task::raw::poll::h99c86f35617a7f38
[DEL] -1.84Ki [DEL] -1.76Ki tokio::runtime::task::raw::poll::h22bad6a42a6d9e4c
-0.2% -2.18Ki -0.1% -718 [131 Others]
[DEL] -2.62Ki [DEL] -2.46Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::hc874e837eec984f6
[DEL] -3.34Ki [DEL] -3.18Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::hde2ee702a30f52ff
[DEL] -3.93Ki [DEL] -3.78Ki _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h0d466dc6294304ee
[DEL] -5.72Ki [DEL] -5.57Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hdba5aa9ff3fada65
[DEL] -9.63Ki [DEL] -9.51Ki saluki_components::common::datadog::apm::ApmConfig::from_configuration::hbd8a0046fecbdb97
[DEL] -10.5Ki [DEL] -10.3Ki _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::hac79458d260639d8
[DEL] -13.2Ki [DEL] -13.1Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h1199aefcd5f2ab2d
[DEL] -17.3Ki [DEL] -17.1Ki saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::hfafcbd98d1699351
-0.0% -3.78Ki -0.0% -2.29Ki TOTAL
Pull request overview
This pull request implements Error Tracking Standalone (ETS) sampling mode for the trace sampler, matching the behavior of datadog-agent/pkg/trace/agent/agent.go runSamplers. ETS is a high-priority sampling mode that:
- Keeps all error traces by routing them through the error sampler (TPS-limited)
- Immediately drops all non-error traces without consulting other samplers
- Suppresses single-span sampling and analytics events for dropped traces
- Tags kept error traces with `_dd.error_tracking_standalone.error = "true"` chunk tag
Changes:
- Added `ets_error: bool` field to `TraceSampling` to track whether a trace was kept by ETS mode
- Added `sampling_mut()` accessor to `Trace` for mutable access to sampling metadata
- Integrated ETS as the highest-priority sampler in `run_samplers` that runs before all other samplers
- Updated encoder to emit the ETS error chunk tag when `ets_error` is set
- Added comprehensive test coverage for all ETS behavior scenarios
- ETS is disabled by default and can be enabled via configuration
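The ETS decision described in this overview can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `EtsDecision`, `Span`, and `ets_decision` are hypothetical names, and the error sampler is reduced to a closure that reports whether its TPS budget admits the trace.

```rust
/// Outcome of the ETS check for one trace (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum EtsDecision {
    /// ETS is disabled; the regular sampler chain should run.
    NotApplicable,
    /// The trace contains an error and the error sampler admitted it.
    KeepError,
    /// The trace has no error, or the error sampler rejected it.
    Drop,
}

/// Minimal stand-in for a span: only the fields the ETS check looks at.
struct Span {
    error: i32,
    has_exception_event: bool,
}

/// A trace counts as an error trace if any span is marked as an error
/// or carries an exception span event.
fn trace_has_error(spans: &[Span]) -> bool {
    spans.iter().any(|s| s.error != 0 || s.has_exception_event)
}

/// ETS runs before every other sampler: error traces go only to the error
/// sampler, everything else is dropped without consulting other samplers.
fn ets_decision(
    ets_enabled: bool,
    spans: &[Span],
    error_sampler_admits: impl FnOnce() -> bool,
) -> EtsDecision {
    if !ets_enabled {
        return EtsDecision::NotApplicable;
    }
    if trace_has_error(spans) && error_sampler_admits() {
        EtsDecision::KeepError
    } else {
        EtsDecision::Drop
    }
}
```

The closure is only invoked for error traces, so a non-error trace never consumes error sampler budget in this sketch.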
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/saluki-core/src/data_model/event/trace/mod.rs | Added ets_error field to TraceSampling and sampling_mut() accessor to Trace |
| lib/saluki-components/src/transforms/trace_sampler/mod.rs | Implemented ETS logic at the top of run_samplers with comprehensive test coverage |
| lib/saluki-components/src/encoders/datadog/traces/mod.rs | Emits _dd.error_tracking_standalone.error chunk tag for kept ETS traces |
    (
        "apm_config.error_tracking_standalone.enabled",
        "apm_error_tracking_standalone",
    ),
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: df7f51dd-b5d2-4e7b-87ed-842b4013705a
Baseline: bdcdc6c
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.63 | [-4.30, +5.55] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.11] | 1 | (metrics) (profiles) (logs) |
| ✅ | otlp_ingest_logs_5mb_memory | memory utilization | -8.05 | [-8.53, -7.57] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +2.17 | [+0.82, +3.53] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +1.74 | [-56.17, +59.65] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +1.03 | [-1.45, +3.51] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +0.99 | [-1.22, +3.21] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.63 | [-4.30, +5.55] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.59 | [+0.33, +0.84] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.44 | [+0.24, +0.63] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.20 | [+0.07, +0.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.17 | [-0.16, +0.50] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +0.16 | [-0.01, +0.33] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | +0.16 | [+0.03, +0.28] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.15 | [-0.01, +0.31] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +0.09 | [-0.08, +0.26] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.07 | [-0.11, +0.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +0.04 | [-54.81, +54.88] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.02 | [-0.13, +0.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.01 | [-0.12, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.03, +0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | -0.02 | [-0.15, +0.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -0.09 | [-6.43, +6.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.10 | [-0.13, -0.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.15 | [-0.28, -0.01] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | -0.16 | [-0.34, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | -0.19 | [-0.44, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | -0.25 | [-0.44, -0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | -0.45 | [-31.98, +31.07] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -0.77 | [-2.95, +1.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | -4.23 | [-11.58, +3.11] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | -4.81 | [-5.00, -4.63] | 1 | (metrics) (profiles) (logs) |
| ✅ | otlp_ingest_logs_5mb_memory | memory utilization | -8.05 | [-8.53, -7.57] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 114.75MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 34.47MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 54.19MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 169.07MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.28MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
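As a rough sketch of how the three criteria combine, the check below flags an experiment only when all of them hold. `ExperimentResult` and `is_significant_change` are hypothetical names chosen for illustration; the detector's real data model is not shown in this report.

```rust
/// One row of the results table (hypothetical struct, for illustration only).
struct ExperimentResult {
    delta_mean_pct: f64,
    ci_low: f64,
    ci_high: f64,
    erratic: bool,
}

/// Returns `true` when a change meets all three criteria:
/// large enough effect, confidence interval excluding zero, not erratic.
fn is_significant_change(r: &ExperimentResult, tolerance_pct: f64) -> bool {
    let big_enough = r.delta_mean_pct.abs() >= tolerance_pct;
    let ci_excludes_zero = r.ci_low > 0.0 || r.ci_high < 0.0;
    big_enough && ci_excludes_zero && !r.erratic
}
```

Applied to the rows above with a 5.00% tolerance, `otlp_ingest_logs_5mb_memory` (-8.05, CI [-8.53, -7.57]) is flagged, while `dsd_uds_500mb_3k_contexts_cpu` (+2.17, CI [+0.82, +3.53]) is not: its interval excludes zero but the effect is below tolerance.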
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
[Claude Sonnet 4.6] Related to #1134.
    }

    // ETS: suppress single span sampling and analytics events for dropped traces.
    // logic taken from: https://github.com/DataDog/datadog-agent/blob/be33ac1490c4a34602cbc65a211406b73ad6d00b/pkg/trace/agent/agent.go#L976
[Claude Sonnet 4.6] Note: the `if self.error_tracking_standalone { return false; }` immediately below was a pre-existing stub from main. The field was wired to config and this SSS/analytics suppression guard was in place, but the core ETS sampling logic in `run_samplers` was missing. This PR completes the implementation.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
efd14f6 to af78b1a
    /// can be remapped via ConfigurationLoader::with_key_aliases.
    #[serde(
        default = "default_error_tracking_standalone_enabled",
        rename = "apm_error_tracking_standalone"
Should this be rename = "apm_error_tracking_standalone_enabled"?
Uuuuuuugh well I got this from our public docs so I'll go see what actually works https://docs.datadoghq.com/error_tracking/backend/getting_started/single_step_instrumentation/?tab=linuxhostorvm
Ok from this code search looks like it does need to have _enabled at the end, will fix https://github.com/search?q=repo%3ADataDog%2Fdatadog-agent%20DD_APM_ERROR_TRACKING_STANDALONE&type=code
I see my mistake now, the docs I linked are for the installer, not for actually running the agent
[Claude Sonnet 4.6] Fixed — changed rename to apm_error_tracking_standalone_enabled and updated the key alias in config.rs and env var reference in tests accordingly.
[Claude Sonnet 4.6] Flagging a concern from review of the rare sampler PR (#1311) that may still apply once both PRs are merged; worth verifying. In the Go agent (agent.go:1069-1077), the In the current Rust implementation, The fix would be to move the ETS guard to the top of
Implements the ETS sampling path from datadog-agent/pkg/trace/agent/agent.go:
- In run_samplers: ETS check runs at the very top (before rare/probabilistic). Traces with errors (including exception span events) are routed exclusively to the error sampler; traces without errors are dropped immediately.
- In process_trace: dropped ETS traces suppress SSS and analytics events.
- When ETS keeps a trace, sets ets_error=true on TraceSampling so the encoder emits _dd.error_tracking_standalone.error="true" as a chunk tag.
- Adds ets_error field to TraceSampling and sampling_mut() accessor to Trace.
- Updates the DD traces encoder to emit the ETS chunk tag when set.

6 new tests: error kept, no-error dropped, SSS suppressed, ets_error flag set, exception span events treated as errors, ETS disabled uses normal path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Follows the same pattern as enable_rare_sampler:
- YAML: apm_config.error_tracking_standalone.enabled
- Env var: DD_APM_ERROR_TRACKING_STANDALONE

Adds enable_error_tracking_standalone to the ApmConfiguration wrapper (with rename = "apm_error_tracking_standalone"), a KEY_ALIAS mapping the nested YAML path to the flat key, and copies the value into ApmConfig in from_configuration. Removes the now-redundant ErrorTrackingStandaloneConfig struct. Adds 4 config tests mirroring the rare sampler config tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… per-request allocation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hunk tag from config directly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in source comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s, keep only ETS-added links as permalinks Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to avoid per-trace method call Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…one_enabled Fixes rename and key alias to match the actual Datadog Agent config key. Previous key was missing the `_enabled` suffix per pkg/config/setup/apm.go. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
99425a8 to 2d5cb62
[Claude Sonnet 4.6] Verified: this is addressed in the current implementation. The ETS guard is at the top of
… var APM_ERROR_TRACKING_STANDALONE → APM_ERROR_TRACKING_STANDALONE_ENABLED to match the corrected config key rename. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…; gate chunk tag on error presence Non-error traces in ETS mode are now forwarded to intake with DroppedTrace=true instead of being silently discarded, matching Go agent behavior. The ETS chunk tag is now only written when the trace actually contains an error span or exception span event. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
[Claude Sonnet 4.6] ETS dropped trace forwarding

This PR changes how non-error traces are handled in ETS mode. Previously they were silently discarded. They're now forwarded to intake with

How the Go agent does it

In the Go agent, this is achieved as a side effect of the SSS (Single Span Sampling) guard. The relevant code in

    if !keep && !a.conf.ErrorTrackingStandalone {
        // Single span sampling: keep individually-marked spans from dropped traces
        ss := a.singleSpanSampler
        if ss != nil && ss.enabled() {
            ss.applySpanSamplingRules(pt, chunks)
        }
        // Analytics events fallback
        ...
    }

The The Rust implementation now explicitly replicates this: when ETS is enabled and the trace is dropped,

Please verify: Is this the correct read of the Go behavior? Specifically, is
@andrewqian2001datadog ready for another review here. I had some specific questions based on the Claude comment immediately above with the "ETS dropped trace forwarding" header. It seems like DDA currently forwards all traces (with DroppedTrace=True) when ETS is enabled; I wanted to see if that passed the sniff test for you.
    /// Enables Error Tracking Standalone mode. Lives here (rather than nested within `apm_config`)
    /// so that the env var path (`DD_APM_ERROR_TRACKING_STANDALONE_ENABLED` → `apm_error_tracking_standalone_enabled`)
    /// can be remapped via ConfigurationLoader::with_key_aliases.
I missed that we also did this pattern with the rare sampler enabled field... falls right into that very narrow space between what key aliases and env remappings give us. 😭
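To make the two lookup paths concrete, here is a small sketch of how a nested YAML path and an environment variable could both land on the same flat key. It is an illustration under assumptions, not Saluki's `ConfigurationLoader`: `KEY_ALIASES`, `resolve_key`, and `env_var_to_key` are hypothetical names, and the `DD_` prefix stripping plus lowercasing is assumed from the mapping shown in the doc comment above.

```rust
/// Nested YAML path -> flat key (hypothetical alias table, for illustration only).
const KEY_ALIASES: &[(&str, &str)] = &[(
    "apm_config.error_tracking_standalone.enabled",
    "apm_error_tracking_standalone_enabled",
)];

/// Maps an environment variable name to a flat config key by dropping the
/// `DD_` prefix and lowercasing the rest (assumed convention).
fn env_var_to_key(name: &str) -> Option<String> {
    name.strip_prefix("DD_").map(|rest| rest.to_ascii_lowercase())
}

/// Resolves a config path through the alias table, falling back to the
/// path itself when no alias is registered.
fn resolve_key(path: &str) -> &str {
    for (nested, flat) in KEY_ALIASES {
        if *nested == path {
            return *flat;
        }
    }
    path
}
```

With both paths converging on `apm_error_tracking_standalone_enabled`, the serde `rename` has to use that exact flat key, which is why the missing `_enabled` suffix mattered.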
| /// Returns if error tracking standalone mode is enabled. | ||
| pub const fn error_tracking_standalone_enabled(&self) -> bool { | ||
| self.error_tracking_standalone.enabled | ||
| self.error_tracking_standalone |
Can we actually update the doc comment for this method to say:
Returns `true` if error tracking standalone mode is enabled.
It's pattern matching here, I'll have it update this for the other bool-returning methods here as well
| // ETS: forward dropped traces with DroppedTrace=true, suppressing SSS/analytics. | ||
| if self.error_tracking_standalone { | ||
| return false; | ||
| if let Some(root_idx) = root_span_idx { | ||
| self.apply_sampling_metadata(trace, false, priority, decision_maker, root_idx); | ||
| } | ||
| return true; |
My brain is a little mushy trying to think this one through...
On the Agent side, this method is equivalent to sample (here), where the boolean return value is whether or not to keep the trace.
Above this line, we check if keep is true and then return true if so... so if we're here, keep is false. Nowhere in sample is keep mutated after the call to a.traceSampling(now, ts, pt), so why do we return true even though we know keep is false? 🤔
If we return false here then the Trace is removed from the buffer and ultimately dropped. In ETS mode we need to forward all traces, and non-error traces get DroppedTrace=true metadata on them. Verified this behavior with the new correctness test added in this PR.
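A simplified sketch of that return-value contract, assuming the boolean means "forward this trace" rather than "the samplers kept it". `TraceMeta` and this free-standing `process_trace` are hypothetical stand-ins, and the non-ETS fallback is reduced to a single closure representing single span sampling.

```rust
/// Only the metadata this sketch needs (hypothetical type).
struct TraceMeta {
    dropped_trace: bool,
}

/// Returns `true` if the trace should stay in the buffer and be forwarded.
fn process_trace(
    keep: bool,
    ets_enabled: bool,
    meta: &mut TraceMeta,
    single_span_sampling_keeps: impl FnOnce() -> bool,
) -> bool {
    if keep {
        meta.dropped_trace = false;
        return true;
    }
    // The samplers rejected the trace.
    meta.dropped_trace = true;
    if ets_enabled {
        // ETS: still forward it, flagged as dropped, and skip the
        // SSS/analytics fallback entirely.
        return true;
    }
    // Without ETS, the fallback decides whether anything is forwarded.
    single_span_sampling_keeps()
}
```

In this shape the sampling decision (`keep`) and the forwarding decision (the return value) are separate, which is why the ETS branch can return `true` while `keep` is `false`.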
…cking Standalone mode Adds a correctness test that sends OTLP traces to both the baseline (DDA) and comparison (DDA+ADP) agents with ETS enabled, verifying that both forward the same set of spans (error traces kept, non-error traces forwarded with DroppedTrace=true). Uses a 10% error rate in the millstone corpus for meaningful error trace coverage, and disables TPS limits to prevent the error sampler rate from being a variable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror DDA's OTLPReceiver.createChunks behavior: when the probabilistic sampler is disabled, assign dm/priority based on trace ID sampling before the ETS early return so non-error OTLP traces still carry the correct `_dd.p.dm` and `_sampling_priority_v1` values. Add DECISION_MAKER_MANUAL constant (-4) to common/datadog for user-set sampling decisions, and unit test all five OTLP pre-sampling ETS paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e` if" Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror DDA more accurately: OTLPReceiver.createChunks runs before runSamplersV1 entirely, so the dm/priority pre-assignment is not inside the ETS branch. Move the otlp_pre_sample computation above the ETS check; ETS consumes the result unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves the OTLP pre-sampling logic into a dedicated method and calls it from inside the ETS block, keeping the computation co-located with its only consumer and eliminating wasted work when ETS is disabled. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    /// or `None` if pre-sampling does not apply.
    ///
    /// See: https://github.com/DataDog/datadog-agent/blob/be33ac1490c4a34602cbc65a211406b73ad6d00b/pkg/trace/api/otlp.go#L561-L585
    fn otlp_pre_sample(&mut self, trace: &mut Trace, root_span_idx: usize) -> Option<(i32, &'static str)> {
This was a missing piece, in DDA some metadata is applied on incoming OTLP traces before the samplers are run. The ETS branch in runSamplers needs that metadata. In ADP, that same metadata is applied after run_samplers is called, so we don't have it available here. This block allows us to source the OTLP-specific information we need without changing up the order of sampling vs applying metadata, which would cause us to waste cycles on otherwise-dropped traces.
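A rough sketch of what such a pre-sampling step could look like, written as a free function rather than the method in this PR. The parameter list, the priority constants, and `sample_by_trace_id` are assumptions for illustration; in particular the trace ID hash is a simple stand-in and may not match the sampler DDA uses.

```rust
// Assumed priority values; only the "-4" and "-9" decision makers come from this PR.
const PRIORITY_AUTO_DROP: i32 = 0;
const PRIORITY_AUTO_KEEP: i32 = 1;
const DECISION_MAKER_MANUAL: &str = "-4";
const DECISION_MAKER_PROBABILISTIC: &str = "-9";

/// Deterministic trace-ID sampling (stand-in hash, not necessarily DDA's).
fn sample_by_trace_id(trace_id: u64, rate: f64) -> bool {
    if rate >= 1.0 {
        return true;
    }
    if rate <= 0.0 {
        return false;
    }
    const FACTOR: u64 = 1_111_111_111_111_111_111;
    (trace_id.wrapping_mul(FACTOR) as f64) < rate * (u64::MAX as f64)
}

/// Returns the `(priority, decision maker)` pair to pre-assign,
/// or `None` if pre-sampling does not apply.
fn otlp_pre_sample(
    is_otlp: bool,
    probabilistic_sampler_enabled: bool,
    user_priority: Option<i32>,
    trace_id: u64,
    sample_rate: f64,
) -> Option<(i32, &'static str)> {
    // Non-OTLP traces and the probabilistic sampler path are left untouched.
    if !is_otlp || probabilistic_sampler_enabled {
        return None;
    }
    // A priority set by the user wins and is attributed to a manual decision.
    if let Some(priority) = user_priority {
        return Some((priority, DECISION_MAKER_MANUAL));
    }
    let priority = if sample_by_trace_id(trace_id, sample_rate) {
        PRIORITY_AUTO_KEEP
    } else {
        PRIORITY_AUTO_DROP
    };
    Some((priority, DECISION_MAKER_PROBABILISTIC))
}
```

Because the function only computes a pair and leaves applying it to the caller, it can run inside the ETS branch without reordering sampling and metadata application.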
Summary
Implements Error Tracking Standalone (ETS) sampling mode, matching `datadog-agent/pkg/trace/agent/agent.go` `runSamplers`.

Changes
- `transforms/trace_sampler/mod.rs`: ETS check runs at the top of `run_samplers` before all other samplers. Traces containing errors (including exception span events) are routed to the error sampler; non-error traces are dropped immediately. Dropped non-error ETS traces are forwarded to intake with `DroppedTrace=true` (suppressing SSS and analytics events), matching Go agent behavior. `_dd.p.dm` and `_sampling_priority_v1` are pre-assigned inside the ETS block via `otlp_pre_sample()`, mirroring `OTLPReceiver.createChunks` in the Go agent. User-set priorities receive `dm="-4"` (manual); probabilistically-sampled traces receive `dm="-9"`.
- `encoders/datadog/traces/mod.rs`: emits `_dd.error_tracking_standalone.error = "true"` as a chunk tag (only for traces that actually contain an error span or exception span event) and sets `X-Datadog-Error-Tracking-Standalone: true` on outbound requests when ETS is enabled. The HTTP header is pre-built at construction time to avoid per-request allocation.
- `common/datadog/mod.rs`: adds `DECISION_MAKER_MANUAL` constant (`"-4"`) for user/manual sampling decisions.
- `common/datadog/apm.rs`: ETS enabled via `apm_config.error_tracking_standalone.enabled` (YAML) or `DD_APM_ERROR_TRACKING_STANDALONE_ENABLED` (env var), following the same alias pattern as the rare sampler. Bool method docstrings standardized to "Returns `true` if ...".
- `common/datadog/request_builder.rs`: adds `additional_headers()` hook to `EndpointEncoder` trait for encoder-specific request headers.
- `test/correctness/otlp-traces-ets/`: correctness test comparing ETS behavior against a DDA baseline, with `error_rate: 0.1` to ensure a meaningful number of error traces pass through.

Behavioral notes
- `span.error != 0` and `_dd.span_events.has_exception = "true"`
- `DroppedTrace=true`, no SSS or analytics event fallback; ETS chunk tag is not written (tag is only present on error traces)
- `otlp_pre_sample()` method, keeping the computation co-located with its only consumer

Test plan
- `ets_keeps_trace_with_error`: error trace kept by error sampler
- `ets_drops_trace_without_error`: non-error trace dropped by `run_samplers`
- `ets_forwards_dropped_trace_with_dropped_flag`: non-error ETS trace forwarded with `DroppedTrace=true`; SSS not applied
- `ets_keeps_trace_with_exception_span_event`: exception span events count as errors
- `ets_disabled_uses_normal_sampling`: ETS disabled falls through to normal sampling path
- `ets_otlp_non_error_gets_presample_priority_and_dm`: non-error OTLP trace gets `priority=AutoKeep, dm="-9"` before ETS drop
- `ets_otlp_error_gets_presample_priority_and_dm`: error OTLP trace gets `priority=AutoKeep, dm="-9"` when kept
- `ets_otlp_probabilistic_path_skips_presample`: probabilistic sampler path: no pre-sampling applied
- `ets_non_otlp_unaffected_by_presample`: non-OTLP traces unaffected by OTLP pre-sampling logic
- `ets_otlp_user_priority_gets_manual_dm`: user-set priority gets `dm="-4"` (manual)
- `ets_header_present_when_enabled` / `ets_header_absent_when_disabled`: HTTP header behavior
- `ets_chunk_tag_present_for_error_trace`: chunk tag written for error traces
- `ets_chunk_tag_absent_for_non_error_trace`: chunk tag not written for non-error traces
- `ets_chunk_tag_absent_when_disabled`: chunk tag not written when ETS is disabled
- (`otlp-traces-ets`): DDA vs ADP output matches with no differences detected

Stacked on #1311.
🤖 Generated with Claude Code
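For readers skimming the encoder changes, the header and chunk tag gating can be sketched as below. `TraceEncoderSketch` is a hypothetical type, not the `TraceEndpointEncoder` from this PR, and the `additional_headers()` method here only imitates the hook described in the summary; the header and tag names are taken from the description above.

```rust
use std::collections::BTreeMap;

const ETS_CHUNK_TAG: &str = "_dd.error_tracking_standalone.error";
const ETS_HEADER_NAME: &str = "X-Datadog-Error-Tracking-Standalone";

/// Hypothetical encoder stand-in holding only ETS-related state.
struct TraceEncoderSketch {
    ets_enabled: bool,
    /// Built once at construction so no header is allocated per request.
    additional_headers: Vec<(&'static str, &'static str)>,
}

impl TraceEncoderSketch {
    fn new(ets_enabled: bool) -> Self {
        let additional_headers = if ets_enabled {
            vec![(ETS_HEADER_NAME, "true")]
        } else {
            Vec::new()
        };
        Self { ets_enabled, additional_headers }
    }

    /// Extra headers attached to every outbound request.
    fn additional_headers(&self) -> &[(&'static str, &'static str)] {
        &self.additional_headers
    }

    /// Chunk tags for one trace: the ETS tag is written only when ETS is
    /// enabled and the trace actually contains an error.
    fn chunk_tags(&self, trace_has_error: bool) -> BTreeMap<&'static str, &'static str> {
        let mut tags = BTreeMap::new();
        if self.ets_enabled && trace_has_error {
            tags.insert(ETS_CHUNK_TAG, "true");
        }
        tags
    }
}
```

The split mirrors the test plan: the header depends only on configuration, while the chunk tag depends on both configuration and the content of the trace.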