fix(trace sampler): correctness test for probabilistic sampler + fixes#1344
Open
thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
Open
fix(trace sampler): correctness test for probabilistic sampler + fixes#1344thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
Conversation
Tests that ADP's probabilistic sampler (at 50%) makes identical keep/drop decisions and produces the same APM stats as the Datadog Agent baseline. The probabilistic sampler is deterministic (FNV1a hash of trace ID), so both sides receiving the same millstone corpus should keep exactly the same traces. Any divergence in span counts or stats indicates a bug in ADP's pipeline — notably, ADP currently computes APM stats post-sampling while the Agent computes them pre-sampling, which this test is expected to surface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… sampler test ADP intentionally computes APM stats only on sampled traces (post-sampling), whereas the Agent computes stats on all traces pre-sampling. Enable otlp_direct_analysis_mode to skip stats comparison and focus the test on verifying that both sides make identical probabilistic sampling decisions and produce identical span output for kept traces. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…OTLP traces with probabilistic sampler When the APM-level probabilistic sampler is enabled, the DD Agent writes `_dd.p.dm` to trace chunk metadata tags only, not to individual span meta. ADP was incorrectly writing it to span meta for all traces regardless of path. Fix: skip the span meta write for OTLP traces when probabilistic_sampler is enabled, while still passing the DM value through TraceSampling so the encoder writes it to chunk tags. The legacy OTLP pre-sampling path (without probabilistic_sampler_enabled) continues to write DM to span meta, matching existing agent behavior. Also fix the SSI metadata check in ground-truth: when otlp_direct_analysis_mode is true, sampling may legitimately drop the first-per-service spans that _dd.install.id is attached to. Wrap both baseline and comparison SSI checks in the same !otlp_direct_analysis_mode gate (matching what the comment already described). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thieman
commented
Apr 10, 2026
| error_count += 1; | ||
| // present (when not in OTLP-direct mode). In OTLP-direct mode, probabilistic sampling may legitimately drop | ||
| // the first-per-service spans that SSI metadata is attached to, so the check is skipped on both sides. | ||
| if !self.options.otlp_direct_analysis_mode { |
Contributor
Author
There was a problem hiding this comment.
This fix seems legitimate to me as it better aligns with what the comment was saying. Still a bit wary to change the tests rather than fixing the impl though, so check me please
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Binary Size Analysis (Agent Data Plane)Target: bdcdc6c (baseline) vs 575052d (comparison) diff
|
| Module | File Size | Symbols |
|---|---|---|
core |
-4.17 KiB | 46 |
[Unmapped] |
-1.77 KiB | 1 |
saluki_components::common::datadog |
+694 B | 22 |
saluki_components::encoders::datadog |
+578 B | 5 |
saluki_components::transforms::trace_sampler |
+577 B | 1 |
serde_core |
-153 B | 11 |
http |
+148 B | 2 |
agent_data_plane::cli::run |
+124 B | 1 |
[sections] |
+101 B | 5 |
saluki_core::data_model::event |
+83 B | 2 |
saluki_core::topology::shutdown |
+24 B | 1 |
unicode_segmentation |
+16 B | 1 |
tokio |
+8 B | 28 |
agent_data_plane::components::tag_filterlist |
+4 B | 1 |
figment |
+4 B | 1 |
saluki_common::task::instrument |
+0 B | 2 |
anyhow |
+0 B | 17 |
tracing_core |
+0 B | 4 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +17.3Ki [NEW] +17.2Ki saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::he766a0eb48dc2ca7
[NEW] +13.2Ki [NEW] +13.0Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h521a21c05bafb1bd
[NEW] +10.5Ki [NEW] +10.3Ki _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::h4137df811c47120b
[NEW] +9.71Ki [NEW] +9.60Ki saluki_components::common::datadog::apm::ApmConfig::from_configuration::hdd7d8c6f0bfa77f8
[NEW] +5.71Ki [NEW] +5.55Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::h9941a28ad84c7d85
[NEW] +3.34Ki [NEW] +3.17Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::h7112218fdd761b82
[NEW] +2.62Ki [NEW] +2.46Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::h671ea09455371842
[NEW] +2.26Ki [NEW] +2.12Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::create_request::h3dcd64c14c05724e
[NEW] +1.84Ki [NEW] +1.76Ki tokio::runtime::task::raw::poll::h5f0300b45ba2987f
[NEW] +1.78Ki [NEW] +1.71Ki tokio::runtime::task::raw::poll::h44baad23952b6723
[DEL] -1.78Ki [DEL] -1.71Ki tokio::runtime::task::raw::poll::h99c86f35617a7f38
[DEL] -1.84Ki [DEL] -1.76Ki tokio::runtime::task::raw::poll::h22bad6a42a6d9e4c
-0.2% -2.18Ki -0.1% -654 [131 Others]
[DEL] -2.62Ki [DEL] -2.46Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::hc874e837eec984f6
[DEL] -3.34Ki [DEL] -3.18Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::hde2ee702a30f52ff
[DEL] -3.93Ki [DEL] -3.78Ki _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h0d466dc6294304ee
[DEL] -5.72Ki [DEL] -5.57Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hdba5aa9ff3fada65
[DEL] -9.63Ki [DEL] -9.51Ki saluki_components::common::datadog::apm::ApmConfig::from_configuration::hbd8a0046fecbdb97
[DEL] -10.5Ki [DEL] -10.3Ki _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::hac79458d260639d8
[DEL] -13.2Ki [DEL] -13.1Ki saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h1199aefcd5f2ab2d
[DEL] -17.3Ki [DEL] -17.1Ki saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::hfafcbd98d1699351
-0.0% -3.78Ki -0.0% -2.23Ki TOTAL
thieman
commented
Apr 10, 2026
| // _dd.p.dm to trace chunk tags only (not span meta). For the legacy OTLP sampling path, | ||
| // it is written to both. We match that behavior by skipping the span meta write only when | ||
| // both conditions hold; the DM value still flows through TraceSampling to the encoder. | ||
| if priority > 0 && !(is_otlp && self.probabilistic_sampler_enabled) { |
Contributor
Author
There was a problem hiding this comment.
[Claude Sonnet 4.6] DDA references for this behavior:
- Probabilistic path (chunk tags): when
ProbabilisticSamplerEnabled=true, DDA writes_dd.p.dmtopt.TraceChunk.Tags(chunk-level), not span meta: https://github.com/DataDog/datadog-agent/blob/a90340956b9eaab7e5c16119cc45cfc408901d49/pkg/trace/agent/agent.go#L1090 - Legacy OTLP pre-sampling path (span meta): when
ProbabilisticSamplerEnabled=false, the OTLP receiver writes_dd.p.dmdirectly tospans[0].Metaviatraceutil.SetMeta: https://github.com/DataDog/datadog-agent/blob/d01d85293503f17dad8c2da1b9f00ec0ccc5556e/pkg/trace/api/otlp.go#L568-L585
Regression Detector (Agent Data Plane)This comment was omitted because it was over 65,536 characters.Please check the Gitlab Job logs to see its output. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
otlp-traces-probabilisticthat verifies ADP makes identical keep/drop decisions as the DD Agent baseline when the APM-level probabilistic sampler is enabled at 50%_dd.p.dmtag placement: whenprobabilistic_sampler_enabled=true, the DD Agent writes this to trace chunk metadata tags only (not span meta); ADP was incorrectly writing it to span metaground-truthto skip both sides whenotlp_direct_analysis_mode: true, since probabilistic sampling may legitimately drop the first-per-service spans that_dd.install.idis attached toDD Agent behavior references
pkg/trace/sampler/probabilistic_sampler.go— deterministicfnv1a_32(trace_id) & 0x3FFF < rate * 0x4000_dd.p.dmwritten to chunk tags (not span meta) for OTLP+probabilistic:pkg/trace/agent/agent.go#runSamplersV1— the probabilistic sampler path callssetChunkAttributeswhich writes DM to chunk-level metadata, distinct from the span meta path used in legacy samplingpkg/trace/api/otlp.go#L561-L585— whenProbabilisticSamplerEnabled=false, the OTLP receiver pre-assignspriority=AutoKeepanddm=-9before the sampler runs; this path does write DM to span metaotlp_direct_analysis_mode: Used in other OTLP trace correctness tests because ADP computes APM stats post-sampling while the DD Agent computes them pre-sampling; stats comparison is intentionally skipped in this modeTest plan
make test-correctness-otlp-traces-probabilisticpassesmake test-correctness-otlp-tracesstill passes (legacy OTLP path unaffected)make test-correctness-otlp-traces-etsstill passestrace_samplerunit tests pass (53/53)🤖 Generated with Claude Sonnet 4.6