fix(trace sampler): correctness test for probabilistic sampler + fixes by thieman · Pull Request #1344 · DataDog/saluki

thieman · 2026-04-10T19:20:56Z

Summary

Adds a new correctness test otlp-traces-probabilistic that verifies ADP makes identical keep/drop decisions as the DD Agent baseline when the APM-level probabilistic sampler is enabled at 50%
Fixes _dd.p.dm tag placement: when probabilistic_sampler_enabled=true, the DD Agent writes this to trace chunk metadata tags only (not span meta); ADP was incorrectly writing it to span meta
Fixes the SSI metadata check in ground-truth to skip both sides when otlp_direct_analysis_mode: true, since probabilistic sampling may legitimately drop the first-per-service spans that _dd.install.id is attached to

DD Agent behavior references

Probabilistic sampler decision: pkg/trace/sampler/probabilistic_sampler.go — deterministic fnv1a_32(trace_id) & 0x3FFF < rate * 0x4000
_dd.p.dm written to chunk tags (not span meta) for OTLP+probabilistic: pkg/trace/agent/agent.go#runSamplersV1 — the probabilistic sampler path calls setChunkAttributes which writes DM to chunk-level metadata, distinct from the span meta path used in legacy sampling
OTLP receiver pre-sampling (legacy path): pkg/trace/api/otlp.go#L561-L585 — when ProbabilisticSamplerEnabled=false, the OTLP receiver pre-assigns priority=AutoKeep and dm=-9 before the sampler runs; this path does write DM to span meta
otlp_direct_analysis_mode: Used in other OTLP trace correctness tests because ADP computes APM stats post-sampling while the DD Agent computes them pre-sampling; stats comparison is intentionally skipped in this mode

Test plan

make test-correctness-otlp-traces-probabilistic passes
make test-correctness-otlp-traces still passes (legacy OTLP path unaffected)
make test-correctness-otlp-traces-ets still passes
All trace_sampler unit tests pass (53/53)

Tests that ADP's probabilistic sampler (at 50%) makes identical keep/drop decisions and produces the same APM stats as the Datadog Agent baseline. The probabilistic sampler is deterministic (FNV1a hash of trace ID), so both sides receiving the same millstone corpus should keep exactly the same traces. Any divergence in span counts or stats indicates a bug in ADP's pipeline — notably, ADP currently computes APM stats post-sampling while the Agent computes them pre-sampling, which this test is expected to surface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… sampler test ADP intentionally computes APM stats only on sampled traces (post-sampling), whereas the Agent computes stats on all traces pre-sampling. Enable otlp_direct_analysis_mode to skip stats comparison and focus the test on verifying that both sides make identical probabilistic sampling decisions and produce identical span output for kept traces. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…OTLP traces with probabilistic sampler When the APM-level probabilistic sampler is enabled, the DD Agent writes `_dd.p.dm` to trace chunk metadata tags only, not to individual span meta. ADP was incorrectly writing it to span meta for all traces regardless of path. Fix: skip the span meta write for OTLP traces when probabilistic_sampler is enabled, while still passing the DM value through TraceSampling so the encoder writes it to chunk tags. The legacy OTLP pre-sampling path (without probabilistic_sampler_enabled) continues to write DM to span meta, matching existing agent behavior. Also fix the SSI metadata check in ground-truth: when otlp_direct_analysis_mode is true, sampling may legitimately drop the first-per-service spans that _dd.install.id is attached to. Wrap both baseline and comparison SSI checks in the same !otlp_direct_analysis_mode gate (matching what the comment already described). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman · 2026-04-10T19:23:02Z

bin/correctness/ground-truth/src/analysis/traces/mod.rs

-            error_count += 1;
+        // present (when not in OTLP-direct mode). In OTLP-direct mode, probabilistic sampling may legitimately drop
+        // the first-per-service spans that SSI metadata is attached to, so the check is skipped on both sides.
+        if !self.options.otlp_direct_analysis_mode {


This fix seems legitimate to me as it better aligns with what the comment was saying. Still a bit wary to change the tests rather than fixing the impl though, so check me please

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pr-commenter · 2026-04-10T19:31:36Z

Binary Size Analysis (Agent Data Plane)

Target: bdcdc6c (baseline) vs 575052d (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.47 MiB
Comparison Size: 26.47 MiB
Size Change: -3.78 KiB (-0.01%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module	File Size	Symbols
`core`	-4.17 KiB	46
`[Unmapped]`	-1.77 KiB	1
`saluki_components::common::datadog`	+694 B	22
`saluki_components::encoders::datadog`	+578 B	5
`saluki_components::transforms::trace_sampler`	+577 B	1
`serde_core`	-153 B	11
`http`	+148 B	2
`agent_data_plane::cli::run`	+124 B	1
`[sections]`	+101 B	5
`saluki_core::data_model::event`	+83 B	2
`saluki_core::topology::shutdown`	+24 B	1
`unicode_segmentation`	+16 B	1
`tokio`	+8 B	28
`agent_data_plane::components::tag_filterlist`	+4 B	1
`figment`	+4 B	1
`saluki_common::task::instrument`	+0 B	2
`anyhow`	+0 B	17
`tracing_core`	+0 B	4

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +17.3Ki  [NEW] +17.2Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::he766a0eb48dc2ca7
  [NEW] +13.2Ki  [NEW] +13.0Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h521a21c05bafb1bd
  [NEW] +10.5Ki  [NEW] +10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::h4137df811c47120b
  [NEW] +9.71Ki  [NEW] +9.60Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hdd7d8c6f0bfa77f8
  [NEW] +5.71Ki  [NEW] +5.55Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::h9941a28ad84c7d85
  [NEW] +3.34Ki  [NEW] +3.17Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::h7112218fdd761b82
  [NEW] +2.62Ki  [NEW] +2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::h671ea09455371842
  [NEW] +2.26Ki  [NEW] +2.12Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::create_request::h3dcd64c14c05724e
  [NEW] +1.84Ki  [NEW] +1.76Ki    tokio::runtime::task::raw::poll::h5f0300b45ba2987f
  [NEW] +1.78Ki  [NEW] +1.71Ki    tokio::runtime::task::raw::poll::h44baad23952b6723
  [DEL] -1.78Ki  [DEL] -1.71Ki    tokio::runtime::task::raw::poll::h99c86f35617a7f38
  [DEL] -1.84Ki  [DEL] -1.76Ki    tokio::runtime::task::raw::poll::h22bad6a42a6d9e4c
  -0.2% -2.18Ki  -0.1%    -654    [131 Others]
  [DEL] -2.62Ki  [DEL] -2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::hc874e837eec984f6
  [DEL] -3.34Ki  [DEL] -3.18Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::hde2ee702a30f52ff
  [DEL] -3.93Ki  [DEL] -3.78Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h0d466dc6294304ee
  [DEL] -5.72Ki  [DEL] -5.57Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hdba5aa9ff3fada65
  [DEL] -9.63Ki  [DEL] -9.51Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hbd8a0046fecbdb97
  [DEL] -10.5Ki  [DEL] -10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::hac79458d260639d8
  [DEL] -13.2Ki  [DEL] -13.1Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h1199aefcd5f2ab2d
  [DEL] -17.3Ki  [DEL] -17.1Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::hfafcbd98d1699351
  -0.0% -3.78Ki  -0.0% -2.23Ki    TOTAL

thieman · 2026-04-10T19:32:24Z

lib/saluki-components/src/transforms/trace_sampler/mod.rs

+        // _dd.p.dm to trace chunk tags only (not span meta). For the legacy OTLP sampling path,
+        // it is written to both. We match that behavior by skipping the span meta write only when
+        // both conditions hold; the DM value still flows through TraceSampling to the encoder.
+        if priority > 0 && !(is_otlp && self.probabilistic_sampler_enabled) {


[Claude Sonnet 4.6] DDA references for this behavior:

Probabilistic path (chunk tags): when ProbabilisticSamplerEnabled=true, DDA writes _dd.p.dm to pt.TraceChunk.Tags (chunk-level), not span meta: https://github.com/DataDog/datadog-agent/blob/a90340956b9eaab7e5c16119cc45cfc408901d49/pkg/trace/agent/agent.go#L1090

Legacy OTLP pre-sampling path (span meta): when ProbabilisticSamplerEnabled=false, the OTLP receiver writes _dd.p.dm directly to spans[0].Meta via traceutil.SetMeta: https://github.com/DataDog/datadog-agent/blob/d01d85293503f17dad8c2da1b9f00ec0ccc5556e/pkg/trace/api/otlp.go#L568-L585

pr-commenter · 2026-04-10T19:46:45Z

Regression Detector (Agent Data Plane)

This comment was omitted because it was over 65,536 characters.Please check the Gitlab Job logs to see its output.

thieman and others added 3 commits April 10, 2026 14:33

thieman commented Apr 10, 2026

View reviewed changes

chore(ground-truth): revert added comment on SSI metadata check

575052d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thieman changed the base branch from main to thieman/error-tracking-standalone April 10, 2026 19:24

thieman changed the title ~~test(correctness): add probabilistic sampler correctness test + fix _dd.p.dm chunk tag behavior~~ fix(trace sampler): write _dd.p.dm to chunk tags for OTLP traces with probabilistic sampler Apr 10, 2026

thieman commented Apr 10, 2026

View reviewed changes

thieman changed the title ~~fix(trace sampler): write _dd.p.dm to chunk tags for OTLP traces with probabilistic sampler~~ fix(trace sampler): correctness test for probabilistic sampler + fixes Apr 10, 2026

thieman marked this pull request as ready for review April 10, 2026 19:36

thieman requested a review from a team as a code owner April 10, 2026 19:36

Copilot AI review requested due to automatic review settings April 10, 2026 19:36

Copilot started reviewing on behalf of thieman April 10, 2026 19:37 View session

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(trace sampler): correctness test for probabilistic sampler + fixes#1344

fix(trace sampler): correctness test for probabilistic sampler + fixes#1344
thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
thieman/traces-probabilistic-correctness

thieman commented Apr 10, 2026

Uh oh!

thieman Apr 10, 2026

Uh oh!

pr-commenter bot commented Apr 10, 2026 •

edited

Loading

Changes by Module

Detailed Symbol Changes

Uh oh!

thieman Apr 10, 2026

Uh oh!

pr-commenter bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thieman commented Apr 10, 2026

Summary

DD Agent behavior references

Test plan

Uh oh!

thieman Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

pr-commenter bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Binary Size Analysis (Agent Data Plane)

Changes by Module

Detailed Symbol Changes

Uh oh!

thieman Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

pr-commenter bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Regression Detector (Agent Data Plane)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pr-commenter bot commented Apr 10, 2026 •

edited

Loading

pr-commenter bot commented Apr 10, 2026 •

edited

Loading