Skip to content

fix(trace sampler): correctness test for probabilistic sampler + fixes#1344

Open
thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
thieman/traces-probabilistic-correctness
Open

fix(trace sampler): correctness test for probabilistic sampler + fixes#1344
thieman wants to merge 4 commits intothieman/error-tracking-standalonefrom
thieman/traces-probabilistic-correctness

Conversation

@thieman
Copy link
Copy Markdown
Contributor

@thieman thieman commented Apr 10, 2026

Summary

  • Adds a new correctness test otlp-traces-probabilistic that verifies ADP makes identical keep/drop decisions as the DD Agent baseline when the APM-level probabilistic sampler is enabled at 50%
  • Fixes _dd.p.dm tag placement: when probabilistic_sampler_enabled=true, the DD Agent writes this to trace chunk metadata tags only (not span meta); ADP was incorrectly writing it to span meta
  • Fixes the SSI metadata check in ground-truth to skip both sides when otlp_direct_analysis_mode: true, since probabilistic sampling may legitimately drop the first-per-service spans that _dd.install.id is attached to

DD Agent behavior references

  • Probabilistic sampler decision: pkg/trace/sampler/probabilistic_sampler.go — deterministic fnv1a_32(trace_id) & 0x3FFF < rate * 0x4000
  • _dd.p.dm written to chunk tags (not span meta) for OTLP+probabilistic: pkg/trace/agent/agent.go#runSamplersV1 — the probabilistic sampler path calls setChunkAttributes which writes DM to chunk-level metadata, distinct from the span meta path used in legacy sampling
  • OTLP receiver pre-sampling (legacy path): pkg/trace/api/otlp.go#L561-L585 — when ProbabilisticSamplerEnabled=false, the OTLP receiver pre-assigns priority=AutoKeep and dm=-9 before the sampler runs; this path does write DM to span meta
  • otlp_direct_analysis_mode: Used in other OTLP trace correctness tests because ADP computes APM stats post-sampling while the DD Agent computes them pre-sampling; stats comparison is intentionally skipped in this mode

Test plan

  • make test-correctness-otlp-traces-probabilistic passes
  • make test-correctness-otlp-traces still passes (legacy OTLP path unaffected)
  • make test-correctness-otlp-traces-ets still passes
  • All trace_sampler unit tests pass (53/53)

🤖 Generated with Claude Sonnet 4.6

thieman and others added 3 commits April 10, 2026 14:33
Tests that ADP's probabilistic sampler (at 50%) makes identical keep/drop
decisions and produces the same APM stats as the Datadog Agent baseline.

The probabilistic sampler is deterministic (FNV1a hash of trace ID), so both
sides receiving the same millstone corpus should keep exactly the same traces.
Any divergence in span counts or stats indicates a bug in ADP's pipeline —
notably, ADP currently computes APM stats post-sampling while the Agent
computes them pre-sampling, which this test is expected to surface.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… sampler test

ADP intentionally computes APM stats only on sampled traces (post-sampling),
whereas the Agent computes stats on all traces pre-sampling. Enable
otlp_direct_analysis_mode to skip stats comparison and focus the test on
verifying that both sides make identical probabilistic sampling decisions
and produce identical span output for kept traces.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…OTLP traces with probabilistic sampler

When the APM-level probabilistic sampler is enabled, the DD Agent writes
`_dd.p.dm` to trace chunk metadata tags only, not to individual span meta.
ADP was incorrectly writing it to span meta for all traces regardless of
path.

Fix: skip the span meta write for OTLP traces when probabilistic_sampler
is enabled, while still passing the DM value through TraceSampling so the
encoder writes it to chunk tags. The legacy OTLP pre-sampling path (without
probabilistic_sampler_enabled) continues to write DM to span meta, matching
existing agent behavior.

Also fix the SSI metadata check in ground-truth: when otlp_direct_analysis_mode
is true, sampling may legitimately drop the first-per-service spans that
_dd.install.id is attached to. Wrap both baseline and comparison SSI checks
in the same !otlp_direct_analysis_mode gate (matching what the comment
already described).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dd-octo-sts dd-octo-sts bot added area/components Sources, transforms, and destinations. area/ci CI/CD, automated testing, etc. area/test All things testing: unit/integration, correctness, SMP regression, etc. encoder/datadog-traces Datadog Traces encoder. transform/trace-sampler Trace Sampler synchronous transform. labels Apr 10, 2026
error_count += 1;
// present (when not in OTLP-direct mode). In OTLP-direct mode, probabilistic sampling may legitimately drop
// the first-per-service spans that SSI metadata is attached to, so the check is skipped on both sides.
if !self.options.otlp_direct_analysis_mode {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix seems legitimate to me as it better aligns with what the comment was saying. Still a bit wary to change the tests rather than fixing the impl though, so check me please

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thieman thieman changed the base branch from main to thieman/error-tracking-standalone April 10, 2026 19:24
@thieman thieman changed the title test(correctness): add probabilistic sampler correctness test + fix _dd.p.dm chunk tag behavior fix(trace sampler): write _dd.p.dm to chunk tags for OTLP traces with probabilistic sampler Apr 10, 2026
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Apr 10, 2026

Binary Size Analysis (Agent Data Plane)

Target: bdcdc6c (baseline) vs 575052d (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.47 MiB
Comparison Size: 26.47 MiB
Size Change: -3.78 KiB (-0.01%)
Pass/Fail Threshold: +5%
Result: PASSED ✅

Changes by Module

Module File Size Symbols
core -4.17 KiB 46
[Unmapped] -1.77 KiB 1
saluki_components::common::datadog +694 B 22
saluki_components::encoders::datadog +578 B 5
saluki_components::transforms::trace_sampler +577 B 1
serde_core -153 B 11
http +148 B 2
agent_data_plane::cli::run +124 B 1
[sections] +101 B 5
saluki_core::data_model::event +83 B 2
saluki_core::topology::shutdown +24 B 1
unicode_segmentation +16 B 1
tokio +8 B 28
agent_data_plane::components::tag_filterlist +4 B 1
figment +4 B 1
saluki_common::task::instrument +0 B 2
anyhow +0 B 17
tracing_core +0 B 4

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +17.3Ki  [NEW] +17.2Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::he766a0eb48dc2ca7
  [NEW] +13.2Ki  [NEW] +13.0Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h521a21c05bafb1bd
  [NEW] +10.5Ki  [NEW] +10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::h4137df811c47120b
  [NEW] +9.71Ki  [NEW] +9.60Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hdd7d8c6f0bfa77f8
  [NEW] +5.71Ki  [NEW] +5.55Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::h9941a28ad84c7d85
  [NEW] +3.34Ki  [NEW] +3.17Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::h7112218fdd761b82
  [NEW] +2.62Ki  [NEW] +2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::h671ea09455371842
  [NEW] +2.26Ki  [NEW] +2.12Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::create_request::h3dcd64c14c05724e
  [NEW] +1.84Ki  [NEW] +1.76Ki    tokio::runtime::task::raw::poll::h5f0300b45ba2987f
  [NEW] +1.78Ki  [NEW] +1.71Ki    tokio::runtime::task::raw::poll::h44baad23952b6723
  [DEL] -1.78Ki  [DEL] -1.71Ki    tokio::runtime::task::raw::poll::h99c86f35617a7f38
  [DEL] -1.84Ki  [DEL] -1.76Ki    tokio::runtime::task::raw::poll::h22bad6a42a6d9e4c
  -0.2% -2.18Ki  -0.1%    -654    [131 Others]
  [DEL] -2.62Ki  [DEL] -2.46Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush_inner::_{{closure}}::hc874e837eec984f6
  [DEL] -3.34Ki  [DEL] -3.18Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::try_split_request::_{{closure}}::hde2ee702a30f52ff
  [DEL] -3.93Ki  [DEL] -3.78Ki    _<core::marker::PhantomData<T> as serde_core::de::DeserializeSeed>::deserialize::h0d466dc6294304ee
  [DEL] -5.72Ki  [DEL] -5.57Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::flush::_{{closure}}::hdba5aa9ff3fada65
  [DEL] -9.63Ki  [DEL] -9.51Ki    saluki_components::common::datadog::apm::ApmConfig::from_configuration::hbd8a0046fecbdb97
  [DEL] -10.5Ki  [DEL] -10.3Ki    _<saluki_common::task::instrument::InstrumentedTask<F> as core::future::future::Future>::poll::hac79458d260639d8
  [DEL] -13.2Ki  [DEL] -13.1Ki    saluki_components::common::datadog::request_builder::RequestBuilder<E>::encode_inner::_{{closure}}::h1199aefcd5f2ab2d
  [DEL] -17.3Ki  [DEL] -17.1Ki    saluki_components::encoders::datadog::traces::TraceEndpointEncoder::encode_tracer_payload::_{{closure}}::_{{closure}}::hfafcbd98d1699351
  -0.0% -3.78Ki  -0.0% -2.23Ki    TOTAL

// _dd.p.dm to trace chunk tags only (not span meta). For the legacy OTLP sampling path,
// it is written to both. We match that behavior by skipping the span meta write only when
// both conditions hold; the DM value still flows through TraceSampling to the encoder.
if priority > 0 && !(is_otlp && self.probabilistic_sampler_enabled) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Claude Sonnet 4.6] DDA references for this behavior:

@thieman thieman changed the title fix(trace sampler): write _dd.p.dm to chunk tags for OTLP traces with probabilistic sampler fix(trace sampler): correctness test for probabilistic sampler + fixes Apr 10, 2026
@thieman thieman marked this pull request as ready for review April 10, 2026 19:36
@thieman thieman requested a review from a team as a code owner April 10, 2026 19:36
Copilot AI review requested due to automatic review settings April 10, 2026 19:36
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Apr 10, 2026

Regression Detector (Agent Data Plane)

This comment was omitted because it was over 65,536 characters.Please check the Gitlab Job logs to see its output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ci CI/CD, automated testing, etc. area/components Sources, transforms, and destinations. area/test All things testing: unit/integration, correctness, SMP regression, etc. encoder/datadog-traces Datadog Traces encoder. transform/trace-sampler Trace Sampler synchronous transform.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant