
chore(agent-data-plane): tag_filterlist correctness test #1251

Closed
rayz wants to merge 12 commits into olivielpeau/tag-aggregation-pre-aggr from raymond/tag-agg-correctness

Conversation

@rayz
Contributor

@rayz rayz commented Mar 20, 2026

Summary

Tag filtering correctness tests with registry.datadoghq.com/agent:7.76.2-jmx as baseline.

| Test case | Implementation | Image | PR |
| --- | --- | --- | --- |
| dsd-tag-filterlist | ADP initial implementation | saluki-images/datadog-agent:testing-release | #1247 |
| dsd-optimized-tag-filterlist | Core Agent pre-aggr with optimizations | registry.datadoghq.com/agent-dev:ff848d05-py3-jmx | DataDog/datadog-agent#47085 |
| dsd-additional-optimized-tag-filterlist | Core Agent additional pre-aggr optimizations (earlier filtering in enrichment step, hash caching) | registry.datadoghq.com/agent-dev:62567276-py3-jmx | DataDog/datadog-agent#48214 |
| dsd-tag-filterlist-context-cache | Core Agent pre-aggr with context cache | registry.datadoghq.com/agent-dev:aef811fd-py3-jmx | DataDog/datadog-agent#47958 |
| dsd-tag-filterlist-post-aggr | Core Agent post-aggregation | registry.datadoghq.com/agent-dev:ca27cfe6-py3-jmx | DataDog/datadog-agent#47276 |

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

How did you test this PR?

CI

References

Copilot AI review requested due to automatic review settings March 20, 2026 12:32
@dd-octo-sts dd-octo-sts bot added the area/components and area/test labels Mar 20, 2026

Copilot AI left a comment


Pull request overview

Adds a new correctness test case covering DogStatsD metric tag filterlist behavior and fixes ground-truth metric normalization so tag ordering differences don’t incorrectly split/duplicate aggregated metrics.

Changes:

  • Add a new dsd-tag-filterlist correctness scenario (Millstone corpus + Agent config + runner config) and wire it into make test-correctness.
  • Normalize metric contexts (sort/dedup tags) before grouping/aggregating in ground-truth metric analysis, plus add a regression unit test.
  • Minor import cleanup in the tag filterlist transform module.
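
The normalization described in the second bullet can be illustrated with a minimal sketch. It assumes a simplified sample type and a sum aggregation; `Sample`, `normalize_tags`, and `aggregate` are hypothetical names for illustration, not the types or functions used in `ground-truth`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a metric sample: a name, raw tags, and a value.
#[derive(Debug, Clone)]
pub struct Sample {
    pub name: String,
    pub tags: Vec<String>,
    pub value: f64,
}

/// Sorts and deduplicates tags so that equivalent tag sets compare equal.
pub fn normalize_tags(tags: &[String]) -> Vec<String> {
    let mut normalized = tags.to_vec();
    normalized.sort();
    normalized.dedup(); // adjacent duplicates only, hence the sort first
    normalized
}

/// Sums sample values per normalized (name, tags) context.
pub fn aggregate(samples: &[Sample]) -> BTreeMap<(String, Vec<String>), f64> {
    let mut totals = BTreeMap::new();
    for sample in samples {
        let key = (sample.name.clone(), normalize_tags(&sample.tags));
        *totals.entry(key).or_insert(0.0) += sample.value;
    }
    totals
}
```

Without the normalization step, samples tagged `env:prod,host:a` and `host:a,env:prod` would land under two different keys and the metric would appear split or duplicated in the comparison.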

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/correctness/dsd-tag-filterlist/millstone.yaml | Defines a deterministic corpus that exercises include/exclude tag filtering and tag-order variation. |
| test/correctness/dsd-tag-filterlist/datadog.yaml | Configures DogStatsD UDS + tag filterlist rules for the scenario. |
| test/correctness/dsd-tag-filterlist/config.yaml | Registers the new correctness case (baseline vs comparison containers). |
| lib/saluki-components/src/transforms/tag_filterlist/mod.rs | Simplifies the Metric type import used by filter_metric_tags. |
| bin/correctness/ground-truth/src/analysis/metrics/types.rs | Fixes grouping by using normalized contexts so equivalent tag sets match regardless of order; adds unit test. |
| Makefile | Adds test-correctness-dsd-tag-filterlist and includes it in test-correctness. |
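
For readers unfamiliar with the behavior the corpus exercises, here is a minimal sketch of include/exclude tag filtering. It assumes tags are `key:value` strings and that a rule lists tag keys. The real filterlist configuration schema and matching semantics in the Agent and ADP are not shown in this PR and may differ; `FilterAction` and `filter_tags` are hypothetical names.

```rust
/// Hypothetical filter mode; the real configuration schema may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterAction {
    /// Keep only tags whose key is listed.
    Include,
    /// Drop tags whose key is listed.
    Exclude,
}

/// Returns the key portion of a `key:value` tag, or the whole tag if it has no colon.
fn tag_key(tag: &str) -> &str {
    tag.split_once(':').map_or(tag, |(key, _)| key)
}

/// Applies one include/exclude rule to a tag list, preserving input order.
pub fn filter_tags(tags: &[&str], action: FilterAction, keys: &[&str]) -> Vec<String> {
    tags.iter()
        .filter(|tag| {
            let listed = keys.contains(&tag_key(tag));
            match action {
                FilterAction::Include => listed,
                FilterAction::Exclude => !listed,
            }
        })
        .map(|tag| tag.to_string())
        .collect()
}
```

Because the filter preserves input order, two implementations can emit the same surviving tags in different orders, which is why the comparison side has to be order-insensitive.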


@pr-commenter

pr-commenter bot commented Mar 20, 2026

Regression Detector (Agent Data Plane)

Regression Detector Results

Run ID: 73f486ee-78ff-432e-8637-0af5b802d7fb

Baseline: 1d50af4
Comparison: 2dcdab8
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
|  | otlp_ingest_logs_5mb_memory | memory utilization | +1.96 | [+1.38, +2.54] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.12, +0.13] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_logs_5mb_cpu | % cpu utilization | -1.08 | [-5.72, +3.55] | 1 | (metrics) (profiles) (logs) |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
|  | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +8.50 | [-22.76, +39.77] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +3.98 | [-50.19, +58.15] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +2.98 | [-53.47, +59.44] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_metrics_5mb_memory | memory utilization | +2.92 | [+2.71, +3.13] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_logs_5mb_memory | memory utilization | +1.96 | [+1.38, +2.54] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.21 | [-6.58, +9.00] | 1 | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_low | memory utilization | +1.16 | [+0.96, +1.35] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_512kb_3k_contexts_memory | memory utilization | +1.15 | [+0.98, +1.31] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_10mb_3k_contexts_memory | memory utilization | +1.07 | [+0.89, +1.25] | 1 | (metrics) (profiles) (logs) |
|  | quality_gates_rss_idle | memory utilization | +0.91 | [+0.88, +0.94] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.89 | [+0.71, +1.07] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_5mb_memory | memory utilization | +0.88 | [+0.63, +1.13] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_1mb_3k_contexts_memory | memory utilization | +0.82 | [+0.65, +0.99] | 1 | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_medium | memory utilization | +0.60 | [+0.41, +0.79] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.48 | [+0.32, +0.64] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.47 | [+0.22, +0.72] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.37 | [+0.04, +0.71] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +0.33 | [-2.21, +2.88] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.02 | [-0.11, +0.16] | 1 | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_heavy | memory utilization | +0.02 | [-0.13, +0.16] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.03, +0.05] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.12, +0.13] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.01 | [-0.14, +0.12] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.16, +0.13] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.41 | [-1.81, +0.98] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -0.54 | [-0.68, -0.41] | 1 | (metrics) (profiles) (logs) |
|  | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -0.83 | [-6.66, +4.99] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_logs_5mb_cpu | % cpu utilization | -1.08 | [-5.72, +3.55] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_5mb_cpu | % cpu utilization | -1.76 | [-4.12, +0.59] | 1 | (metrics) (profiles) (logs) |
|  | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -3.92 | [-6.11, -1.73] | 1 | (metrics) (profiles) (logs) |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
| --- | --- | --- | --- | --- | --- |
|  | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 112.16MiB ≤ 140MiB | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_low | memory_usage | 10/10 | 34.54MiB ≤ 50MiB | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 53.33MiB ≤ 75MiB | (metrics) (profiles) (logs) |
|  | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 167.06MiB ≤ 200MiB | (metrics) (profiles) (logs) |
|  | quality_gates_rss_idle | memory_usage | 10/10 | 21.32MiB ≤ 40MiB | (metrics) (profiles) (logs) |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
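
The three criteria above can be expressed as a small predicate. This is a sketch of the decision rule as stated in this comment, not the regression detector's actual code; the `Experiment` struct and its field names are assumptions made for illustration.

```rust
/// Summary statistics for one experiment (hypothetical shape, for illustration).
#[derive(Debug, Clone, Copy)]
pub struct Experiment {
    /// Estimated change in the optimization goal, in percent.
    pub delta_mean_pct: f64,
    /// Lower bound of the 90% confidence interval, in percent.
    pub ci_low: f64,
    /// Upper bound of the 90% confidence interval, in percent.
    pub ci_high: f64,
    /// Whether the experiment's configuration marks it erratic.
    pub erratic: bool,
}

/// Effect size tolerance from the explanation above: |Δ mean %| ≥ 5.00%.
pub const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only if all three criteria hold.
pub fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the tables above, `dsd_uds_10mb_3k_contexts_cpu` (+8.50, CI [-22.76, +39.77]) exceeds the effect size but its interval contains zero, while `otlp_ingest_traces_ottl_transform_5mb_cpu` (-3.92, CI [-6.11, -1.73]) excludes zero but falls short of 5%, so neither is flagged.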

@pr-commenter

pr-commenter bot commented Mar 20, 2026

Binary Size Analysis (Agent Data Plane)

Target: 1d50af4 (baseline) vs 2dcdab8 (comparison) diff
Analysis Type: Stripped binaries (debug symbols excluded)
Baseline Size: 26.19 MiB
Comparison Size: 26.28 MiB
Size Change: +90.59 KiB (+0.34%)
Pass/Fail Threshold: +5%
Result: PASSED ✅
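
As a quick check of the arithmetic in this summary, the reported percentage follows from the baseline size and the size change. The sketch below assumes 1 MiB = 1024 KiB and treats the pass condition as "change at or below the threshold"; the exact comparison the tool uses is not stated here, and the function names are hypothetical.

```rust
const KIB_PER_MIB: f64 = 1024.0;

/// Size change as a percentage of the baseline size.
pub fn size_change_pct(baseline_mib: f64, delta_kib: f64) -> f64 {
    delta_kib / (baseline_mib * KIB_PER_MIB) * 100.0
}

/// Assumed pass rule: the change must not exceed the threshold.
pub fn passes_threshold(change_pct: f64, threshold_pct: f64) -> bool {
    change_pct <= threshold_pct
}
```

With the values above, `size_change_pct(26.19, 90.59)` is roughly 0.34, well under the +5% threshold.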

Changes by Module

| Module | File Size | Symbols |
| --- | --- | --- |
| core | +29.56 KiB | 8776 |
| saluki_components::transforms::tag_filterlist | +26.19 KiB | 18 |
| serde_core | +8.78 KiB | 338 |
| [sections] | +6.92 KiB | 7 |
| agent_data_plane::cli::run | +6.58 KiB | 70 |
| hashbrown | +5.59 KiB | 283 |
| [Unmapped] | +2.73 KiB | 1 |
| saluki_config::dynamic::watcher | -2.34 KiB | 2 |
| saluki_context::hash::hash_context | +2.15 KiB | 1 |
| tokio | +1.69 KiB | 2123 |
| saluki_core::topology::blueprint | +1.19 KiB | 27 |
| saluki_context::context::Context | +894 B | 4 |
| smallvec | +753 B | 69 |
| saluki_context::resolver::TagsResolver | -574 B | 13 |
| saluki_components::encoders::datadog | +379 B | 288 |
| serde_json | +281 B | 171 |
| saluki_config::ConfigurationLoader::from_environment | +249 B | 1 |
| saluki_components::common::datadog | -240 B | 326 |
| saluki_components::sources::otlp | -169 B | 164 |
| saluki_common::task::instrument | -165 B | 76 |

Detailed Symbol Changes

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  [NEW] +1.79Mi  [NEW] +1.79Mi    std::thread::local::LocalKey<T>::with::hf947cb2a7d030d57
  [NEW]  +121Ki  [NEW]  +120Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::hc39e8af12512af57
  [NEW] +84.6Ki  [NEW] +84.5Ki    agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h5be1ddd17aeb8c4f
  +0.6% +83.5Ki  +0.7% +70.3Ki    [22545 Others]
  [NEW] +64.2Ki  [NEW] +64.0Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h8ea6d1e501a9b09a
  [NEW] +57.8Ki  [NEW] +57.7Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::h663761cb14f422f3
  [NEW] +49.5Ki  [NEW] +49.3Ki    saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::h6bb4e73bcd65b796
  [NEW] +46.4Ki  [NEW] +46.3Ki    h2::proto::connection::Connection<T,P,B>::poll::h35223740d8685346
  [NEW] +46.1Ki  [NEW] +45.9Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::heae6862569bb613c
  [NEW] +45.9Ki  [NEW] +45.7Ki    _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::hb3bed4ad4a0f4d9e
  [NEW] +44.4Ki  [NEW] +44.2Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::h3292df23d23f2b59
  [DEL] -44.4Ki  [DEL] -44.2Ki    _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::hb4d1ad0684099c4c
  [DEL] -46.0Ki  [DEL] -45.8Ki    _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::hea5618e9afd08b63
  [DEL] -46.1Ki  [DEL] -45.9Ki    _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::hb0c627893c271e2c
  [DEL] -46.4Ki  [DEL] -46.3Ki    h2::proto::connection::Connection<T,P,B>::poll::h100071ec98c95c21
  [DEL] -49.5Ki  [DEL] -49.4Ki    saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::h15f926ef5659e580
  [DEL] -57.8Ki  [DEL] -57.7Ki    agent_data_plane::cli::run::handle_run_command::_{{closure}}::hebabd4c0d26708e2
  [DEL] -64.2Ki  [DEL] -64.0Ki    saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h1c7964dcd8ecbead
  [DEL] -84.6Ki  [DEL] -84.5Ki    agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h78c38079d35f4f77
  [DEL]  -114Ki  [DEL]  -114Ki    agent_data_plane::cli::run::create_topology::_{{closure}}::h4a1b0b37b7e2d03e
  [DEL] -1.79Mi  [DEL] -1.79Mi    std::thread::local::LocalKey<T>::with::he517c09e5477efe1
  +0.3% +90.6Ki  +0.3% +77.4Ki    TOTAL

@rayz rayz changed the title from "Tag aggregation correctness tests" to "chore(agent-data-plane): tag_filterlist correctness test" Mar 20, 2026
@dd-octo-sts dd-octo-sts bot added the area/ci label Mar 20, 2026
@rayz rayz marked this pull request as ready for review March 20, 2026 13:24
@rayz rayz requested a review from a team as a code owner March 20, 2026 13:24
Copilot AI review requested due to automatic review settings March 20, 2026 13:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings March 23, 2026 22:59

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.



Copilot AI review requested due to automatic review settings March 23, 2026 23:50

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 23 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

test/correctness/dsd-additional-optimized-tag-filterlist-fix/config.yaml:20

  • This correctness case directory appears to be unused: it is not referenced by any make test-correctness-* target nor by .gitlab/e2e.yml, so it will likely bit-rot and confuse future readers. Either wire it into the Makefile/CI (if it’s meant to be a supported test case) or remove it / fold it into the intended case (e.g., document it as a local-only reproduction and keep it out of the main suite).
analysis_mode: metrics
millstone:
  image: saluki-images/millstone:latest
  config_path: millstone.yaml
datadog_intake:
  image: saluki-images/datadog-intake:latest
  config_path: ../datadog-intake.yaml
baseline:
  image: registry.datadoghq.com/agent:7.76.2-jmx
  files:
    - datadog.yaml:/etc/datadog-agent/datadog.yaml
  additional_env_vars:
    - DD_API_KEY=correctness-test
comparison:
  image: datadog/agent-dev:efcf783f-py3-jmx
  files:
    - datadog.yaml:/etc/datadog-agent/datadog.yaml
  additional_env_vars:
    - DD_API_KEY=correctness-test



Contributor

@webern webern left a comment


Looks pretty good. Is the -fix test case a mistake? Left a few other ponderous comments.

.PHONY: test-correctness
test-correctness: ## Runs the complete correctness suite
-test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
+test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-dsd-tag-filterlist test-correctness-dsd-optimized-tag-filterlist test-correctness-dsd-additional-optimized-tag-filterlist test-correctness-dsd-tag-filterlist-context-cache test-correctness-dsd-tag-filterlist-post-aggr test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform

Should we wrap these, e.g. one-per-line?

@echo "[*] Running 'dsd-origin-detection' correctness test case..."
@target/release/ground-truth $(shell pwd)/test/correctness/dsd-origin-detection/config.yaml

.PHONY: test-correctness-dsd-tag-filterlist

Should we reduce code duplication by having the correctness test "recipe" stated only once then sending the name into it?

  echo-anything:
        @echo $(NAME)

  hello:
        $(MAKE) echo-anything NAME="Hello, World"

  goodbye:
        $(MAKE) echo-anything NAME="Goodbye, World"

  farewell:
        $(MAKE) echo-anything NAME="Farewell, friend"

dependencies = [
"libc",
"windows-sys 0.52.0",
"windows-sys 0.59.0",

Seems to me we shouldn't churn the Cargo.lock file on PRs unrelated to updating crates.

@rayz rayz marked this pull request as draft March 24, 2026 14:12
Copilot AI review requested due to automatic review settings March 24, 2026 14:16

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.



Comment on lines 523 to +524
test-correctness: ## Runs the complete correctness suite
-test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
+test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-dsd-tag-filterlist test-correctness-dsd-optimized-tag-filterlist test-correctness-dsd-additional-optimized-tag-filterlist test-correctness-dsd-tag-filterlist-context-cache test-correctness-dsd-tag-filterlist-post-aggr test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform

Copilot AI Mar 24, 2026


test-correctness now includes multiple new cases whose configs pin registry.datadoghq.com/agent-dev:... images. This makes the default “complete correctness suite” depend on access to those external/pinned images and may cause local runs to fail (or significantly increase runtime) for developers/environments without that registry access. Consider keeping the default test-correctness target limited to repo-built images and adding a separate target (e.g. test-correctness-agent-dev / test-correctness-tag-filterlist-variants) for the agent-dev comparison cases.

@rayz
Contributor Author

rayz commented Apr 1, 2026

Moved to #1297, which only compares 7.76 and ADP.

@rayz rayz closed this Apr 1, 2026

Labels

area/ci: CI/CD, automated testing, etc.
area/components: Sources, transforms, and destinations.
area/test: All things testing: unit/integration, correctness, SMP regression, etc.


3 participants