chore(agent-data-plane): tag_filterlist correctness test #1251
rayz wants to merge 12 commits into olivielpeau/tag-aggregation-pre-aggr
Pull request overview
Adds a new correctness test case covering DogStatsD metric tag filterlist behavior and fixes ground-truth metric normalization so tag ordering differences don’t incorrectly split/duplicate aggregated metrics.
Changes:
- Add a new `dsd-tag-filterlist` correctness scenario (Millstone corpus + Agent config + runner config) and wire it into `make test-correctness`.
- Normalize metric contexts (sort/dedup tags) before grouping/aggregating in ground-truth metric analysis, plus add a regression unit test.
- Minor import cleanup in the tag filterlist transform module.
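The normalization fix described above can be illustrated with a minimal sketch. This is not the code from `bin/correctness/ground-truth`; the `Context` type, its `normalized` method, and the `aggregate` function are hypothetical stand-ins that only show why sorting and deduplicating tags before grouping keeps equivalent tag sets in a single bucket.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a metric context: a metric name plus its tags.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Context {
    name: String,
    tags: Vec<String>,
}

impl Context {
    fn new(name: &str, tags: &[&str]) -> Self {
        Context {
            name: name.to_string(),
            tags: tags.iter().map(|t| t.to_string()).collect(),
        }
    }

    /// Returns a copy whose tags are sorted and deduplicated, so that
    /// tag order and repeated tags no longer affect equality.
    fn normalized(&self) -> Context {
        let mut tags = self.tags.clone();
        tags.sort();
        tags.dedup();
        Context {
            name: self.name.clone(),
            tags,
        }
    }
}

/// Sums values per normalized context (assumed aggregation: a plain sum).
fn aggregate(points: &[(Context, f64)]) -> BTreeMap<Context, f64> {
    let mut grouped = BTreeMap::new();
    for (context, value) in points {
        *grouped.entry(context.normalized()).or_insert(0.0) += *value;
    }
    grouped
}
```

Without the `normalized()` call, the tag lists `["env:prod", "service:web"]` and `["service:web", "env:prod"]` would key two separate entries, which is the kind of split the PR overview describes.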
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| test/correctness/dsd-tag-filterlist/millstone.yaml | Defines a deterministic corpus that exercises include/exclude tag filtering and tag-order variation. |
| test/correctness/dsd-tag-filterlist/datadog.yaml | Configures DogStatsD UDS + tag filterlist rules for the scenario. |
| test/correctness/dsd-tag-filterlist/config.yaml | Registers the new correctness case (baseline vs comparison containers). |
| lib/saluki-components/src/transforms/tag_filterlist/mod.rs | Simplifies the Metric type import used by filter_metric_tags. |
| bin/correctness/ground-truth/src/analysis/metrics/types.rs | Fixes grouping by using normalized contexts so equivalent tag sets match regardless of order; adds unit test. |
| Makefile | Adds test-correctness-dsd-tag-filterlist and includes it in test-correctness. |
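As a rough illustration of the include/exclude filtering that the new corpus exercises, here is a sketch under stated assumptions. The rule schema is not shown in this conversation, so `FilterAction`, `tag_key`, and `filter_tags` are hypothetical names, and matching on the tag key (the text before the first `:`) is an assumption rather than the behavior of `filter_metric_tags`.

```rust
/// Hypothetical rule action: keep only the listed tag keys, or drop them.
enum FilterAction {
    Include,
    Exclude,
}

/// Returns the key part of a `key:value` tag (assumed tag format).
/// A tag without a colon is treated as its own key.
fn tag_key(tag: &str) -> &str {
    tag.split(':').next().unwrap_or(tag)
}

/// Applies one include/exclude rule to a tag list, preserving input order.
fn filter_tags(tags: &[&str], action: &FilterAction, keys: &[&str]) -> Vec<String> {
    tags.iter()
        .filter(|tag| {
            let key = tag_key(tag);
            let listed = keys.iter().any(|k| *k == key);
            match action {
                FilterAction::Include => listed,
                FilterAction::Exclude => !listed,
            }
        })
        .map(|tag| tag.to_string())
        .collect()
}
```

Because the filter preserves input order, two payloads that differ only in tag order produce differently ordered outputs, which is why the analysis side needs order-insensitive comparison.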
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: 73f486ee-78ff-432e-8637-0af5b802d7fb
Baseline: 1d50af4
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +1.96 | [+1.38, +2.54] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.12, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -1.08 | [-5.72, +3.55] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +8.50 | [-22.76, +39.77] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +3.98 | [-50.19, +58.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +2.98 | [-53.47, +59.44] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +2.92 | [+2.71, +3.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +1.96 | [+1.38, +2.54] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.21 | [-6.58, +9.00] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +1.16 | [+0.96, +1.35] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | +1.15 | [+0.98, +1.31] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +1.07 | [+0.89, +1.25] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | +0.91 | [+0.88, +0.94] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | +0.89 | [+0.71, +1.07] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | +0.88 | [+0.63, +1.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +0.82 | [+0.65, +0.99] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.60 | [+0.41, +0.79] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | +0.48 | [+0.32, +0.64] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | +0.47 | [+0.22, +0.72] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.37 | [+0.04, +0.71] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | +0.33 | [-2.21, +2.88] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.02 | [-0.11, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | +0.02 | [-0.13, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.03, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.00 | [-0.12, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.01 | [-0.14, +0.12] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | -0.01 | [-0.16, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | -0.41 | [-1.81, +0.98] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | -0.54 | [-0.68, -0.41] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | -0.83 | [-6.66, +4.99] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | -1.08 | [-5.72, +3.55] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | -1.76 | [-4.12, +0.59] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | -3.92 | [-6.11, -1.73] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 112.16MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 34.54MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 53.33MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 167.06MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.32MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
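The three criteria above can be written as a small predicate. This is only a sketch of the rule as stated in the explanation, not the Regression Detector's implementation; the `Experiment` struct, its field names, and `is_regression` are hypothetical.

```rust
/// Hypothetical summary of one experiment row from the tables above.
struct Experiment {
    /// Estimated "Δ mean %".
    delta_mean_pct: f64,
    /// Lower and upper bounds of the 90% "Δ mean % CI".
    ci_low: f64,
    ci_high: f64,
    /// Whether the experiment's configuration marks it "erratic".
    erratic: bool,
}

/// Effect size tolerance from the report: |Δ mean %| ≥ 5.00%.
const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three stated criteria hold.
fn is_regression(e: &Experiment) -> bool {
    let big_enough = e.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    let ci_excludes_zero = e.ci_low > 0.0 || e.ci_high < 0.0;
    big_enough && ci_excludes_zero && !e.erratic
}
```

Applied to the table, `dsd_uds_10mb_3k_contexts_cpu` (+8.50, CI [-22.76, +39.77]) is not flagged because its interval contains zero, and `otlp_ingest_traces_ottl_transform_5mb_cpu` (-3.92, CI [-6.11, -1.73]) is not flagged because its magnitude is below 5.00%.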
Binary Size Analysis (Agent Data Plane)
Target: 1d50af4 (baseline) vs 2dcdab8 (comparison) diff
| Module | File Size | Symbols |
|---|---|---|
| core | +29.56 KiB | 8776 |
| saluki_components::transforms::tag_filterlist | +26.19 KiB | 18 |
| serde_core | +8.78 KiB | 338 |
| [sections] | +6.92 KiB | 7 |
| agent_data_plane::cli::run | +6.58 KiB | 70 |
| hashbrown | +5.59 KiB | 283 |
| [Unmapped] | +2.73 KiB | 1 |
| saluki_config::dynamic::watcher | -2.34 KiB | 2 |
| saluki_context::hash::hash_context | +2.15 KiB | 1 |
| tokio | +1.69 KiB | 2123 |
| saluki_core::topology::blueprint | +1.19 KiB | 27 |
| saluki_context::context::Context | +894 B | 4 |
| smallvec | +753 B | 69 |
| saluki_context::resolver::TagsResolver | -574 B | 13 |
| saluki_components::encoders::datadog | +379 B | 288 |
| serde_json | +281 B | 171 |
| saluki_config::ConfigurationLoader::from_environment | +249 B | 1 |
| saluki_components::common::datadog | -240 B | 326 |
| saluki_components::sources::otlp | -169 B | 164 |
| saluki_common::task::instrument | -165 B | 76 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +1.79Mi [NEW] +1.79Mi std::thread::local::LocalKey<T>::with::hf947cb2a7d030d57
[NEW] +121Ki [NEW] +120Ki agent_data_plane::cli::run::create_topology::_{{closure}}::hc39e8af12512af57
[NEW] +84.6Ki [NEW] +84.5Ki agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h5be1ddd17aeb8c4f
+0.6% +83.5Ki +0.7% +70.3Ki [22545 Others]
[NEW] +64.2Ki [NEW] +64.0Ki saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h8ea6d1e501a9b09a
[NEW] +57.8Ki [NEW] +57.7Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::h663761cb14f422f3
[NEW] +49.5Ki [NEW] +49.3Ki saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::h6bb4e73bcd65b796
[NEW] +46.4Ki [NEW] +46.3Ki h2::proto::connection::Connection<T,P,B>::poll::h35223740d8685346
[NEW] +46.1Ki [NEW] +45.9Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::heae6862569bb613c
[NEW] +45.9Ki [NEW] +45.7Ki _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::hb3bed4ad4a0f4d9e
[NEW] +44.4Ki [NEW] +44.2Ki _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::h3292df23d23f2b59
[DEL] -44.4Ki [DEL] -44.2Ki _<saluki_components::transforms::aggregate::Aggregate as saluki_core::components::transforms::Transform>::run::_{{closure}}::hb4d1ad0684099c4c
[DEL] -46.0Ki [DEL] -45.8Ki _<saluki_components::destinations::prometheus::Prometheus as saluki_core::components::destinations::Destination>::run::_{{closure}}::hea5618e9afd08b63
[DEL] -46.1Ki [DEL] -45.9Ki _<saluki_components::forwarders::otlp::OtlpForwarder as saluki_core::components::forwarders::Forwarder>::run::_{{closure}}::hb0c627893c271e2c
[DEL] -46.4Ki [DEL] -46.3Ki h2::proto::connection::Connection<T,P,B>::poll::h100071ec98c95c21
[DEL] -49.5Ki [DEL] -49.4Ki saluki_app::bootstrap::AppBootstrapper::bootstrap::_{{closure}}::h15f926ef5659e580
[DEL] -57.8Ki [DEL] -57.7Ki agent_data_plane::cli::run::handle_run_command::_{{closure}}::hebabd4c0d26708e2
[DEL] -64.2Ki [DEL] -64.0Ki saluki_components::common::datadog::io::run_endpoint_io_loop::_{{closure}}::h1c7964dcd8ecbead
[DEL] -84.6Ki [DEL] -84.5Ki agent_data_plane::internal::control_plane::spawn_control_plane::_{{closure}}::h78c38079d35f4f77
[DEL] -114Ki [DEL] -114Ki agent_data_plane::cli::run::create_topology::_{{closure}}::h4a1b0b37b7e2d03e
[DEL] -1.79Mi [DEL] -1.79Mi std::thread::local::LocalKey<T>::with::he517c09e5477efe1
+0.3% +90.6Ki +0.3% +77.4Ki TOTAL
tag_filterlist correctness test
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 4 comments.
Pull request overview
Copilot reviewed 21 out of 23 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
test/correctness/dsd-additional-optimized-tag-filterlist-fix/config.yaml:20
- This correctness case directory appears to be unused: it is not referenced by any `make test-correctness-*` target nor by `.gitlab/e2e.yml`, so it will likely bit-rot and confuse future readers. Either wire it into the Makefile/CI (if it’s meant to be a supported test case) or remove it / fold it into the intended case (e.g., document it as a local-only reproduction and keep it out of the main suite).
analysis_mode: metrics
millstone:
  image: saluki-images/millstone:latest
  config_path: millstone.yaml
datadog_intake:
  image: saluki-images/datadog-intake:latest
  config_path: ../datadog-intake.yaml
baseline:
  image: registry.datadoghq.com/agent:7.76.2-jmx
  files:
    - datadog.yaml:/etc/datadog-agent/datadog.yaml
  additional_env_vars:
    - DD_API_KEY=correctness-test
comparison:
  image: datadog/agent-dev:efcf783f-py3-jmx
  files:
    - datadog.yaml:/etc/datadog-agent/datadog.yaml
  additional_env_vars:
    - DD_API_KEY=correctness-test
webern left a comment
Looks pretty good. Is the -fix test case a mistake? Left a few other ponderous comments.
test/correctness/dsd-additional-optimized-tag-filterlist-fix/config.yaml
Outdated
 .PHONY: test-correctness
 test-correctness: ## Runs the complete correctness suite
-test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
+test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-dsd-tag-filterlist test-correctness-dsd-optimized-tag-filterlist test-correctness-dsd-additional-optimized-tag-filterlist test-correctness-dsd-tag-filterlist-context-cache test-correctness-dsd-tag-filterlist-post-aggr test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
Should we wrap these, e.g. one-per-line?
 	@echo "[*] Running 'dsd-origin-detection' correctness test case..."
 	@target/release/ground-truth $(shell pwd)/test/correctness/dsd-origin-detection/config.yaml

+.PHONY: test-correctness-dsd-tag-filterlist
Should we reduce code duplication by having the correctness test "recipe" stated only once then sending the name into it?
echo-anything:
	@echo $(NAME)

hello:
	$(MAKE) echo-anything NAME="Hello, World"

goodbye:
	$(MAKE) echo-anything NAME="Goodbye, World"

farewell:
	$(MAKE) echo-anything NAME="Farewell, friend"
 dependencies = [
  "libc",
- "windows-sys 0.52.0",
+ "windows-sys 0.59.0",
Seems to me we shouldn't churn the Cargo.lock file on PRs unrelated to updating crates.
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.
 test-correctness: ## Runs the complete correctness suite
-test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
+test-correctness: test-correctness-dsd-plain test-correctness-dsd-origin-detection test-correctness-dsd-tag-filterlist test-correctness-dsd-optimized-tag-filterlist test-correctness-dsd-additional-optimized-tag-filterlist test-correctness-dsd-tag-filterlist-context-cache test-correctness-dsd-tag-filterlist-post-aggr test-correctness-otlp-metrics test-correctness-otlp-traces test-correctness-otlp-traces-ottl-filtering test-correctness-otlp-traces-ottl-transform
test-correctness now includes multiple new cases whose configs pin registry.datadoghq.com/agent-dev:... images. This makes the default “complete correctness suite” depend on access to those external/pinned images and may cause local runs to fail (or significantly increase runtime) for developers/environments without that registry access. Consider keeping the default test-correctness target limited to repo-built images and adding a separate target (e.g. test-correctness-agent-dev / test-correctness-tag-filterlist-variants) for the agent-dev comparison cases.
moved to 1297 and only comparing 7.76 and ADP
Summary
Tag filtering correctness tests with `registry.datadoghq.com/agent:7.76.2-jmx` as baseline.
- `dsd-tag-filterlist`: `saluki-images/datadog-agent:testing-release`
- `dsd-optimized-tag-filterlist`: `registry.datadoghq.com/agent-dev:ff848d05-py3-jmx`
- `dsd-additional-optimized-tag-filterlist`: `registry.datadoghq.com/agent-dev:62567276-py3-jmx`
- `dsd-tag-filterlist-context-cache`: `registry.datadoghq.com/agent-dev:aef811fd-py3-jmx`
- `dsd-tag-filterlist-post-aggr`: `registry.datadoghq.com/agent-dev:ca27cfe6-py3-jmx`
Change Type
How did you test this PR?
CI
References