feat(otlp): added support for summary, histogram, and exponential histogram #1319
…added helper functions
…roperly emitted for ingestion
…rding, align histogram/quantile defaults with Go reference
Binary Size Analysis (Agent Data Plane)
Target: 1fec9ae (baseline) vs a51ff19 (comparison)
| Module | File Size | Symbols |
|---|---|---|
| saluki_components::sources::otlp | +26.61 KiB | 25 |
| core | +11.20 KiB | 34 |
| saluki_context::resolver::ContextResolver | +3.88 KiB | 1 |
| anyhow | +2.41 KiB | 9 |
| ddsketch::canonical::store | +1.11 KiB | 1 |
| [sections] | +1.09 KiB | 8 |
| [Unmapped] | +784 B | 1 |
| exp2 | +543 B | 1 |
| ddsketch::agent::sketch | +324 B | 3 |
| alloc | +262 B | 55 |
| std | +208 B | 16 |
| floor | +174 B | 1 |
| __powidf2 | +156 B | 1 |
| serde_core | +70 B | 1 |
| aho_corasick | -16 B | 1 |
| mlkem512_dec | -16 B | 1 |
| [LOAD #3 [RW]] | +8 B | 1 |
| agent_data_plane::cli::run | +8 B | 1 |
| unicode_segmentation | -8 B | 1 |
| agent_data_plane::components::tag_filterlist | -4 B | 1 |
Detailed Symbol Changes
FILE SIZE VM SIZE
-------------- --------------
[NEW] +24.2Ki [NEW] +24.1Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::new::h0c0f217e5f83abe1
+1.1% +16.9Ki +1.1% +12.1Ki [224 Others]
[NEW] +16.3Ki [NEW] +16.2Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::map_histogram_metrics::h516742a8e963419f
[NEW] +11.6Ki [NEW] +11.4Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::map_to_dd_format::hde1e20c70359a11d
[NEW] +5.13Ki [NEW] +5.00Ki <hickory_proto::rr::record_data::RData as core::clone::Clone>::clone.11405
[NEW] +4.71Ki [NEW] +4.59Ki <rustls::error::Error as core::clone::Clone>::clone.11420
[NEW] +3.88Ki [NEW] +3.78Ki saluki_context::resolver::ContextResolver::resolve::h4e000b719a42af9d
[NEW] +3.83Ki [NEW] +3.71Ki <webpki::error::Error as core::fmt::Debug>::fmt.11598
[NEW] +3.57Ki [NEW] +3.45Ki <rustls::error::Error as core::fmt::Debug>::fmt.11421
[NEW] +3.41Ki [NEW] +3.32Ki core::slice::sort::stable::quicksort::quicksort::h0083a43a153462bc
[NEW] +2.32Ki [NEW] +2.23Ki core::slice::sort::unstable::quicksort::quicksort::hd2eaa63094f11709
[NEW] +1.98Ki [NEW] +1.90Ki core::slice::sort::stable::drift::sort::h18e45b0a82c1a3a0
[NEW] +1.79Ki [NEW] +1.65Ki <tonic::transport::channel::endpoint::Endpoint as core::clone::Clone>::clone.13182
[DEL] -1.79Ki [DEL] -1.65Ki <tonic::transport::channel::endpoint::Endpoint as core::clone::Clone>::clone.13180
[DEL] -3.57Ki [DEL] -3.45Ki <rustls::error::Error as core::fmt::Debug>::fmt.11419
[DEL] -3.83Ki [DEL] -3.71Ki <webpki::error::Error as core::fmt::Debug>::fmt.11596
-79.3% -3.83Ki -81.7% -3.83Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::record_metric_event::h4c5cbdb89ee79abd
[DEL] -3.83Ki [DEL] -3.69Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::map_to_dd_format::h504d386aa98922b6
[DEL] -4.71Ki [DEL] -4.59Ki <rustls::error::Error as core::clone::Clone>::clone.11418
[DEL] -5.13Ki [DEL] -5.00Ki <hickory_proto::rr::record_data::RData as core::clone::Clone>::clone.11403
[DEL] -24.2Ki [DEL] -24.1Ki saluki_components::sources::otlp::metrics::translator::OtlpMetricsTranslator::new::h9fff571cb4fb1d98
+0.2% +48.7Ki +0.2% +43.4Ki TOTAL
Regression Detector (Agent Data Plane)
Regression Detector Results
Run ID: 558082e4-30d7-4758-bb43-44bde4b9725c
Baseline: 1fec9ae
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +3.27 | [+2.93, +3.61] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.27 | [-4.72, +5.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.01 | [-0.11, +0.13] | 1 | (metrics) (profiles) (logs) |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | dsd_uds_512kb_3k_contexts_cpu | % cpu utilization | +7.14 | [-51.90, +66.17] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_cpu | % cpu utilization | +4.34 | [-51.96, +60.65] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_memory | memory utilization | +3.27 | [+2.93, +3.61] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_cpu | % cpu utilization | +3.26 | [+1.06, +5.46] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_cpu | % cpu utilization | +2.48 | [-28.94, +33.90] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_cpu | % cpu utilization | +2.01 | [-0.06, +4.08] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_cpu | % cpu utilization | +1.53 | [-5.58, +8.64] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_cpu | % cpu utilization | +0.54 | [-5.43, +6.51] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_memory | memory utilization | +0.51 | [+0.17, +0.85] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_throughput | ingress throughput | +0.32 | [+0.19, +0.45] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_memory | memory utilization | +0.30 | [+0.06, +0.55] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_cpu | % cpu utilization | +0.30 | [-1.10, +1.70] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_low | memory utilization | +0.27 | [+0.08, +0.46] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_cpu | % cpu utilization | +0.27 | [-4.72, +5.27] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_memory | memory utilization | +0.15 | [-0.02, +0.32] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_memory | memory utilization | +0.14 | [-0.11, +0.39] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_metrics_5mb_throughput | ingress throughput | +0.04 | [-0.09, +0.16] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_logs_5mb_throughput | ingress throughput | +0.01 | [-0.11, +0.13] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_medium | memory utilization | +0.01 | [-0.17, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.03, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_memory | memory utilization | +0.01 | [-0.18, +0.19] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_10mb_3k_contexts_throughput | ingress throughput | +0.01 | [-0.14, +0.15] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_throughput | ingress throughput | +0.00 | [-0.01, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_throughput | ingress throughput | +0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_throughput | ingress throughput | +0.00 | [-0.05, +0.05] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_5mb_throughput | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_1mb_3k_contexts_throughput | ingress throughput | -0.00 | [-0.06, +0.06] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_ultraheavy | memory utilization | -0.03 | [-0.15, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_512kb_3k_contexts_memory | memory utilization | -0.04 | [-0.21, +0.14] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_transform_5mb_memory | memory utilization | -0.05 | [-0.29, +0.20] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_idle | memory utilization | -0.06 | [-0.09, -0.03] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_100mb_3k_contexts_memory | memory utilization | -0.08 | [-0.26, +0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | quality_gates_rss_dsd_heavy | memory utilization | -0.23 | [-0.36, -0.10] | 1 | (metrics) (profiles) (logs) |
| ➖ | dsd_uds_500mb_3k_contexts_memory | memory utilization | -0.48 | [-0.65, -0.31] | 1 | (metrics) (profiles) (logs) |
| ➖ | otlp_ingest_traces_ottl_filtering_5mb_cpu | % cpu utilization | -0.93 | [-3.38, +1.51] | 1 | (metrics) (profiles) (logs) |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gates_rss_dsd_heavy | memory_usage | 10/10 | 115.55MiB ≤ 140MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_low | memory_usage | 10/10 | 34.46MiB ≤ 50MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_medium | memory_usage | 10/10 | 54.20MiB ≤ 75MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_dsd_ultraheavy | memory_usage | 10/10 | 167.95MiB ≤ 200MiB | (metrics) (profiles) (logs) |
| ✅ | quality_gates_rss_idle | memory_usage | 10/10 | 21.50MiB ≤ 40MiB | (metrics) (profiles) (logs) |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
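The three criteria above can be sketched as a single predicate. This is only an illustration of the decision rule as described, not the regression detector's actual code; the `ExperimentResult` struct, its field names, and `is_significant_change` are hypothetical.

```rust
/// Hypothetical record for one experiment row: the estimated change of the
/// comparison variant relative to the baseline, in percent, together with
/// the bounds of its confidence interval.
#[derive(Debug, Clone, Copy)]
pub struct ExperimentResult {
    pub delta_mean_pct: f64,
    pub ci_low: f64,
    pub ci_high: f64,
    /// Whether the experiment's configuration marks it "erratic".
    pub erratic: bool,
}

/// Effect size tolerance quoted in the report: |Δ mean %| ≥ 5.00%.
pub const EFFECT_SIZE_TOLERANCE_PCT: f64 = 5.0;

/// Returns true only when all three criteria from the explanation hold.
pub fn is_significant_change(r: &ExperimentResult) -> bool {
    // 1. The change is big enough to merit a closer look.
    let big_enough = r.delta_mean_pct.abs() >= EFFECT_SIZE_TOLERANCE_PCT;
    // 2. The confidence interval lies entirely on one side of zero.
    let ci_excludes_zero = r.ci_low > 0.0 || r.ci_high < 0.0;
    // 3. The experiment is not marked erratic.
    big_enough && ci_excludes_zero && !r.erratic
}
```

Applied to rows from the tables above, `otlp_ingest_logs_5mb_memory` (+3.27, CI [+2.93, +3.61]) fails the first criterion even though its interval excludes zero, and `dsd_uds_512kb_3k_contexts_cpu` (+7.14, CI [-51.90, +66.17]) fails the second, so neither is flagged.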
Pull request overview
This PR adds support for three previously unimplemented OpenTelemetry metric types: Summary metrics, Histogram metrics with bucket conversion to sketches, and Exponential Histogram metrics. The implementation includes new metric mapping functions, DDSketch conversion utilities, and metric metadata enhancements to support interval tracking for delta metrics. Comprehensive correctness testing was performed using the lading repository's test suite, which confirmed that the implementation produces metrics consistent with the baseline.
Changes:
- Added `map_summary_metrics()` to handle OTLP Summary data points with optional quantile emission based on configuration
- Added `map_exponential_histogram_metrics()` to convert OTLP ExponentialHistogram to agent DDSketch format, supporting delta metrics only
- Added helper functions for DDSketch manipulation: `exponential_histogram_to_ddsketch()`, `convert_ddsketch_into_sketch()`, `remap_bins_to_agent_space()` to support exponential histogram conversion
- Enhanced DDSketch library with setter methods (`set_count`, `set_sum`, `set_avg`, `set_min`, `set_max`) and new `LogarithmicMapping::new_with_gamma()` constructor for explicit gamma configuration
- Added `interval` field to `MetricMetadata` to track the collection interval for delta metrics (used for consistency with Go agent's `ConsumeSketch` behavior)
- Added configuration options `quantiles`, `infer_delta_interval`, and validation logic
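For background on why an explicit-gamma constructor is useful here, the sketch below shows the arithmetic that relates an OTLP exponential histogram's `scale` to a logarithmic bucket ratio. It relies only on the OpenTelemetry data model (base = 2^(2^-scale), with bucket `index` covering values in (base^index, base^(index+1)]) and on the DDSketch relation gamma = (1 + alpha) / (1 - alpha). The function names are hypothetical and this is not the PR's `exponential_histogram_to_ddsketch()` implementation; it also assumes, without confirmation from the PR, that the OTLP base plays the role of gamma.

```rust
/// Growth factor between consecutive OTLP exponential histogram buckets:
/// base = 2^(2^-scale).
pub fn otlp_base(scale: i32) -> f64 {
    2f64.powf(2f64.powi(-scale))
}

/// Relative accuracy alpha of a logarithmic mapping whose bucket ratio is
/// `gamma`, obtained by inverting gamma = (1 + alpha) / (1 - alpha).
pub fn relative_accuracy(gamma: f64) -> f64 {
    (gamma - 1.0) / (gamma + 1.0)
}

/// Bounds of the positive bucket with the given absolute index:
/// lower bound exclusive, upper bound inclusive.
/// In the OTLP payload, the i-th entry of `bucket_counts` corresponds to
/// the absolute index `offset + i`.
pub fn bucket_bounds(scale: i32, index: i32) -> (f64, f64) {
    let base = otlp_base(scale);
    (base.powi(index), base.powi(index + 1))
}
```

For example, `scale = 0` gives a base of 2 (a relative accuracy of one third), and each increment of `scale` takes the square root of the base, so higher scales give finer buckets.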
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/saluki-core/src/data_model/event/metric/value/sketch.rs | Added From<(u64, DDSketch)> implementation for direct sketch-with-timestamp conversion |
| lib/saluki-core/src/data_model/event/metric/metadata.rs | Added interval field and with_interval() builder method for metric collection interval tracking |
| lib/saluki-components/src/sources/otlp/metrics/translator.rs | Implemented summary and exponential histogram metric mapping with bucket-to-sketch conversion utilities |
| lib/saluki-components/src/sources/otlp/metrics/config.rs | Added new configuration fields for quantiles and interval inference; added validation method |
| lib/ddsketch/src/canonical/mapping/logarithmic.rs | Added new_with_gamma() constructor for direct gamma-based mapping creation |
| lib/ddsketch/src/agent/sketch.rs | Added setter methods to override count, sum, avg, min, and max values |
@@ -121,13 +135,15 @@ impl Default for OtlpMetricsTranslatorConfig {
    fn default() -> Self {
        Self {
            hist_mode: HistogramMode::default(),
The default value of send_histogram_aggregations was changed from true to false in this PR. This is a significant behavioral change that affects histogram processing for all existing code using the default configuration. While the validation tests pass, this change should be explicitly justified and documented, as it alters the default metric output behavior. Consider adding a comment explaining why this default was changed.
Suggested change:
-             hist_mode: HistogramMode::default(),
+             hist_mode: HistogramMode::default(),
+             // Intentionally disabled by default because enabling histogram aggregations
+             // changes the default metric output for callers relying on the translator's
+             // default configuration.
By default, the Agent sets send_histogram_aggregations to false. I changed the Rust code to mimic that.
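One way to make such a default explicit is to document it at the field and pin it with an assertion. The sketch below uses a hypothetical, simplified stand-in struct (`TranslatorConfigSketch`), not the real `OtlpMetricsTranslatorConfig`, whose other fields are omitted here.

```rust
/// Hypothetical, simplified stand-in for the translator configuration.
/// Only the field discussed in this thread is modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct TranslatorConfigSketch {
    /// Whether histogram aggregation metrics are emitted.
    pub send_histogram_aggregations: bool,
}

impl Default for TranslatorConfigSketch {
    fn default() -> Self {
        Self {
            // Disabled by default to mirror the Agent's default,
            // as stated in the reply above.
            send_histogram_aggregations: false,
        }
    }
}
```

A unit test asserting `!TranslatorConfigSketch::default().send_histogram_aggregations` would then flag any later, unintended change to the default.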
…t_float, added GenericError
…al field, added log for non-delta temporality, cleaner error reporting
As a reminder, we'll still need to get the
Summary
This PR adds support for the remaining OpenTelemetry metric types (summary, histogram, and exponential histogram).
Change Type
How did you test this PR?
I created a draft PR in the lading repo that included the appropriate changes to emit the newly added metrics.
I had my `Cargo.toml` point to the branch and ran `make build-datadog-intake-image build-millstone-image build-datadog-agent-image` and `make test-correctness`, which yielded the following results:
References