Skip to content

Coordinate cumulative histogram reset timestamps across bucket series#318

Closed
dashpole wants to merge 3 commits into
GoogleCloudPlatform:release-2.53.5-gmpfrom
dashpole:fix-kong-histogram-timestamps
Closed

Coordinate cumulative histogram reset timestamps across bucket series#318
dashpole wants to merge 3 commits into
GoogleCloudPlatform:release-2.53.5-gmpfrom
dashpole:fix-kong-histogram-timestamps

Conversation

@dashpole

@dashpole dashpole commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Overview & Problem Statement

Enterprise Kong API gateway users on GKE reported frequent metric write rejections in Google Managed Prometheus (GMP) / Cloud Monitoring:
Points must be written in order. One or more of the points specified had an older start time than the most recent point.

Root cause analysis isolated these sparse failures specifically to cumulative histogram metrics (e.g. kong_upstream_latency_ms).

The Bug in Kong & Manifestation in Prometheus Scrapes

Analysis of Kong's shared memory dictionary (ngx.shared.dict) and coroutine yielding (kong/plugins/prometheus/prometheus.lua) revealed two scrape anomalies:

  1. Omitted Zero Buckets (_bucket): Kong omits zero-count buckets from memory to conserve storage. When lower latency observations occur on a subsequent scrape, new bucket boundary series (e.g. le="25") dynamically appear in the scrape output mid-stream.
  2. Mid-Scrape Yield Desynchronization: In metric_data(), keys are retrieved alphabetically (_bucket before _count before _sum) with coroutine.yield() called before each fetch. Incoming requests mid-scrape increment _count and _sum after _bucket has already been read.

GMP Exporter Failure Logic

In google/export:

  • When a newly appearing zero bucket arrives mid-stream on Scrape N (>1), getResetAdjusted treats the series reference as uninitialized (!hasReset) and marks dist.skip = true. The entire distribution sample is skipped on Scrape N, while _count and _sum advance their baseline tracking.
  • When worker jitter causes _count to decrease relative to the prior scrape (v < lastValue), getResetAdjusted resets resetTimestamp = t - 1. When _count subsequently recovers on Scrape N+1 while _sum lags, Monarch rejects the misaligned reset and start timestamps.

Solution

Patched transform.go and series_cache.go:

  1. Authoritative Reset Coordination: seriesCache tracks established cumulative histogram reset timestamps from _count series in histogramResets.
  2. Dynamic Bucket Normalization (getResetAdjustedBucket): When a bucket boundary (_bucket) arrives without prior tracking (!hasReset), if an authoritative reset timestamp was already established on an earlier scrape (rt < t), the bucket inherits the established reset timestamp and initializes baseline resetValue = 0.

Verifiable Investigation & Reproduction Artifacts

For reviewers interested in verifying the full Kong simulation, memory traces, and reproduction test harness, the complete experiment is documented and archived on branch kong-experiment-artifacts.


Created by Gemini

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for handling dynamically appearing histogram buckets (such as zero-count buckets omitted by Kong) by tracking authoritative reset timestamps in a new histogramResets map and utilizing a specialized getResetAdjustedBucket method. A comprehensive test suite has been added to verify these changes. Feedback on the changes highlights a potential memory leak, as the newly introduced histogramResets map is never cleared or garbage-collected, and suggests updating the clear and garbageCollect methods to clean up inactive hashes.

Comment thread google/export/series_cache.go
{Ref: 3, T: 1000000, V: 10},
{Ref: 4, T: 1000000, V: 500},
},
{ // Scrape 2 (T=1030s): Mid-scrape request increments _count=21 while buckets reflect 20

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked manually and this does not trigger ST ingestion problems in Monarch

{Ref: 4, T: 1060000, V: 1100},
},
},
wantSkipped: []bool{true, false, false},

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true that we skip the sample in this case, but it's probably for the better, it does not harm. The proposed fix might work though (adds sample and ensures the right adjustment), if we want the complexity of it.

Still not reproducing out of order ST (manually checked)

@dashpole dashpole closed this Jun 25, 2026
@dashpole

Copy link
Copy Markdown
Collaborator Author

(me actually writing)

I tried for a while longer to reproduce the actual issue. It seems like this CAN be the cause of gaps in histogram data, but that it can't be the source of start time ordering errors. Those seem like they could only be possible when scraping the same metrics from different ports or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants