Kong issue repro attempt by bwplotka · Pull Request #319 · GoogleCloudPlatform/prometheus

bwplotka · 2026-06-25T19:05:13Z

To run, add a gmp-test-sa-key.json service account key with monitoring write access (I used gpe-test-1), then:

cd google/internal/ingeste2etest
GCM_SECRET=$(cat ./gmp-test-sa-key.json) make

gemini-code-assist

Code Review

This pull request optimizes Docker build caching in Dockerfile.google, adds a helper script bw.docker.sh for building and extracting binaries, and introduces end-to-end ingestion tests under google/internal/ingeste2etest. Key feedback includes suppressing stderr in the Makefile when reading the service account key, using a trap in bw.docker.sh to clean up temporary containers on failure, avoiding resetting scrapeCount to prevent non-monotonic metric values, and guarding interactive test steps so they do not block CI/CD pipelines.

gemini-code-assist · 2026-06-25T19:06:46Z

+	mu.Lock()
+	scrapeCount = 0
+	mu.Unlock()


Resetting scrapeCount to 0 after starting Prometheus can cause non-monotonic metric values. Since Prometheus starts scraping immediately upon startup, it may have already performed one or more scrapes before this reset. Resetting the count to 0 will cause subsequent scrapes to return lower values, which Prometheus will interpret as a counter reset or a metric decrease, potentially leading to unexpected test behavior or query results. Consider letting the count increase monotonically or using a separate endpoint for readiness checks.

Suggested change

-	mu.Lock()
-	scrapeCount = 0
-	mu.Unlock()
+	// Do not reset scrapeCount to avoid non-monotonic metric values if Prometheus scraped during startup.
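One way to avoid the reset while still counting scrapes from a known point is to capture a baseline and subtract it. The following is a minimal sketch of that idea, not the code in this PR: `scrapeCounter`, `scrapesSince`, `scrape`, and the metric name `example_requests_total` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// scrapeCounter is a hypothetical stand-in for the test's scrapeCount.
// It is only ever incremented, so values derived from it never decrease.
type scrapeCounter struct{ n atomic.Int64 }

// ServeHTTP renders one counter sample whose value grows with every scrape.
func (c *scrapeCounter) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	scrapeNum := c.n.Add(1)
	fmt.Fprintf(w, "example_requests_total %d\n", scrapeNum*10)
}

// scrapesSince reports how many scrapes happened after baseline was captured.
func (c *scrapeCounter) scrapesSince(baseline int64) int64 {
	return c.n.Load() - baseline
}

// scrape performs one in-memory scrape and returns the exposition text.
func scrape(c *scrapeCounter) string {
	rec := httptest.NewRecorder()
	c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	c := &scrapeCounter{}

	// A scrape that lands during startup, before the test is ready.
	fmt.Print(scrape(c))

	// Capture a baseline instead of resetting the counter to zero.
	baseline := c.n.Load()

	fmt.Print(scrape(c))
	fmt.Println("scrapes since baseline:", c.scrapesSince(baseline))
}
```

The exposed value keeps increasing across the startup scrape, while the test can still reason about "scrapes since I started watching" through the baseline.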

gemini-code-assist · 2026-06-25T19:06:46Z

+	testutil.Ok(t, e2einteractive.OpenInBrowser("http://"+prom.Endpoint("http")))
+	testutil.Ok(t, e2einteractive.RunUntilEndpointHit())


Using e2einteractive.OpenInBrowser and e2einteractive.RunUntilEndpointHit will cause the test to block indefinitely or fail in non-interactive environments (such as CI/CD pipelines). It is recommended to guard these interactive steps with an environment variable or a command-line flag so they only run during local manual debugging.

	if os.Getenv("INTERACTIVE") == "true" {
		testutil.Ok(t, e2einteractive.OpenInBrowser("http://"+prom.Endpoint("http")))
		testutil.Ok(t, e2einteractive.RunUntilEndpointHit())
	}

bwplotka · 2026-06-25T19:08:00Z

+	//fmt.Fprintf(&buf, "kong_kong_latency_ms_sum{case=\"healthy\"} %d\n", scrapeNum*310)
+	//fmt.Fprintf(&buf, "kong_kong_latency_ms_count{case=\"healthy\"} %d\n", scrapeNum*10)
+
+	// b. add_bucket is slowly increasing. On 3rd scrape new bucket arrives


For some reason, those two histograms cause an issue on one of those:

20:03:15 gmp-prom-ingest: {"caller":"export.go:624","component":"gcm_exporter","err":"invalid bucket with negative count {__name__=\"kong_kong_latency_ms_count\", case=\"add_bucket\", instance=\"proxy-kong\", job=\"test\"}: count=30.000000, sum=930.000000, dev=4056.000000, index=1, bucketVal=5, bucketPrevVal=6","level":"debug","msg":"building sample failed","ts":"2026-06-25T19:03:15.914Z"}
20:03:21 gmp-prom-ingest: {"caller":"export.go:1037","component":"gcm_exporter","err":"rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: timeSeries[2,6] (example metric.type=\"prometheus.googleapis.com/kong_kong_latency_ms/histogram\", metric.labels={\"case\": \"counter-missing\", \"collector\": \"gmp-prom-ingest\"}, resource.type=\"prometheus_target\", resource.labels={\"cluster\": \"pe-github-action\", \"location\": \"europe-west3-a\", \"instance\": \"proxy-kong\", \"job\": \"test\", \"project_id\": \"gpe-test-1\", \"namespace\": \"\"}): write for resource failed: Points must be written in order. One or more of the points specified had an older start time than the most recent point.\nerror details: name = Unknown desc = total_point_count:7 success_point_count:6 errors:{status:{code:3} point_count:1}","level":"error","msg":"send batch","size":7,"ts":"2026-06-25T19:03:21.047Z"}
20:03:25 gmp-prom-ingest: {"caller":"export.go:1037","component":"gcm_exporter","err":"rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: timeSeries[2,6] (example metric.type=\"prometheus.googleapis.com/kong_kong_latency_ms/histogram\", metric.labels={\"case\": \"add_bucket\", \"collector\": \"gmp-prom-ingest\"}, resource.type=\"prometheus_target\", resource.labels={\"namespace\": \"\", \"project_id\": \"gpe-test-1\", \"cluster\": \"pe-github-action\", \"instance\": \"proxy-kong\", \"job\": \"test\", \"location\": \"europe-west3-a\"}): write for resource failed: Points must be written in order. One or more of the points specified had an older start time than the most recent point.\nerror details: name = Unknown desc = total_point_count:7 success_point_count:6 errors:{status:{code:3} point_count:1}","level":"error","msg":"send batch","size":7,"ts":"2026-06-25T19:03:26.015Z"}
^C20:03:26 Killing gmp-prom-ingest

Notably, when I added more logging, the out-of-order start time (OOO ST) error no longer reproduced, so there's some race here.
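For context on the first log line: Prometheus classic histogram buckets are cumulative, so turning them into per-bucket counts means subtracting each bucket from the next. The sketch below illustrates how a bucket smaller than its predecessor (bucketVal=5 after bucketPrevVal=6 at index=1, as in the log) yields a negative count. It is an illustration only and not the logic in export.go; `bucketDeltas` is a hypothetical function, and every input value other than 5 and 6 is made up.

```go
package main

import "fmt"

// bucketDeltas converts cumulative bucket values (ordered by ascending upper
// bound) into per-bucket counts. It returns an error when a bucket is smaller
// than its predecessor, because the resulting count would be negative.
func bucketDeltas(cumulative []float64) ([]float64, error) {
	deltas := make([]float64, 0, len(cumulative))
	prev := 0.0
	for i, v := range cumulative {
		if v < prev {
			return nil, fmt.Errorf("negative count at index=%d: bucketVal=%g, bucketPrevVal=%g", i, v, prev)
		}
		deltas = append(deltas, v-prev)
		prev = v
	}
	return deltas, nil
}

func main() {
	// Consistent snapshot: every bucket is >= the one before it.
	deltas, err := bucketDeltas([]float64{6, 10, 30})
	fmt.Println(deltas, err)

	// Inconsistent snapshot: the bucket at index 1 holds 5 while index 0 holds 6.
	deltas, err = bucketDeltas([]float64{6, 5, 30})
	fmt.Println(deltas, err)
}
```

A snapshot like the second one can only arise if the bucket series were not observed as one consistent set, which fits the suspicion of a race.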

Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka marked this pull request as draft June 25, 2026 19:05

bwplotka force-pushed the storedebug-pub branch from 4e6d824 to c1bfcf7 Compare June 25, 2026 19:13

Repro attempt

d102f2c

Signed-off-by: bwplotka <bwplotka@gmail.com>

bwplotka force-pushed the storedebug-pub branch from c1bfcf7 to d102f2c Compare June 25, 2026 19:16

bwplotka wants to merge 1 commit into release-2.53.5-gmp from storedebug-pub
