Kong issue repro attempt #319

Draft
bwplotka wants to merge 1 commit into release-2.53.5-gmp from storedebug-pub

@bwplotka bwplotka commented Jun 25, 2026

Collaborator

To run the repro, add a gmp-test-sa-key.json service account key with monitoring write access (I used the gpe-test-1 project), then run:

cd google/internal/ingeste2etest 
GCM_SECRET=$(cat ./gmp-test-sa-key.json) make

@bwplotka bwplotka marked this pull request as draft June 25, 2026 19:05

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request optimizes Docker build caching in Dockerfile.google, adds a helper script bw.docker.sh for building and extracting binaries, and introduces end-to-end ingestion tests under google/internal/ingeste2etest. Key feedback includes suppressing stderr in the Makefile when reading the service account key, using a trap in bw.docker.sh to clean up temporary containers on failure, avoiding resetting scrapeCount to prevent non-monotonic metric values, and guarding interactive test steps so they do not block CI/CD pipelines.

Comment thread google/internal/ingeste2etest/Makefile Outdated
Comment thread bw.docker.sh Outdated
Comment on lines +276 to +278
mu.Lock()
scrapeCount = 0
mu.Unlock()

Contributor

medium

Resetting scrapeCount to 0 after starting Prometheus can cause non-monotonic metric values. Since Prometheus starts scraping immediately upon startup, it may have already performed one or more scrapes before this reset. Resetting the count to 0 will cause subsequent scrapes to return lower values, which Prometheus will interpret as a counter reset or a metric decrease, potentially leading to unexpected test behavior or query results. Consider letting the count increase monotonically or using a separate endpoint for readiness checks.

Suggested change
- mu.Lock()
- scrapeCount = 0
- mu.Unlock()
+ // Do not reset scrapeCount to avoid non-monotonic metric values if Prometheus scraped during startup.
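One way to keep the exposed values monotonic while still counting scrapes from a known point is to record a baseline instead of zeroing the counter. The sketch below is only an illustration: the scrapeCounter type and its Inc, Mark, and SinceMark methods are hypothetical names, not code from this PR, and whether this fits the repro depends on how the test derives metric values from the scrape number.

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeCounter is a hypothetical stand-in for the test's mu/scrapeCount pair.
type scrapeCounter struct {
	mu       sync.Mutex
	total    int // only ever increases; safe to derive exposed values from
	baseline int // value of total at the last Mark call
}

// Inc records one scrape and returns the new monotonic total.
func (c *scrapeCounter) Inc() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.total++
	return c.total
}

// Mark remembers the current total instead of resetting it to zero.
func (c *scrapeCounter) Mark() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.baseline = c.total
}

// SinceMark returns how many scrapes happened after the last Mark.
func (c *scrapeCounter) SinceMark() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.total - c.baseline
}

func main() {
	var c scrapeCounter
	c.Inc() // scrape that arrived while Prometheus was starting
	c.Inc() // another early scrape
	c.Mark()
	total := c.Inc()
	fmt.Println(total, c.SinceMark()) // prints "3 1"
}
```

With this shape, assertions can use SinceMark while the fake target keeps deriving sample values from the ever-increasing total.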

Comment on lines +280 to +281
testutil.Ok(t, e2einteractive.OpenInBrowser("http://"+prom.Endpoint("http")))
testutil.Ok(t, e2einteractive.RunUntilEndpointHit())

Contributor

medium

Using e2einteractive.OpenInBrowser and e2einteractive.RunUntilEndpointHit will cause the test to block indefinitely or fail in non-interactive environments (such as CI/CD pipelines). It is recommended to guard these interactive steps with an environment variable or a command-line flag so they only run during local manual debugging.

	if os.Getenv("INTERACTIVE") == "true" {
		testutil.Ok(t, e2einteractive.OpenInBrowser("http://"+prom.Endpoint("http")))
		testutil.Ok(t, e2einteractive.RunUntilEndpointHit())
	}
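The guard can also be pulled into a small helper so the decision is testable without touching the real environment. This is a minimal sketch under assumptions: interactiveEnabled is a hypothetical helper name, and INTERACTIVE is the variable name taken from the suggestion, not an established convention in this repository.

```go
package main

import (
	"fmt"
	"os"
)

// interactiveEnabled reports whether interactive debugging steps should run.
// The lookup function is injected so the check can be exercised in tests
// without mutating the process environment.
func interactiveEnabled(getenv func(string) string) bool {
	return getenv("INTERACTIVE") == "true"
}

func main() {
	if interactiveEnabled(os.Getenv) {
		fmt.Println("interactive steps would run here")
	} else {
		fmt.Println("skipping interactive steps")
	}
}
```

Inside a test, the same check could gate the browser and endpoint-wait calls, so CI runs skip them by default.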

//fmt.Fprintf(&buf, "kong_kong_latency_ms_sum{case=\"healthy\"} %d\n", scrapeNum*310)
//fmt.Fprintf(&buf, "kong_kong_latency_ms_count{case=\"healthy\"} %d\n", scrapeNum*10)

// b. add_bucket is slowly increasing. On 3rd scrape new bucket arrives

Collaborator Author

For some reason, two of these histogram cases (add_bucket and counter-missing, per the logs below) cause issues:

20:03:15 gmp-prom-ingest: {"caller":"export.go:624","component":"gcm_exporter","err":"invalid bucket with negative count {__name__=\"kong_kong_latency_ms_count\", case=\"add_bucket\", instance=\"proxy-kong\", job=\"test\"}: count=30.000000, sum=930.000000, dev=4056.000000, index=1, bucketVal=5, bucketPrevVal=6","level":"debug","msg":"building sample failed","ts":"2026-06-25T19:03:15.914Z"}
20:03:21 gmp-prom-ingest: {"caller":"export.go:1037","component":"gcm_exporter","err":"rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: timeSeries[2,6] (example metric.type=\"prometheus.googleapis.com/kong_kong_latency_ms/histogram\", metric.labels={\"case\": \"counter-missing\", \"collector\": \"gmp-prom-ingest\"}, resource.type=\"prometheus_target\", resource.labels={\"cluster\": \"pe-github-action\", \"location\": \"europe-west3-a\", \"instance\": \"proxy-kong\", \"job\": \"test\", \"project_id\": \"gpe-test-1\", \"namespace\": \"\"}): write for resource failed: Points must be written in order. One or more of the points specified had an older start time than the most recent point.\nerror details: name = Unknown  desc = total_point_count:7  success_point_count:6  errors:{status:{code:3}  point_count:1}","level":"error","msg":"send batch","size":7,"ts":"2026-06-25T19:03:21.047Z"}
20:03:25 gmp-prom-ingest: {"caller":"export.go:1037","component":"gcm_exporter","err":"rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: timeSeries[2,6] (example metric.type=\"prometheus.googleapis.com/kong_kong_latency_ms/histogram\", metric.labels={\"case\": \"add_bucket\", \"collector\": \"gmp-prom-ingest\"}, resource.type=\"prometheus_target\", resource.labels={\"namespace\": \"\", \"project_id\": \"gpe-test-1\", \"cluster\": \"pe-github-action\", \"instance\": \"proxy-kong\", \"job\": \"test\", \"location\": \"europe-west3-a\"}): write for resource failed: Points must be written in order. One or more of the points specified had an older start time than the most recent point.\nerror details: name = Unknown  desc = total_point_count:7  success_point_count:6  errors:{status:{code:3}  point_count:1}","level":"error","msg":"send batch","size":7,"ts":"2026-06-25T19:03:26.015Z"}
^C20:03:26 Killing gmp-prom-ingest

Notably, when I added more logging, the out-of-order start time (OOO ST) error no longer reproduced, so there is likely a race here.
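For context on the first log line: classic Prometheus histogram buckets are cumulative, so each bucket value should be at least as large as the one before it. The sketch below shows that check in isolation. It is not the exporter's code; firstNonMonotonicBucket is a hypothetical helper, and the sample values only loosely mirror bucketPrevVal=6 followed by bucketVal=5 from the log.

```go
package main

import "fmt"

// firstNonMonotonicBucket scans cumulative bucket values in ascending
// upper-bound order and returns the index of the first bucket that is
// smaller than its predecessor, or -1 if the sequence never decreases.
func firstNonMonotonicBucket(cumulative []float64) int {
	prev := 0.0
	for i, v := range cumulative {
		if v < prev {
			// Converting to a per-bucket count (v - prev) would be negative.
			return i
		}
		prev = v
	}
	return -1
}

func main() {
	// Illustrative values only: 6 followed by 5 breaks monotonicity at index 1.
	fmt.Println(firstNonMonotonicBucket([]float64{6, 5, 30})) // prints "1"
	fmt.Println(firstNonMonotonicBucket([]float64{5, 6, 30})) // prints "-1"
}
```

A decreasing pair like this can appear if bucket samples from different scrapes end up combined, which would be consistent with the race suspected above, though that is a guess rather than a confirmed cause.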

Signed-off-by: bwplotka <bwplotka@gmail.com>