
ACM-35158 avoid blocking incoming requests [release-2.16] #6339

Open
KevinFCormier wants to merge 2 commits into stolostron:release-2.16 from KevinFCormier:ACM-35158-avoid-blocking-incoming-requests

Conversation


@KevinFCormier KevinFCormier commented Jun 15, 2026

Contributor

📝 Summary

Ticket Summary (Title):
console-mce pod CrashLoopBackOff due to probe failure with large resource sets

Ticket Link:
https://redhat.atlassian.net/browse/ACM-35499

Type of Change:

  • 🐞 Bug Fix
  • ✨ Feature
  • 🔧 Refactor
  • 💸 Tech Debt
  • 🧪 Test-related
  • 📄 Docs

✅ Checklist

General

  • PR title follows the convention (e.g. ACM-12340 Fix bug with...)
  • Code builds and runs locally without errors
  • No console logs, commented-out code, or unnecessary files
  • All commits are meaningful and well-labeled
  • All new display strings are externalized for localization (English only)
  • (Nice to have) JSDoc comments added for new functions and interfaces

If Bugfix

  • Root cause and fix summary are documented in the ticket (for future reference / errata)
  • Fix tested thoroughly and resolves the issue
  • Test(s) added to prevent regression

🗒️ Notes for Reviewers

The customer was observing CrashLoopBackOff on console-mce pods with over 20,000 Group resources present. The initial processing of these resources may have blocked the backend from responding to the readiness and liveness probes in time, causing Kubernetes to restart the pod.

This PR batches Promise.all calls into groups of 100 and calls setImmediate after each batch, yielding to the event loop so that new requests can be handled. This also seems to smooth out memory usage of the pods: before this change, I observed a spike at startup, then a retreat to a stable size.
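
For readers unfamiliar with the pattern, here is a minimal sketch of batching with a yield between batches. The helper name `processInBatches`, its signature, and the `worker` callback are hypothetical and only illustrate the idea; the actual code changed in this PR is not reproduced in this conversation.

```typescript
// Hypothetical helper (not the PR's actual code): run `worker` over `items`
// in batches, yielding to the event loop after each batch.
async function processInBatches<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  batchSize = 100 // batch size of 100 taken from the PR description
): Promise<R[]> {
  const results: R[] = []
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize)
    // Process one batch concurrently, preserving input order.
    results.push(...(await Promise.all(batch.map(worker))))
    // setImmediate schedules a callback for a later event-loop phase, so
    // pending I/O (such as an incoming probe request) can be handled first.
    await new Promise<void>((resolve) => setImmediate(resolve))
  }
  return results
}
```

The yield matters because promise continuations run as microtasks, which are drained before the event loop moves on to I/O. A long chain of already-resolved promises can therefore delay incoming requests, while `setImmediate` gives the loop a chance to service them between batches.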

Memory Usage

For the following videos, I deleted the pods and then observed the memory usage of their replacements. Before the patch, the pods sometimes show a memory spike during startup; in this recording, one of the pods spikes to almost 12 GB before falling back to a more typical value. After the patch, there is no initial spike.

Before

Before.Patch.mov

After

After.Patch.mov

Liveness Probe Response Time

I also checked the response time, turning on garbage collection tracing and using a simple script to repeatedly check the liveness probe endpoint. I tested against the cluster kevin-probe-test, which has 50,000 Group resources on it.

Here is the process I followed:

  1. Run ./check-response-time.sh or ./check-response-time.sh -t 1.0 in one terminal. (The latter command shows only when response time is greater than 1.0 seconds.)
  2. In a second terminal, run npm run plugins.
  3. Drive load to the backend by loading https://localhost:9000/multicloud/credentials in several tabs. Repeatedly refresh the tabs to drive new full loads of the SSE stream.
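
The check-response-time.sh script from step 1 is not included in this conversation. As a rough illustration of what such a check does, here is a sketch in TypeScript. The names `timeProbe` and `collectSlowResponses` are hypothetical, and the probe is passed in as a function because the liveness endpoint URL is not stated here.

```typescript
// Hypothetical stand-in for check-response-time.sh; not the actual script.
type Probe = () => Promise<unknown>

// Time a single probe call and return the elapsed time in seconds.
async function timeProbe(probe: Probe): Promise<number> {
  const start = Date.now()
  await probe()
  return (Date.now() - start) / 1000
}

// Call the probe `count` times, pausing `intervalMs` between calls.
// Returns every sample when no threshold is given; otherwise only the
// samples greater than `thresholdSeconds` (similar to the -t option).
async function collectSlowResponses(
  probe: Probe,
  count: number,
  thresholdSeconds?: number,
  intervalMs = 100
): Promise<number[]> {
  const reported: number[] = []
  for (let i = 0; i < count; i++) {
    const elapsed = await timeProbe(probe)
    if (thresholdSeconds === undefined || elapsed > thresholdSeconds) {
      reported.push(elapsed)
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
  return reported
}
```

In practice the probe would be something like `() => fetch(probeUrl)`, where `probeUrl` is an assumed variable holding the backend's liveness endpoint.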

Using a baseline test branch vs. a patched test branch, I found:

  • Without the patch, I can easily provoke response times of 1-2 seconds. I saw a particularly long pause of 53 s on one occasion.
  • With the patch, I used ./check-response-time.sh -t 0.1 and only saw response times up to just under 0.5 s.


coderabbitai Bot commented Jun 15, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: stolostron/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 22cb4710-14e7-4a7e-b81a-14f069d95529

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…oming network requests

Assisted-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
@KevinFCormier KevinFCormier changed the title from "WIP ACM-35158 avoid blocking incoming requests [release-2.16]" to "ACM-35158 avoid blocking incoming requests [release-2.16]" Jun 19, 2026
@KevinFCormier

Contributor Author

/cc @fxiang1

@openshift-ci openshift-ci Bot requested a review from fxiang1 June 19, 2026 15:51
Generated-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
@KevinFCormier KevinFCormier force-pushed the ACM-35158-avoid-blocking-incoming-requests branch from b975c63 to d1b5b32 Compare June 19, 2026 15:52
@KevinFCormier

Contributor Author

/test unit-tests-sonarcloud


fxiang1 commented Jun 19, 2026

Contributor

/lgtm

I put this on hold because I'm not sure if the 2.16 stream is open.


openshift-ci Bot commented Jun 19, 2026


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fxiang1, KevinFCormier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [KevinFCormier,fxiang1]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud
