WIP ACM-35158 Improve readiness/liveness probe behaviour [release-2.16] #6333

Open
KevinFCormier wants to merge 6 commits into stolostron:release-2.16 from KevinFCormier:ACM-35158-fix-console-mce-probe-failures

Conversation

@KevinFCormier KevinFCormier commented Jun 11, 2026

Contributor

📝 Summary

Ticket Summary (Title):
console-mce pod CrashLoopBackOff due to probe failure with large resource sets

Ticket Link:
https://redhat.atlassian.net/browse/ACM-35158

Type of Change:

  • 🐞 Bug Fix
  • ✨ Feature
  • 🔧 Refactor
  • 💸 Tech Debt
  • 🧪 Test-related
  • 📄 Docs

✅ Checklist

General

  • PR title follows the convention (e.g. ACM-12340 Fix bug with...)
  • Code builds and runs locally without errors
  • No console logs, commented-out code, or unnecessary files
  • All commits are meaningful and well-labeled
  • All new display strings are externalized for localization (English only)
  • (Nice to have) JSDoc comments added for new functions and interfaces

If Bugfix

  • Root cause and fix summary are documented in the ticket (for future reference / errata)
  • Fix tested thoroughly and resolves the issue
  • Test(s) added to prevent regression

🗒️ Notes for Reviewers

A customer observed CrashLoopBackOff on console-mce pods in a cluster with over 20,000 Group resources. The initial processing of these resources may have blocked the server from responding to the readiness and liveness probes in time, causing Kubernetes to restart the pod.

This PR batches the Promise.all calls and calls setImmediate after each batch of 100, yielding to the event loop so that new requests can be handled between batches. This also seems to smooth out the pods' memory usage: before this change, I observed a spike at startup followed by a drop to a stable size.
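
For reviewers who want a concrete picture of the batching pattern, here is a minimal sketch. It is not the code from this PR: `mapInBatches` and `yieldToEventLoop` are hypothetical names, and the default batch size of 100 simply mirrors the number mentioned above.

```typescript
// Hypothetical helper (not from the PR): resolves on a later turn of the
// event loop, so pending I/O callbacks such as probe requests can run first.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setImmediate(resolve))
}

// Hypothetical helper (not from the PR): like Promise.all(items.map(fn)),
// but processes the items in fixed-size batches and yields after each one.
async function mapInBatches<T, R>(
  items: readonly T[],
  fn: (item: T) => Promise<R>,
  batchSize = 100
): Promise<R[]> {
  const results: R[] = []
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize)
    // Only the current batch is in flight at any one time.
    results.push(...(await Promise.all(batch.map(fn))))
    // Give the event loop a chance to service other work before continuing.
    await yieldToEventLoop()
  }
  return results
}
```

A call such as `await mapInBatches(groups, processGroup)` (both names assumed) would then replace a single `Promise.all` over the whole list. Results keep the input order, as they do with `Promise.all`.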

This PR also separates the liveness and readiness probes so that we can delay marking the pod as ready for traffic until after initial loading of the watched resources and aggregated application resources has completed.
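
The split between the two probes could look roughly like the sketch below. This is an illustration under assumptions, not the PR's implementation: the paths `/livenessProbe` and `/readinessProbe`, the `initialLoadComplete` flag, and the function names are all hypothetical.

```typescript
import { createServer } from 'node:http'

// Hypothetical flag (not from the PR): set once the initial load of watched
// resources and aggregated application resources has finished.
let initialLoadComplete = false

function markInitialLoadComplete(): void {
  initialLoadComplete = true
}

// Returns the HTTP status for a probe path, or undefined for any other path.
// The probe paths are assumed names for illustration only.
function probeStatus(path: string): number | undefined {
  if (path === '/livenessProbe') {
    // Liveness: the process is up and the event loop is responsive.
    return 200
  }
  if (path === '/readinessProbe') {
    // Readiness: report 503 until the initial load has completed.
    return initialLoadComplete ? 200 : 503
  }
  return undefined
}

// Builds a server that answers only the probe paths; the caller decides
// when to listen. Real request routing is out of scope for this sketch.
function createProbeServer() {
  return createServer((req, res) => {
    res.writeHead(probeStatus(req.url ?? '') ?? 404).end()
  })
}
```

With this shape, a slow startup keeps the pod out of service endpoints (readiness returns 503) without causing a restart, because liveness keeps succeeding as long as the event loop can answer.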

…orm (stolostron#6314)

* Add Azure and generic hcp destroy commands

Generated-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>

* Add unit testing

Generated-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>

* Update AWS command to clarify creds are from a file

Signed-off-by: Kevin Cormier <kcormier@redhat.com>

---------

Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Co-authored-by: Kevin Cormier <kcormier@redhat.com>
…) grows unbounded causing memory leak (stolostron#6319)

* Add cleanupAccessCache function

Signed-off-by: Oksana Bazylieva <obazylie@redhat.com>

* Add tests

Signed-off-by: Oksana Bazylieva <obazylie@redhat.com>

* coderabbitai fix

Signed-off-by: Oksana Bazylieva <obazylie@redhat.com>

---------

Signed-off-by: Oksana Bazylieva <obazylie@redhat.com>
Co-authored-by: Oksana Bazylieva <obazylie@redhat.com>
Signed-off-by: John Swanke <jswanke@redhat.com>
…oming network requests

Assisted-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>

coderabbitai Bot commented Jun 11, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: stolostron/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 70188913-e4a8-49c5-8b7e-a9c69926e0f7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


openshift-ci Bot commented Jun 11, 2026


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KevinFCormier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@KevinFCormier KevinFCormier changed the title WIP ACM-35158 Break up large loops to avoid saturating event queue WIP ACM-35158 Improve readiness/liveness probe behaviour Jun 11, 2026
@KevinFCormier KevinFCormier changed the title WIP ACM-35158 Improve readiness/liveness probe behaviour WIP ACM-35158 Improve readiness/liveness probe behaviour [release-2.16] Jun 11, 2026
@KevinFCormier KevinFCormier force-pushed the ACM-35158-fix-console-mce-probe-failures branch from edfadbd to 3a2908f Compare June 11, 2026 19:41
Assisted-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Generated-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
@KevinFCormier KevinFCormier force-pushed the ACM-35158-fix-console-mce-probe-failures branch from 3a2908f to 061b0d4 Compare June 11, 2026 20:42
@sonarqubecloud


Quality Gate failed

Failed conditions
65.7% Coverage on New Code (required ≥ 70%)

See analysis details on SonarQube Cloud

openshift-ci Bot commented Jun 11, 2026


@KevinFCormier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit-tests-sonarcloud | 061b0d4 | link | true | /test unit-tests-sonarcloud |

