Skip to content

WIP ACM-35158 Improve readiness/liveness probe behaviour#6336

Open
KevinFCormier wants to merge 3 commits into
stolostron:mainfrom
KevinFCormier:ACM-35158-fix-console-mce-probe-failures-main
Open

WIP ACM-35158 Improve readiness/liveness probe behaviour#6336
KevinFCormier wants to merge 3 commits into
stolostron:mainfrom
KevinFCormier:ACM-35158-fix-console-mce-probe-failures-main

Conversation

@KevinFCormier

@KevinFCormier KevinFCormier commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

📝 Summary

Ticket Summary (Title):
console-mce pod CrashLoopBackOff due to probe failure with large resource sets

Ticket Link:
https://redhat.atlassian.net/browse/ACM-35158

Type of Change:

  • 🐞 Bug Fix
  • ✨ Feature
  • 🔧 Refactor
  • 💸 Tech Debt
  • 🧪 Test-related
  • 📄 Docs

✅ Checklist

General

  • PR title follows the convention (e.g. ACM-12340 Fix bug with...)
  • Code builds and runs locally without errors
  • No console logs, commented-out code, or unnecessary files
  • All commits are meaningful and well-labeled
  • All new display strings are externalized for localization (English only)
  • (Nice to have) JSDoc comments added for new functions and interfaces

If Bugfix

  • Root cause and fix summary are documented in the ticket (for future reference / errata)
  • Fix tested thoroughly and resolves the issue
  • Test(s) added to prevent regression

🗒️ Notes for Reviewers

Customer was observing CrashLoopBackoff on console-mce pods and had over 20,000 Group resources present. Initially processing these resources may have blocked us from responding to the readiness and liveness probes on time, causing Kubernetes to restart the pod.

This PR batches Promise.all calls so that we call setImmediate after each batch of 100 to yield the event queue so that new requests can be handled. This seems to smooth out memory usage of the pods. Before this change, I observed a spike at startup, then a retreat to stable size.

This PR also separates the liveness and readiness probes so that we can delay marking the pod as ready for traffic until after initial loading of the watched resources and aggregated application resources has completed.

Summary by CodeRabbit

Release Notes

  • Refactor

    • Implemented batch processing for large asynchronous operations to improve system performance and reduce resource consumption during data processing and synchronization.
    • Enhanced server startup coordination to ensure all components are properly initialized before the service becomes ready.
    • Improved server readiness detection with proper state tracking.
  • Tests

    • Added comprehensive test coverage for batch processing functionality and readiness detection mechanisms.

…oming network requests

Assisted-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Assisted-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Generated-by: Cursor (Claude Opus 4.6 High)
Signed-off-by: Kevin Cormier <kcormier@redhat.com>
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KevinFCormier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Introduces a batchPromiseAll utility that processes async work in fixed-size batches with event-loop yields. The readiness probe gains an explicit isReady flag set by a new setReady() export. startWatching and startAggregating are changed to return Promise<void>, resolved after their first passes complete; app.ts awaits both before calling setReady.

Changes

Coordinated startup readiness

Layer / File(s) Summary
batchPromiseAll utility and tests
backend/src/lib/batch-promise-all.ts, backend/test/lib/batch-promise-all.test.ts
New batchPromiseAll<T,R> function processes items in configurable batches with setImmediate yields between them. Tests cover ordering, empty input, batch sizing, yield behavior, default batch size of 100, and error propagation.
Readiness probe isReady gate
backend/src/routes/readiness.ts, backend/test/routes/readiness.test.ts
readiness handler now checks an internal isReady boolean (500 when false, 200 when true) instead of delegating to the liveness handler. New setReady() export flips the flag once and logs. Tests updated accordingly.
startWatching deferred signals and batching in events
backend/src/routes/events.ts
createDeferredSignal helper added for manual promise coordination. startWatching builds one deferred signal per watch definition, passes its resolve as onFirstListComplete to listAndWatch, and returns Promise.all of those signals. getKubeResources, resource caching, and deletion steps in listKubernetesObjects switch from unbounded Promise.all to batchPromiseAll.
startAggregating promise and onFirstPassComplete callback chain
backend/src/routes/aggregator.ts, backend/src/routes/aggregators/applications.ts
startAggregating returns Promise<void> resolved via a callback threaded into startAggregatingApplications and searchLoop. The callback fires once — either when the search API becomes available or after the first aggregation pass — then is cleared.
app.ts wiring and server-side-events batching
backend/src/app.ts, backend/src/lib/server-side-events.ts
start() awaits both startWatching() and startAggregating() before calling setReady(); calls setReady() immediately when events are disabled. handleRequest in server-side-events switches event inflation from Promise.all to batchPromiseAll.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description follows the template structure, clearly documents the bug fix with root cause analysis, and includes notes for reviewers explaining the batching strategy and probe separation rationale.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'WIP ACM-35158 Improve readiness/liveness probe behaviour' accurately reflects the main change: separating readiness and liveness probes with event queue batching to address pod restart issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
backend/src/routes/readiness.ts (1)

9-15: ⚡ Quick win

Align readiness handler signature with route contract

This route still uses : void. Update it to Promise<void> to match the backend route-handler convention.

Suggested fix
-export function readiness(req: Http2ServerRequest, res: Http2ServerResponse): void {
+export async function readiness(req: Http2ServerRequest, res: Http2ServerResponse): Promise<void> {
   if (!isReady) {
     respondInternalServerError(req, res)
+    return
   } else {
     respondOK(req, res)
   }
 }

As per coding guidelines, "Route handler signature: (req: Http2ServerRequest, res: Http2ServerResponse): Promise<void>".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/readiness.ts` around lines 9 - 15, The readiness function
signature needs to be updated to match the backend route-handler convention.
Change the return type annotation of the readiness function from : void to :
Promise<void> to align with the standard route handler contract that expects
async handlers returning Promise<void>.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/src/app.ts`:
- Around line 125-129: The Promise chain created by `Promise.all([watchingReady,
aggregatingReady]).then(() => { setReady() })` lacks error handling, which means
if either `startWatching()` or `startAggregating()` rejects, the promise
rejection will be unhandled and the application readiness state will remain
unresolved. Add a `.catch` handler to the Promise chain that logs the startup
error with appropriate context and prevents the unhandled rejection from
occurring, ensuring that initialization failures are handled explicitly rather
than silently failing.

In `@backend/src/lib/batch-promise-all.ts`:
- Around line 11-15: The batchPromiseAll function can hang indefinitely if
batchSize is zero or negative because the loop condition i += batchSize will
never advance. Add validation at the start of the function to ensure batchSize
is a positive number greater than zero. If batchSize is invalid, throw an error
with a descriptive message indicating that batchSize must be a positive integer.
This validation should occur before the loop that processes items in the for
statement.

In `@backend/src/routes/aggregator.ts`:
- Around line 20-23: The `startAggregating()` function only handles the success
path by invoking the resolve callback, but if `startAggregatingApplications()`
rejects before calling the callback, the returned Promise will never settle. Add
error handling to the function to ensure the Promise also rejects when
`startAggregatingApplications()` throws an error or rejects. This can be done by
adding a catch handler or reject callback to ensure all settlement paths (both
success and failure) are properly handled.

In `@backend/src/routes/aggregators/applications.ts`:
- Around line 372-375: The onFirstPassComplete callback is being invoked
unconditionally, which can signal startup readiness before the first aggregation
pass completes, bypassing the readiness gate. Guard the callback invocation with
a check to ensure the Search API is available before calling
onFirstPassComplete. Only invoke the callback when the Search API is ready, not
when it is unavailable.

In `@backend/src/routes/events.ts`:
- Around line 348-360: The startWatching function waits indefinitely for all
deferred signals to resolve via Promise.all(listPromises), but the resolve
callback is only invoked when onFirstListComplete fires after a successful list
operation. If a watch definition's list never completes successfully and keeps
retrying, the deferred signal will never be resolved, causing the promise to
hang indefinitely and block readiness. Add a timeout mechanism or error handling
in the listAndWatch function to ensure the deferred signal (resolve callback) is
always invoked regardless of whether the list succeeds or fails, preventing
indefinite blocking.

In `@backend/test/lib/batch-promise-all.test.ts`:
- Around line 39-48: The test code patches global.setImmediate and only restores
it after the await and expect statement complete successfully. If either the
batchPromiseAll call or the expect assertion throws an error, the patched global
is never restored, causing test contamination. Wrap the await batchPromiseAll
call and the expect statement in a try block, and move the global.setImmediate
restoration to a finally block to ensure it always executes regardless of
whether the test passes or fails. Apply this pattern to both test cases at lines
39-48 and 55-64.
- Around line 67-82: The test for the default batch size of 100 is not
sufficiently tight to catch incorrect implementations. Currently it only checks
that batchCount equals 2, but this doesn't verify all items are actually
processed or that the final partial batch is handled correctly. Enhance the test
to assert that all 250 items are processed (track the total number of callback
invocations to verify it equals 250) and properly validate the batch boundaries
by checking that batches complete at the correct positions: the first complete
batch at item 100, the second at item 200, and verify the final batch contains
the remaining 50 items. This ensures the test meaningfully validates both the
default batch size and the actual batching boundary behavior.

In `@backend/test/routes/readiness.test.ts`:
- Around line 6-14: The two test cases for the readiness probe endpoint are
creating inter-test state coupling because setReady() mutates shared module
state, making the second test dependent on the first test running first. Combine
these two separate tests into a single test that verifies the transition
behavior: first assert that GET /readinessProbe returns status code 500 before
calling setReady(), then call setReady() and assert that the same endpoint
returns status code 200 after the state change. This eliminates the test order
dependency and ensures the readiness probe behavior is tested as a single
logical unit.

---

Nitpick comments:
In `@backend/src/routes/readiness.ts`:
- Around line 9-15: The readiness function signature needs to be updated to
match the backend route-handler convention. Change the return type annotation of
the readiness function from : void to : Promise<void> to align with the standard
route handler contract that expects async handlers returning Promise<void>.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1e25cf0e-4c77-4824-8f4f-ab2edf97211b

📥 Commits

Reviewing files that changed from the base of the PR and between dec9ba6 and 5fd588a.

📒 Files selected for processing (9)
  • backend/src/app.ts
  • backend/src/lib/batch-promise-all.ts
  • backend/src/lib/server-side-events.ts
  • backend/src/routes/aggregator.ts
  • backend/src/routes/aggregators/applications.ts
  • backend/src/routes/events.ts
  • backend/src/routes/readiness.ts
  • backend/test/lib/batch-promise-all.test.ts
  • backend/test/routes/readiness.test.ts

Comment thread backend/src/app.ts
Comment on lines +125 to +129
const watchingReady: Promise<void> = startWatching()
const aggregatingReady: Promise<void> = startAggregating()
void Promise.all([watchingReady, aggregatingReady]).then(() => {
setReady()
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle startup Promise rejections explicitly.

The Promise.all(...).then(setReady) chain has no .catch, so initialization failures can surface as unhandled rejections and leave readiness unresolved without clear startup logging.

Suggested fix
-    void Promise.all([watchingReady, aggregatingReady]).then(() => {
-      setReady()
-    })
+    void Promise.all([watchingReady, aggregatingReady])
+      .then(() => {
+        setReady()
+      })
+      .catch((err: unknown) => {
+        logger.error({
+          msg: 'startup readiness initialization failed',
+          error: err instanceof Error ? err.message : String(err),
+        })
+      })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const watchingReady: Promise<void> = startWatching()
const aggregatingReady: Promise<void> = startAggregating()
void Promise.all([watchingReady, aggregatingReady]).then(() => {
setReady()
})
const watchingReady: Promise<void> = startWatching()
const aggregatingReady: Promise<void> = startAggregating()
void Promise.all([watchingReady, aggregatingReady])
.then(() => {
setReady()
})
.catch((err: unknown) => {
logger.error({
msg: 'startup readiness initialization failed',
error: err instanceof Error ? err.message : String(err),
})
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/app.ts` around lines 125 - 129, The Promise chain created by
`Promise.all([watchingReady, aggregatingReady]).then(() => { setReady() })`
lacks error handling, which means if either `startWatching()` or
`startAggregating()` rejects, the promise rejection will be unhandled and the
application readiness state will remain unresolved. Add a `.catch` handler to
the Promise chain that logs the startup error with appropriate context and
prevents the unhandled rejection from occurring, ensuring that initialization
failures are handled explicitly rather than silently failing.

Comment on lines +11 to +15
batchSize = 100
): Promise<R[]> {
const results: R[] = []
for (let i = 0; i < items.length; i += batchSize) {
const batch = await Promise.all(items.slice(i, i + batchSize).map(mapper))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate batchSize to prevent non-terminating loops.

i += batchSize never advances when batchSize <= 0, so this utility can hang startup flows instead of yielding.

Suggested fix
 export async function batchPromiseAll<T, R>(
   items: T[],
   mapper: (item: T) => Promise<R>,
   batchSize = 100
 ): Promise<R[]> {
+  if (!Number.isInteger(batchSize) || batchSize <= 0) {
+    throw new Error('batchSize must be a positive integer')
+  }
   const results: R[] = []
   for (let i = 0; i < items.length; i += batchSize) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/lib/batch-promise-all.ts` around lines 11 - 15, The
batchPromiseAll function can hang indefinitely if batchSize is zero or negative
because the loop condition i += batchSize will never advance. Add validation at
the start of the function to ensure batchSize is a positive number greater than
zero. If batchSize is invalid, throw an error with a descriptive message
indicating that batchSize must be a positive integer. This validation should
occur before the loop that processes items in the for statement.

Comment on lines +20 to +23
export function startAggregating(): Promise<void> {
return new Promise<void>((resolve) => {
void startAggregatingApplications(resolve)
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure startAggregating() settles on initialization failure.

If startAggregatingApplications() rejects before invoking the callback, this Promise never resolves/rejects, which can stall readiness coordination.

Suggested fix
 export function startAggregating(): Promise<void> {
-  return new Promise<void>((resolve) => {
-    void startAggregatingApplications(resolve)
+  return new Promise<void>((resolve, reject) => {
+    void startAggregatingApplications(resolve).catch(reject)
   })
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function startAggregating(): Promise<void> {
return new Promise<void>((resolve) => {
void startAggregatingApplications(resolve)
})
export function startAggregating(): Promise<void> {
return new Promise<void>((resolve, reject) => {
void startAggregatingApplications(resolve).catch(reject)
})
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/aggregator.ts` around lines 20 - 23, The
`startAggregating()` function only handles the success path by invoking the
resolve callback, but if `startAggregatingApplications()` rejects before calling
the callback, the returned Promise will never settle. Add error handling to the
function to ensure the Promise also rejects when
`startAggregatingApplications()` throws an error or rejects. This can be done by
adding a catch handler or reject callback to ensure all settlement paths (both
success and failure) are properly handled.

Comment on lines +372 to +375
if (onFirstPassComplete) {
onFirstPassComplete()
onFirstPassComplete = undefined
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not signal first-pass completion while Search API is unavailable.

This callback path can resolve startup readiness before the first aggregation pass executes, which bypasses the readiness gate.

Suggested fix
         if (!searchAPIMissing) {
           logger.error('search API missing')
           searchAPIMissing = true
-          if (onFirstPassComplete) {
-            onFirstPassComplete()
-            onFirstPassComplete = undefined
-          }
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/aggregators/applications.ts` around lines 372 - 375, The
onFirstPassComplete callback is being invoked unconditionally, which can signal
startup readiness before the first aggregation pass completes, bypassing the
readiness gate. Guard the callback invocation with a check to ensure the Search
API is available before calling onFirstPassComplete. Only invoke the callback
when the Search API is ready, not when it is unavailable.

Comment on lines +348 to +360
export function startWatching(): Promise<void> {
ServerSideEvents.eventFilter = eventFilter
startAccessCacheCleanup()

const listPromises: Promise<void>[] = []
for (const definition of definitions) {
void listAndWatch(definition)
const { promise, resolve } = createDeferredSignal()
listPromises.push(promise)
void listAndWatch(definition, resolve)
}
return Promise.all(listPromises).then(() => {
logger.info('all initial resource lists complete')
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid indefinite readiness blocking when a watch definition never completes its first successful list.

startWatching() waits on all deferred signals, but onFirstListComplete is only fired after a successful list. The retry paths can loop forever without signaling, leaving readiness permanently false.

Suggested direction
 export async function listAndWatch(options: IWatchOptions, onFirstListComplete?: () => void) {
   const serviceAccountToken = getServiceAccountToken()
   let firstListDone = false
+  const signalFirstListComplete = () => {
+    if (!firstListDone) {
+      firstListDone = true
+      onFirstListComplete?.()
+    }
+  }
   while (!stopping) {
     try {
       const { resourceVersion } = await listKubernetesObjects(serviceAccountToken, options)
-      if (!firstListDone) {
-        firstListDone = true
-        onFirstListComplete?.()
-      }
+      signalFirstListComplete()
       if (options.isPolled) {
         await pollKubernetesObjects(serviceAccountToken, options)
       } else {
         await watchKubernetesObjects(serviceAccountToken, options, resourceVersion)
       }
@@
           case 403:
+            signalFirstListComplete()
             logger.error({ msg: 'watch', ...options, status: 'Forbidden' })
@@
           case 404:
+            signalFirstListComplete()
             logger.trace({ msg: 'watch', ...options, status: 'Not found' })

Also applies to: 370-373

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/events.ts` around lines 348 - 360, The startWatching
function waits indefinitely for all deferred signals to resolve via
Promise.all(listPromises), but the resolve callback is only invoked when
onFirstListComplete fires after a successful list operation. If a watch
definition's list never completes successfully and keeps retrying, the deferred
signal will never be resolved, causing the promise to hang indefinitely and
block readiness. Add a timeout mechanism or error handling in the listAndWatch
function to ensure the deferred signal (resolve callback) is always invoked
regardless of whether the list succeeds or fails, preventing indefinite
blocking.

Comment on lines +39 to +48
const originalSetImmediate = global.setImmediate
global.setImmediate = ((fn: () => void) => {
yielded = true
return originalSetImmediate(fn)
}) as typeof setImmediate

await batchPromiseAll(items, (n) => Promise.resolve(n), 2)

global.setImmediate = originalSetImmediate
expect(yielded).toBe(true)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore global.setImmediate in finally to avoid test contamination

Both tests patch a global and restore it only on the happy path. If the awaited call/assertion throws, the patched global leaks into later tests.

Suggested fix
   it('should yield the event loop between batches', async () => {
     let yielded = false
     const items = [1, 2, 3, 4]

     const originalSetImmediate = global.setImmediate
     global.setImmediate = ((fn: () => void) => {
       yielded = true
       return originalSetImmediate(fn)
     }) as typeof setImmediate

-    await batchPromiseAll(items, (n) => Promise.resolve(n), 2)
-
-    global.setImmediate = originalSetImmediate
+    try {
+      await batchPromiseAll(items, (n) => Promise.resolve(n), 2)
+    } finally {
+      global.setImmediate = originalSetImmediate
+    }
     expect(yielded).toBe(true)
   })

   it('should not yield after the last batch', async () => {
     let yieldCount = 0
     const items = [1, 2, 3]

     const originalSetImmediate = global.setImmediate
     global.setImmediate = ((fn: () => void) => {
       yieldCount++
       return originalSetImmediate(fn)
     }) as typeof setImmediate

-    await batchPromiseAll(items, (n) => Promise.resolve(n), 3)
-
-    global.setImmediate = originalSetImmediate
+    try {
+      await batchPromiseAll(items, (n) => Promise.resolve(n), 3)
+    } finally {
+      global.setImmediate = originalSetImmediate
+    }
     expect(yieldCount).toBe(0)
   })

As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics; properly mock and isolate dependencies; async tests must handle promises correctly".

Also applies to: 55-64

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/test/lib/batch-promise-all.test.ts` around lines 39 - 48, The test
code patches global.setImmediate and only restores it after the await and expect
statement complete successfully. If either the batchPromiseAll call or the
expect assertion throws an error, the patched global is never restored, causing
test contamination. Wrap the await batchPromiseAll call and the expect statement
in a try block, and move the global.setImmediate restoration to a finally block
to ensure it always executes regardless of whether the test passes or fails.
Apply this pattern to both test cases at lines 39-48 and 55-64.

Source: Coding guidelines

Comment on lines +67 to +82
it('should use default batch size of 100', async () => {
const items = Array.from({ length: 250 }, (_, i) => i)
let batchCount = 0
let currentBatchSize = 0

await batchPromiseAll(items, (n) => {
currentBatchSize++
if (currentBatchSize === 100) {
batchCount++
currentBatchSize = 0
}
return Promise.resolve(n)
})

expect(batchCount).toBe(2)
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tighten the default batch-size test to assert actual boundary behavior

The current batchCount/currentBatchSize logic can still pass with some incorrect default sizes. Assert the observed number of inter-batch yields for 250 items (should be 2 when default is 100).

Suggested fix
   it('should use default batch size of 100', async () => {
-    const items = Array.from({ length: 250 }, (_, i) => i)
-    let batchCount = 0
-    let currentBatchSize = 0
-
-    await batchPromiseAll(items, (n) => {
-      currentBatchSize++
-      if (currentBatchSize === 100) {
-        batchCount++
-        currentBatchSize = 0
-      }
-      return Promise.resolve(n)
-    })
-
-    expect(batchCount).toBe(2)
+    const items = Array.from({ length: 250 }, (_, i) => i)
+    const setImmediateSpy = jest.spyOn(global, 'setImmediate')
+    try {
+      await batchPromiseAll(items, async (n) => n)
+      expect(setImmediateSpy).toHaveBeenCalledTimes(2)
+    } finally {
+      setImmediateSpy.mockRestore()
+    }
   })

As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/test/lib/batch-promise-all.test.ts` around lines 67 - 82, The test
for the default batch size of 100 is not sufficiently tight to catch incorrect
implementations. Currently it only checks that batchCount equals 2, but this
doesn't verify all items are actually processed or that the final partial batch
is handled correctly. Enhance the test to assert that all 250 items are
processed (track the total number of callback invocations to verify it equals
250) and properly validate the batch boundaries by checking that batches
complete at the correct positions: the first complete batch at item 100, the
second at item 200, and verify the final batch contains the remaining 50 items.
This ensures the test meaningfully validates both the default batch size and the
actual batching boundary behavior.

Source: Coding guidelines

Comment on lines +6 to 14
it(`GET /readinessProbe should return status code 500 if not ready`, async function () {
const res = await request('GET', '/readinessProbe')
expect(res.statusCode).toEqual(200)
expect(res.statusCode).toEqual(500)
})
it(`GET /readinessProbe should return status code 500 if dead`, async function () {
nock(process.env.CLUSTER_API_URL).get('/apis').reply(401)
await apiServerPing()
it(`GET /readinessProbe should return status code 200 after setReady`, async function () {
setReady()
const res = await request('GET', '/readinessProbe')
expect(res.statusCode).toEqual(500)
expect(res.statusCode).toEqual(200)
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid inter-test state coupling on module-level readiness

setReady() mutates shared module state, so splitting pre/post assertions into separate tests creates order dependence. Prefer a single transition test.

Suggested fix
 describe(`readiness Route`, function () {
-  it(`GET /readinessProbe should return status code 500 if not ready`, async function () {
-    const res = await request('GET', '/readinessProbe')
-    expect(res.statusCode).toEqual(500)
-  })
-  it(`GET /readinessProbe should return status code 200 after setReady`, async function () {
+  it(`GET /readinessProbe should return 500 before setReady and 200 after setReady`, async function () {
+    const before = await request('GET', '/readinessProbe')
+    expect(before.statusCode).toEqual(500)
+
     setReady()
-    const res = await request('GET', '/readinessProbe')
-    expect(res.statusCode).toEqual(200)
+    const after = await request('GET', '/readinessProbe')
+    expect(after.statusCode).toEqual(200)
   })
 })

As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics; properly mock and isolate dependencies".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/test/routes/readiness.test.ts` around lines 6 - 14, The two test
cases for the readiness probe endpoint are creating inter-test state coupling
because setReady() mutates shared module state, making the second test dependent
on the first test running first. Combine these two separate tests into a single
test that verifies the transition behavior: first assert that GET
/readinessProbe returns status code 500 before calling setReady(), then call
setReady() and assert that the same endpoint returns status code 200 after the
state change. This eliminates the test order dependency and ensures the
readiness probe behavior is tested as a single logical unit.

Source: Coding guidelines

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
68.2% Coverage on New Code (required ≥ 70%)

See analysis details on SonarQube Cloud

@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

@KevinFCormier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit-tests-sonarcloud 5fd588a link true /test unit-tests-sonarcloud

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@KevinFCormier KevinFCormier changed the title ACM-35158 Improve readiness/liveness probe behaviour WIP ACM-35158 Improve readiness/liveness probe behaviour Jun 15, 2026
@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant