WIP ACM-35158 Improve readiness/liveness probe behaviour#6336
WIP ACM-35158 Improve readiness/liveness probe behaviour#6336KevinFCormier wants to merge 3 commits into
Conversation
…oming network requests Assisted-by: Cursor (Claude Opus 4.6 High) Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Assisted-by: Cursor (Claude Opus 4.6 High) Signed-off-by: Kevin Cormier <kcormier@redhat.com>
Generated-by: Cursor (Claude Opus 4.6 High) Signed-off-by: Kevin Cormier <kcormier@redhat.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: KevinFCormier The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughIntroduces a ChangesCoordinated startup readiness
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
backend/src/routes/readiness.ts (1)
9-15: ⚡ Quick winAlign
readinesshandler signature with route contractThis route still uses
: void. Update it toPromise<void>to match the backend route-handler convention.Suggested fix
-export function readiness(req: Http2ServerRequest, res: Http2ServerResponse): void { +export async function readiness(req: Http2ServerRequest, res: Http2ServerResponse): Promise<void> { if (!isReady) { respondInternalServerError(req, res) + return } else { respondOK(req, res) } }As per coding guidelines, "Route handler signature:
(req: Http2ServerRequest, res: Http2ServerResponse): Promise<void>".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/src/routes/readiness.ts` around lines 9 - 15, The readiness function signature needs to be updated to match the backend route-handler convention. Change the return type annotation of the readiness function from : void to : Promise<void> to align with the standard route handler contract that expects async handlers returning Promise<void>.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/src/app.ts`:
- Around line 125-129: The Promise chain created by `Promise.all([watchingReady,
aggregatingReady]).then(() => { setReady() })` lacks error handling, which means
if either `startWatching()` or `startAggregating()` rejects, the promise
rejection will be unhandled and the application readiness state will remain
unresolved. Add a `.catch` handler to the Promise chain that logs the startup
error with appropriate context and prevents the unhandled rejection from
occurring, ensuring that initialization failures are handled explicitly rather
than silently failing.
In `@backend/src/lib/batch-promise-all.ts`:
- Around line 11-15: The batchPromiseAll function can hang indefinitely if
batchSize is zero or negative because the loop condition i += batchSize will
never advance. Add validation at the start of the function to ensure batchSize
is a positive number greater than zero. If batchSize is invalid, throw an error
with a descriptive message indicating that batchSize must be a positive integer.
This validation should occur before the loop that processes items in the for
statement.
In `@backend/src/routes/aggregator.ts`:
- Around line 20-23: The `startAggregating()` function only handles the success
path by invoking the resolve callback, but if `startAggregatingApplications()`
rejects before calling the callback, the returned Promise will never settle. Add
error handling to the function to ensure the Promise also rejects when
`startAggregatingApplications()` throws an error or rejects. This can be done by
adding a catch handler or reject callback to ensure all settlement paths (both
success and failure) are properly handled.
In `@backend/src/routes/aggregators/applications.ts`:
- Around line 372-375: The onFirstPassComplete callback is being invoked
unconditionally, which can signal startup readiness before the first aggregation
pass completes, bypassing the readiness gate. Guard the callback invocation with
a check to ensure the Search API is available before calling
onFirstPassComplete. Only invoke the callback when the Search API is ready, not
when it is unavailable.
In `@backend/src/routes/events.ts`:
- Around line 348-360: The startWatching function waits indefinitely for all
deferred signals to resolve via Promise.all(listPromises), but the resolve
callback is only invoked when onFirstListComplete fires after a successful list
operation. If a watch definition's list never completes successfully and keeps
retrying, the deferred signal will never be resolved, causing the promise to
hang indefinitely and block readiness. Add a timeout mechanism or error handling
in the listAndWatch function to ensure the deferred signal (resolve callback) is
always invoked regardless of whether the list succeeds or fails, preventing
indefinite blocking.
In `@backend/test/lib/batch-promise-all.test.ts`:
- Around line 39-48: The test code patches global.setImmediate and only restores
it after the await and expect statement complete successfully. If either the
batchPromiseAll call or the expect assertion throws an error, the patched global
is never restored, causing test contamination. Wrap the await batchPromiseAll
call and the expect statement in a try block, and move the global.setImmediate
restoration to a finally block to ensure it always executes regardless of
whether the test passes or fails. Apply this pattern to both test cases at lines
39-48 and 55-64.
- Around line 67-82: The test for the default batch size of 100 is not
sufficiently tight to catch incorrect implementations. Currently it only checks
that batchCount equals 2, but this doesn't verify all items are actually
processed or that the final partial batch is handled correctly. Enhance the test
to assert that all 250 items are processed (track the total number of callback
invocations to verify it equals 250) and properly validate the batch boundaries
by checking that batches complete at the correct positions: the first complete
batch at item 100, the second at item 200, and verify the final batch contains
the remaining 50 items. This ensures the test meaningfully validates both the
default batch size and the actual batching boundary behavior.
In `@backend/test/routes/readiness.test.ts`:
- Around line 6-14: The two test cases for the readiness probe endpoint are
creating inter-test state coupling because setReady() mutates shared module
state, making the second test dependent on the first test running first. Combine
these two separate tests into a single test that verifies the transition
behavior: first assert that GET /readinessProbe returns status code 500 before
calling setReady(), then call setReady() and assert that the same endpoint
returns status code 200 after the state change. This eliminates the test order
dependency and ensures the readiness probe behavior is tested as a single
logical unit.
---
Nitpick comments:
In `@backend/src/routes/readiness.ts`:
- Around line 9-15: The readiness function signature needs to be updated to
match the backend route-handler convention. Change the return type annotation of
the readiness function from : void to : Promise<void> to align with the standard
route handler contract that expects async handlers returning Promise<void>.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1e25cf0e-4c77-4824-8f4f-ab2edf97211b
📒 Files selected for processing (9)
backend/src/app.tsbackend/src/lib/batch-promise-all.tsbackend/src/lib/server-side-events.tsbackend/src/routes/aggregator.tsbackend/src/routes/aggregators/applications.tsbackend/src/routes/events.tsbackend/src/routes/readiness.tsbackend/test/lib/batch-promise-all.test.tsbackend/test/routes/readiness.test.ts
| const watchingReady: Promise<void> = startWatching() | ||
| const aggregatingReady: Promise<void> = startAggregating() | ||
| void Promise.all([watchingReady, aggregatingReady]).then(() => { | ||
| setReady() | ||
| }) |
There was a problem hiding this comment.
Handle startup Promise rejections explicitly.
The Promise.all(...).then(setReady) chain has no .catch, so initialization failures can surface as unhandled rejections and leave readiness unresolved without clear startup logging.
Suggested fix
- void Promise.all([watchingReady, aggregatingReady]).then(() => {
- setReady()
- })
+ void Promise.all([watchingReady, aggregatingReady])
+ .then(() => {
+ setReady()
+ })
+ .catch((err: unknown) => {
+ logger.error({
+ msg: 'startup readiness initialization failed',
+ error: err instanceof Error ? err.message : String(err),
+ })
+ })📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const watchingReady: Promise<void> = startWatching() | |
| const aggregatingReady: Promise<void> = startAggregating() | |
| void Promise.all([watchingReady, aggregatingReady]).then(() => { | |
| setReady() | |
| }) | |
| const watchingReady: Promise<void> = startWatching() | |
| const aggregatingReady: Promise<void> = startAggregating() | |
| void Promise.all([watchingReady, aggregatingReady]) | |
| .then(() => { | |
| setReady() | |
| }) | |
| .catch((err: unknown) => { | |
| logger.error({ | |
| msg: 'startup readiness initialization failed', | |
| error: err instanceof Error ? err.message : String(err), | |
| }) | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/src/app.ts` around lines 125 - 129, The Promise chain created by
`Promise.all([watchingReady, aggregatingReady]).then(() => { setReady() })`
lacks error handling, which means if either `startWatching()` or
`startAggregating()` rejects, the promise rejection will be unhandled and the
application readiness state will remain unresolved. Add a `.catch` handler to
the Promise chain that logs the startup error with appropriate context and
prevents the unhandled rejection from occurring, ensuring that initialization
failures are handled explicitly rather than silently failing.
| batchSize = 100 | ||
| ): Promise<R[]> { | ||
| const results: R[] = [] | ||
| for (let i = 0; i < items.length; i += batchSize) { | ||
| const batch = await Promise.all(items.slice(i, i + batchSize).map(mapper)) |
There was a problem hiding this comment.
Validate batchSize to prevent non-terminating loops.
i += batchSize never advances when batchSize <= 0, so this utility can hang startup flows instead of yielding.
Suggested fix
export async function batchPromiseAll<T, R>(
items: T[],
mapper: (item: T) => Promise<R>,
batchSize = 100
): Promise<R[]> {
+ if (!Number.isInteger(batchSize) || batchSize <= 0) {
+ throw new Error('batchSize must be a positive integer')
+ }
const results: R[] = []
for (let i = 0; i < items.length; i += batchSize) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/src/lib/batch-promise-all.ts` around lines 11 - 15, The
batchPromiseAll function can hang indefinitely if batchSize is zero or negative
because the loop condition i += batchSize will never advance. Add validation at
the start of the function to ensure batchSize is a positive number greater than
zero. If batchSize is invalid, throw an error with a descriptive message
indicating that batchSize must be a positive integer. This validation should
occur before the loop that processes items in the for statement.
| export function startAggregating(): Promise<void> { | ||
| return new Promise<void>((resolve) => { | ||
| void startAggregatingApplications(resolve) | ||
| }) |
There was a problem hiding this comment.
Ensure startAggregating() settles on initialization failure.
If startAggregatingApplications() rejects before invoking the callback, this Promise never resolves/rejects, which can stall readiness coordination.
Suggested fix
export function startAggregating(): Promise<void> {
- return new Promise<void>((resolve) => {
- void startAggregatingApplications(resolve)
+ return new Promise<void>((resolve, reject) => {
+ void startAggregatingApplications(resolve).catch(reject)
})
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| export function startAggregating(): Promise<void> { | |
| return new Promise<void>((resolve) => { | |
| void startAggregatingApplications(resolve) | |
| }) | |
| export function startAggregating(): Promise<void> { | |
| return new Promise<void>((resolve, reject) => { | |
| void startAggregatingApplications(resolve).catch(reject) | |
| }) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/src/routes/aggregator.ts` around lines 20 - 23, The
`startAggregating()` function only handles the success path by invoking the
resolve callback, but if `startAggregatingApplications()` rejects before calling
the callback, the returned Promise will never settle. Add error handling to the
function to ensure the Promise also rejects when
`startAggregatingApplications()` throws an error or rejects. This can be done by
adding a catch handler or reject callback to ensure all settlement paths (both
success and failure) are properly handled.
| if (onFirstPassComplete) { | ||
| onFirstPassComplete() | ||
| onFirstPassComplete = undefined | ||
| } |
There was a problem hiding this comment.
Do not signal first-pass completion while Search API is unavailable.
This callback path can resolve startup readiness before the first aggregation pass executes, which bypasses the readiness gate.
Suggested fix
if (!searchAPIMissing) {
logger.error('search API missing')
searchAPIMissing = true
- if (onFirstPassComplete) {
- onFirstPassComplete()
- onFirstPassComplete = undefined
- }
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/src/routes/aggregators/applications.ts` around lines 372 - 375, The
onFirstPassComplete callback is being invoked unconditionally, which can signal
startup readiness before the first aggregation pass completes, bypassing the
readiness gate. Guard the callback invocation with a check to ensure the Search
API is available before calling onFirstPassComplete. Only invoke the callback
when the Search API is ready, not when it is unavailable.
| export function startWatching(): Promise<void> { | ||
| ServerSideEvents.eventFilter = eventFilter | ||
| startAccessCacheCleanup() | ||
|
|
||
| const listPromises: Promise<void>[] = [] | ||
| for (const definition of definitions) { | ||
| void listAndWatch(definition) | ||
| const { promise, resolve } = createDeferredSignal() | ||
| listPromises.push(promise) | ||
| void listAndWatch(definition, resolve) | ||
| } | ||
| return Promise.all(listPromises).then(() => { | ||
| logger.info('all initial resource lists complete') | ||
| }) |
There was a problem hiding this comment.
Avoid indefinite readiness blocking when a watch definition never completes its first successful list.
startWatching() waits on all deferred signals, but onFirstListComplete is only fired after a successful list. The retry paths can loop forever without signaling, leaving readiness permanently false.
Suggested direction
export async function listAndWatch(options: IWatchOptions, onFirstListComplete?: () => void) {
const serviceAccountToken = getServiceAccountToken()
let firstListDone = false
+ const signalFirstListComplete = () => {
+ if (!firstListDone) {
+ firstListDone = true
+ onFirstListComplete?.()
+ }
+ }
while (!stopping) {
try {
const { resourceVersion } = await listKubernetesObjects(serviceAccountToken, options)
- if (!firstListDone) {
- firstListDone = true
- onFirstListComplete?.()
- }
+ signalFirstListComplete()
if (options.isPolled) {
await pollKubernetesObjects(serviceAccountToken, options)
} else {
await watchKubernetesObjects(serviceAccountToken, options, resourceVersion)
}
@@
case 403:
+ signalFirstListComplete()
logger.error({ msg: 'watch', ...options, status: 'Forbidden' })
@@
case 404:
+ signalFirstListComplete()
logger.trace({ msg: 'watch', ...options, status: 'Not found' })Also applies to: 370-373
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/src/routes/events.ts` around lines 348 - 360, The startWatching
function waits indefinitely for all deferred signals to resolve via
Promise.all(listPromises), but the resolve callback is only invoked when
onFirstListComplete fires after a successful list operation. If a watch
definition's list never completes successfully and keeps retrying, the deferred
signal will never be resolved, causing the promise to hang indefinitely and
block readiness. Add a timeout mechanism or error handling in the listAndWatch
function to ensure the deferred signal (resolve callback) is always invoked
regardless of whether the list succeeds or fails, preventing indefinite
blocking.
| const originalSetImmediate = global.setImmediate | ||
| global.setImmediate = ((fn: () => void) => { | ||
| yielded = true | ||
| return originalSetImmediate(fn) | ||
| }) as typeof setImmediate | ||
|
|
||
| await batchPromiseAll(items, (n) => Promise.resolve(n), 2) | ||
|
|
||
| global.setImmediate = originalSetImmediate | ||
| expect(yielded).toBe(true) |
There was a problem hiding this comment.
Restore global.setImmediate in finally to avoid test contamination
Both tests patch a global and restore it only on the happy path. If the awaited call/assertion throws, the patched global leaks into later tests.
Suggested fix
it('should yield the event loop between batches', async () => {
let yielded = false
const items = [1, 2, 3, 4]
const originalSetImmediate = global.setImmediate
global.setImmediate = ((fn: () => void) => {
yielded = true
return originalSetImmediate(fn)
}) as typeof setImmediate
- await batchPromiseAll(items, (n) => Promise.resolve(n), 2)
-
- global.setImmediate = originalSetImmediate
+ try {
+ await batchPromiseAll(items, (n) => Promise.resolve(n), 2)
+ } finally {
+ global.setImmediate = originalSetImmediate
+ }
expect(yielded).toBe(true)
})
it('should not yield after the last batch', async () => {
let yieldCount = 0
const items = [1, 2, 3]
const originalSetImmediate = global.setImmediate
global.setImmediate = ((fn: () => void) => {
yieldCount++
return originalSetImmediate(fn)
}) as typeof setImmediate
- await batchPromiseAll(items, (n) => Promise.resolve(n), 3)
-
- global.setImmediate = originalSetImmediate
+ try {
+ await batchPromiseAll(items, (n) => Promise.resolve(n), 3)
+ } finally {
+ global.setImmediate = originalSetImmediate
+ }
expect(yieldCount).toBe(0)
})As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics; properly mock and isolate dependencies; async tests must handle promises correctly".
Also applies to: 55-64
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/test/lib/batch-promise-all.test.ts` around lines 39 - 48, The test
code patches global.setImmediate and only restores it after the await and expect
statement complete successfully. If either the batchPromiseAll call or the
expect assertion throws an error, the patched global is never restored, causing
test contamination. Wrap the await batchPromiseAll call and the expect statement
in a try block, and move the global.setImmediate restoration to a finally block
to ensure it always executes regardless of whether the test passes or fails.
Apply this pattern to both test cases at lines 39-48 and 55-64.
Source: Coding guidelines
| it('should use default batch size of 100', async () => { | ||
| const items = Array.from({ length: 250 }, (_, i) => i) | ||
| let batchCount = 0 | ||
| let currentBatchSize = 0 | ||
|
|
||
| await batchPromiseAll(items, (n) => { | ||
| currentBatchSize++ | ||
| if (currentBatchSize === 100) { | ||
| batchCount++ | ||
| currentBatchSize = 0 | ||
| } | ||
| return Promise.resolve(n) | ||
| }) | ||
|
|
||
| expect(batchCount).toBe(2) | ||
| }) |
There was a problem hiding this comment.
Tighten the default batch-size test to assert actual boundary behavior
The current batchCount/currentBatchSize logic can still pass with some incorrect default sizes. Assert the observed number of inter-batch yields for 250 items (should be 2 when default is 100).
Suggested fix
it('should use default batch size of 100', async () => {
- const items = Array.from({ length: 250 }, (_, i) => i)
- let batchCount = 0
- let currentBatchSize = 0
-
- await batchPromiseAll(items, (n) => {
- currentBatchSize++
- if (currentBatchSize === 100) {
- batchCount++
- currentBatchSize = 0
- }
- return Promise.resolve(n)
- })
-
- expect(batchCount).toBe(2)
+ const items = Array.from({ length: 250 }, (_, i) => i)
+ const setImmediateSpy = jest.spyOn(global, 'setImmediate')
+ try {
+ await batchPromiseAll(items, async (n) => n)
+ expect(setImmediateSpy).toHaveBeenCalledTimes(2)
+ } finally {
+ setImmediateSpy.mockRestore()
+ }
})As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/test/lib/batch-promise-all.test.ts` around lines 67 - 82, The test
for the default batch size of 100 is not sufficiently tight to catch incorrect
implementations. Currently it only checks that batchCount equals 2, but this
doesn't verify all items are actually processed or that the final partial batch
is handled correctly. Enhance the test to assert that all 250 items are
processed (track the total number of callback invocations to verify it equals
250) and properly validate the batch boundaries by checking that batches
complete at the correct positions: the first complete batch at item 100, the
second at item 200, and verify the final batch contains the remaining 50 items.
This ensures the test meaningfully validates both the default batch size and the
actual batching boundary behavior.
Source: Coding guidelines
| it(`GET /readinessProbe should return status code 500 if not ready`, async function () { | ||
| const res = await request('GET', '/readinessProbe') | ||
| expect(res.statusCode).toEqual(200) | ||
| expect(res.statusCode).toEqual(500) | ||
| }) | ||
| it(`GET /readinessProbe should return status code 500 if dead`, async function () { | ||
| nock(process.env.CLUSTER_API_URL).get('/apis').reply(401) | ||
| await apiServerPing() | ||
| it(`GET /readinessProbe should return status code 200 after setReady`, async function () { | ||
| setReady() | ||
| const res = await request('GET', '/readinessProbe') | ||
| expect(res.statusCode).toEqual(500) | ||
| expect(res.statusCode).toEqual(200) | ||
| }) |
There was a problem hiding this comment.
Avoid inter-test state coupling on module-level readiness
setReady() mutates shared module state, so splitting pre/post assertions into separate tests creates order dependence. Prefer a single transition test.
Suggested fix
describe(`readiness Route`, function () {
- it(`GET /readinessProbe should return status code 500 if not ready`, async function () {
- const res = await request('GET', '/readinessProbe')
- expect(res.statusCode).toEqual(500)
- })
- it(`GET /readinessProbe should return status code 200 after setReady`, async function () {
+ it(`GET /readinessProbe should return 500 before setReady and 200 after setReady`, async function () {
+ const before = await request('GET', '/readinessProbe')
+ expect(before.statusCode).toEqual(500)
+
setReady()
- const res = await request('GET', '/readinessProbe')
- expect(res.statusCode).toEqual(200)
+ const after = await request('GET', '/readinessProbe')
+ expect(after.statusCode).toEqual(200)
})
})As per coding guidelines, "Test files should meaningfully cover behavior, not just achieve coverage metrics; properly mock and isolate dependencies".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/test/routes/readiness.test.ts` around lines 6 - 14, The two test
cases for the readiness probe endpoint are creating inter-test state coupling
because setReady() mutates shared module state, making the second test dependent
on the first test running first. Combine these two separate tests into a single
test that verifies the transition behavior: first assert that GET
/readinessProbe returns status code 500 before calling setReady(), then call
setReady() and assert that the same endpoint returns status code 200 after the
state change. This eliminates the test order dependency and ensures the
readiness probe behavior is tested as a single logical unit.
Source: Coding guidelines
|
|
@KevinFCormier: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |


📝 Summary
Ticket Summary (Title):
console-mce pod CrashLoopBackOff due to probe failure with large resource sets
Ticket Link:
https://redhat.atlassian.net/browse/ACM-35158
Type of Change:
✅ Checklist
General
ACM-12340 Fix bug with...)If Bugfix
🗒️ Notes for Reviewers
Customer was observing CrashLoopBackoff on console-mce pods and had over 20,000 Group resources present. Initially processing these resources may have blocked us from responding to the readiness and liveness probes on time, causing Kubernetes to restart the pod.
This PR batches Promise.all calls so that we call setImmediate after each batch of 100 to yield the event queue so that new requests can be handled. This seems to smooth out memory usage of the pods. Before this change, I observed a spike at startup, then a retreat to stable size.
This PR also separates the liveness and readiness probes so that we can delay marking the pod as ready for traffic until after initial loading of the watched resources and aggregated application resources has completed.
Summary by CodeRabbit
Release Notes
Refactor
Tests