NO-JIRA: Update e2e diagnostics#488
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReworks e2e diagnostics to collect context-bound diagnostic strings (returned by helpers instead of writing to GinkgoWriter), store per-spec diagnostics, thread a 5-minute timeout context through diagnostic collection calls, remove per-helper panic recover wrappers, inject stored diagnostics into JUnit failure messages, and add unit tests and test-invocation tweaks. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@theobarberbany: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/e2e_common.go`:
- Around line 71-73: The comment for specDiagnostics is inaccurate: it says the
map is keyed by “spec text” but the code keys entries by
LeafNodeLocation.String(); update the comment for specDiagnostics to state that
entries are keyed by the spec's LeafNodeLocation.String() (or simply “leaf node
location string”) and mention that ReportAfterEach populates it and
ReportAfterSuite reads it, so future maintainers won't assume spec text is the
key.
In `@e2e/e2e_test.go`:
- Around line 39-40: Replace the weak map key using only
report.LeafNodeLocation.String() when populating specDiagnostics (where diag :=
collectTrackedResourceDiagnostics() and
specDiagnostics[report.LeafNodeLocation.String()] = diag) with a composite key
that includes the leaf location plus the spec's unique text (e.g., combine
report.LeafNodeLocation.String() and report.FullText() or equivalent) so each
spec instance is unique; apply the same change to the other occurrence around
the specDiagnostics assignment at the later location (the lines referenced as
68-69) to prevent overwriting diagnostics for table-driven or helper-generated
specs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 33c5a4e8-e29b-454b-af86-9e5a52c242eb
📒 Files selected for processing (2)
e2e/e2e_common.goe2e/e2e_test.go
07fcbc1 to
4333b8e
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
e2e/e2e_test.go (1)
54-66: Diagnostics stored even when collection panics twice.On line 61,
diagis stored regardless ofok. If bothJustAfterEachandReportAfterEachpanic during collection, this stores an empty string. This appears intentional to prevent infinite retry attempts, but the JUnit failure message will have an empty diagnostics section. Consider whether a placeholder like"(diagnostics unavailable due to collection failure)"would be more informative.💡 Optional: Add placeholder for failed diagnostics
if _, exists := specDiagnostics[specKey(report)]; !exists { diag, ok := collectTrackedResourceDiagnostics() if !ok { fmt.Fprintf(os.Stderr, "WARNING: diagnostics collection panicked in both JustAfterEach and ReportAfterEach for %s\n", report.FullText()) + diag = "(diagnostics unavailable due to collection failure)" } specDiagnostics[specKey(report)] = diag }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/e2e_test.go` around lines 54 - 66, The ReportAfterEach block stores whatever diag is returned from collectTrackedResourceDiagnostics into specDiagnostics[specKey(report)] even when collectTrackedResourceDiagnostics returns ok==false (i.e., collection panicked twice), leaving an empty diagnostics string; change the logic in the ReportAfterEach closure (the function using ReportAfterEach, specDiagnostics, specKey and collectTrackedResourceDiagnostics) to only store diag when ok==true and otherwise store a clear placeholder like "(diagnostics unavailable due to collection failure)" so JUnit output is informative; ensure this replacement happens before writing to specDiagnostics[specKey(report)] and preserve existing behavior that prevents repeated retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@e2e/e2e_test.go`:
- Around line 54-66: The ReportAfterEach block stores whatever diag is returned
from collectTrackedResourceDiagnostics into specDiagnostics[specKey(report)]
even when collectTrackedResourceDiagnostics returns ok==false (i.e., collection
panicked twice), leaving an empty diagnostics string; change the logic in the
ReportAfterEach closure (the function using ReportAfterEach, specDiagnostics,
specKey and collectTrackedResourceDiagnostics) to only store diag when ok==true
and otherwise store a clear placeholder like "(diagnostics unavailable due to
collection failure)" so JUnit output is informative; ensure this replacement
happens before writing to specDiagnostics[specKey(report)] and preserve existing
behavior that prevents repeated retries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: cc24425c-a0a8-4296-8d6a-3f01b1903268
📒 Files selected for processing (5)
Makefilee2e/e2e_common.goe2e/e2e_diagnostics_test.goe2e/e2e_test.gohack/test.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- e2e/e2e_common.go
|
/test e2e-aws-capi-techpreview |
|
/lgtm |
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
/test e2e-aws-capi-techpreview |
4333b8e to
2707376
Compare
|
New changes are detected. LGTM label has been removed. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e/e2e_common.go`:
- Around line 140-154: The current diagnostics code creates a single diagCtx
with a 30s deadline and reuses it for all dumps, causing later API calls to fail
if an earlier call is slow; instead, stop using one shared diagCtx and pass the
parent ctx into the dump calls (dumpSingleResource, dumpAllAWSMachineTemplates,
dumpNamespaceEvents) and ensure each individual API call inside those functions
wraps the parent ctx with its own context.WithTimeout(ctx, 30*time.Second)
(callCtx, cancel) / defer cancel() so each Get/List has its own 30s budget
rather than sharing a single deadline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 57e206f6-bc51-48a2-990f-e0224c68d7d7
📒 Files selected for processing (5)
Makefilee2e/e2e_common.goe2e/e2e_diagnostics_test.goe2e/e2e_test.gohack/test.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- hack/test.sh
- Makefile
- e2e/e2e_diagnostics_test.go
| specDiagnostics[specKey(CurrentSpecReport())] = diag | ||
| } | ||
| } | ||
| }) |
There was a problem hiding this comment.
You can add a node timeout this this JustAfterEach node, which will be applied to ctx.
2707376 to
75107fb
Compare
|
@coderabbitai full |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
e2e/e2e_test.go (1)
75-76: Minor: Use the already-dereferencedsrfor consistency.Line 75 accesses
report.SpecReports[i]directly whilesr(a pointer to the same element) is already available. Using*srwould be clearer.♻️ Suggested fix
- if diag, ok := specDiagnostics[specKey(report.SpecReports[i])]; ok { + if diag, ok := specDiagnostics[specKey(*sr)]; ok {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/e2e_test.go` around lines 75 - 76, The code uses report.SpecReports[i] even though you already have a pointer sr to that element; replace report.SpecReports[i] with the dereferenced pointer *sr when calling specKey and when referencing that element for consistency (e.g., change specKey(report.SpecReports[i]) to specKey(*sr) and any similar usages), keeping the rest of the logic that appends diag to sr.Failure.Message unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@e2e/e2e_test.go`:
- Around line 75-76: The code uses report.SpecReports[i] even though you already
have a pointer sr to that element; replace report.SpecReports[i] with the
dereferenced pointer *sr when calling specKey and when referencing that element
for consistency (e.g., change specKey(report.SpecReports[i]) to specKey(*sr) and
any similar usages), keeping the rest of the logic that appends diag to
sr.Failure.Message unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: cb8d0e2a-25c7-4d56-9432-08ca15977fff
📒 Files selected for processing (5)
Makefilee2e/e2e_common.goe2e/e2e_diagnostics_test.goe2e/e2e_test.gohack/test.sh
✅ Files skipped from review due to trivial changes (1)
- hack/test.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
- Exclude e2e from ginkgo parallelism - Collect diagnostics in JustAfterEach (before cleanup) for spec-body failures; fall back to ReportAfterEach for cleanup failures - Use composite specKey hash to prevent collisions for table-driven specs - Add 30s context timeout to diagnostic API calls - Add stack traces to panic recovery - Add unit tests for the diagnostics infrastructure using a fake client, runnable via make unit without a cluster.
75107fb to
0c4ab88
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
e2e/e2e_common.go (1)
126-141:⚠️ Potential issue | 🟠 MajorUse per-call timeouts instead of one shared diagnostics budget.
diagCtxat Line 127 is reused for everyGetandListin this dump. One slow fetch can consume most or all of the 5-minute budget and starve the remaining resource and event queries, which is exactly when these diagnostics are most useful. PassparentCtxthrough here and wrap each individual API call in its own shortcontext.WithTimeout(...)instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/e2e_common.go` around lines 126 - 141, The diagnostics function collectTrackedResourceDiagnostics currently creates a single diagCtx used for all API calls (dumpSingleResource, dumpAllAWSMachineTemplates, dumpNamespaceEvents) which can let one slow call exhaust the 5-minute budget; change the pattern to pass parentCtx into these dump helpers and have each API operation inside them create its own short timeout (e.g., context.WithTimeout(parentCtx, shortDuration)) so each Get/List has a per-call deadline instead of sharing diagCtx. Update dumpSingleResource, dumpAllAWSMachineTemplates, and dumpNamespaceEvents signatures/usages to accept parentCtx and implement per-call context.WithTimeout wrappers around individual client calls. Ensure cancels are deferred for each per-call context.
🧹 Nitpick comments (1)
e2e/e2e_diagnostics_test.go (1)
71-80: This subtest doesn't exercise the skip path it names.It only pre-populates
specDiagnosticsand checks the map. IfReportAfterEachstopped honoring theexistsguard and overwrote the first entry, this would still pass. Consider extracting the store-if-absent branch into a small helper and testing that directly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e/e2e_diagnostics_test.go` around lines 71 - 80, The test currently only pre-populates specDiagnostics and checks the map but never exercises the actual "skip" logic in ReportAfterEach; extract the store-if-absent branch into a small helper (e.g., AddSpecDiagnosticIfNotExists or StoreSpecDiagnosticIfAbsent) that accepts a key and diagnostics (from collectTrackedResourceDiagnostics) and returns whether it stored or skipped, move the map-write logic that currently lives in ReportAfterEach into that helper, then update the subtest to call the helper directly and assert it returns "skipped" when the key already exists and "stored" when it does not, while keeping ReportAfterEach calling the helper so behavior stays unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@e2e/e2e_common.go`:
- Around line 126-141: The diagnostics function
collectTrackedResourceDiagnostics currently creates a single diagCtx used for
all API calls (dumpSingleResource, dumpAllAWSMachineTemplates,
dumpNamespaceEvents) which can let one slow call exhaust the 5-minute budget;
change the pattern to pass parentCtx into these dump helpers and have each API
operation inside them create its own short timeout (e.g.,
context.WithTimeout(parentCtx, shortDuration)) so each Get/List has a per-call
deadline instead of sharing diagCtx. Update dumpSingleResource,
dumpAllAWSMachineTemplates, and dumpNamespaceEvents signatures/usages to accept
parentCtx and implement per-call context.WithTimeout wrappers around individual
client calls. Ensure cancels are deferred for each per-call context.
---
Nitpick comments:
In `@e2e/e2e_diagnostics_test.go`:
- Around line 71-80: The test currently only pre-populates specDiagnostics and
checks the map but never exercises the actual "skip" logic in ReportAfterEach;
extract the store-if-absent branch into a small helper (e.g.,
AddSpecDiagnosticIfNotExists or StoreSpecDiagnosticIfAbsent) that accepts a key
and diagnostics (from collectTrackedResourceDiagnostics) and returns whether it
stored or skipped, move the map-write logic that currently lives in
ReportAfterEach into that helper, then update the subtest to call the helper
directly and assert it returns "skipped" when the key already exists and
"stored" when it does not, while keeping ReportAfterEach calling the helper so
behavior stays unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: df5bef14-0dc3-45ed-adb8-e4027729ca51
📒 Files selected for processing (5)
Makefilee2e/e2e_common.goe2e/e2e_diagnostics_test.goe2e/e2e_test.gohack/test.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- hack/test.sh
|
/test e2e-aws-capi-techpreview |
|
@theobarberbany: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
failures; fall back to ReportAfterEach for cleanup failures
runnable via make unit without a cluster.