fix: mark exporter offline when unregistering#828
Conversation
instead of relying on the heartbeat Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
📝 WalkthroughWalkthroughThe unregister flow now updates exporter status fields and conditions to explicitly mark the exporter offline, stores a reason-based status message, sets the Registered condition to false, patches the status, and logs that the exporter was marked offline. ChangesExporter unregister status update
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@mangelajo @evakhoni let me know if it makes sense |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controller/internal/service/controller_service.go`:
- Around line 328-333: The reason from req.GetReason() is being assigned to
exporter.Status.StatusMessage without validation, and since this message is
later copied into the Online condition message field by
syncOnlineConditionWithStatus, it must be bounded to respect the CRD schema
limit of 32768 characters. Before assigning the reason to
exporter.Status.StatusMessage, check if the reason string exceeds 32768
characters and truncate it to that limit if necessary to prevent validation
failures during the Status().Patch() call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3f18095c-9c91-4b94-bc66-b4f7c65310d2
📒 Files selected for processing (1)
controller/internal/service/controller_service.go
| if reason := req.GetReason(); reason != "" { | ||
| exporter.Status.StatusMessage = reason | ||
| } else { | ||
| exporter.Status.StatusMessage = "Exporter unregistered" | ||
| } | ||
| syncOnlineConditionWithStatus(exporter) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Confirm condition message limits and whether UnregisterRequest.reason is already bounded.
set -euo pipefail
# Inspect generated YAML schemas for condition/status message constraints.
mapfile -t yamls < <(fd -i '\.ya?ml$' .)
if ((${`#yamls`[@]})); then
rg -n -C4 'maxLength:\s*32768|conditions:|statusMessage:' "${yamls[@]}"
fi
# Inspect protobuf/API definition for UnregisterRequest.reason validation or comments.
rg -n -C5 'message\s+UnregisterRequest|reason\s*=' .Repository: jumpstarter-dev/jumpstarter
Length of output: 217
🏁 Script executed:
# Fix: avoid mapfile, use fd directly with rg
# Search for UnregisterRequest protobuf definition
find . -type f \( -name "*.proto" -o -name "*.go" \) -print0 | xargs -0 grep -l "UnregisterRequest" | head -5Repository: jumpstarter-dev/jumpstarter
Length of output: 400
🏁 Script executed:
# Search for the Unregister method and syncOnlineConditionWithStatus function
rg -n "func.*Unregister|syncOnlineConditionWithStatus" --type go controller/Repository: jumpstarter-dev/jumpstarter
Length of output: 2767
🏁 Script executed:
# Examine the controller_service.go file around lines 328-333 and the syncOnlineConditionWithStatus function
cat -n controller/internal/service/controller_service.go | sed -n '320,340p'Repository: jumpstarter-dev/jumpstarter
Length of output: 1030
🏁 Script executed:
# Search for syncOnlineConditionWithStatus implementation
rg -n -A 15 "func.*syncOnlineConditionWithStatus" --type goRepository: jumpstarter-dev/jumpstarter
Length of output: 165
🏁 Script executed:
# Read the syncOnlineConditionWithStatus implementation
cat -n controller/internal/service/controller_service.go | sed -n '476,510p'Repository: jumpstarter-dev/jumpstarter
Length of output: 2140
🏁 Script executed:
# Read the protobuf definition for UnregisterRequest
cat -n protocol/proto/jumpstarter/v1/jumpstarter.proto | grep -A 10 "message UnregisterRequest"Repository: jumpstarter-dev/jumpstarter
Length of output: 543
🏁 Script executed:
# Search for CRD/schema definitions that might define condition message limits
find . -type f \( -name "*crd*.yaml" -o -name "*schema*.yaml" -o -name "*exporter*.yaml" \) | head -10Repository: jumpstarter-dev/jumpstarter
Length of output: 825
🏁 Script executed:
# Check if there are any validation or constraint comments in controller_service.go around the Unregister method
cat -n controller/internal/service/controller_service.go | sed -n '303,350p'Repository: jumpstarter-dev/jumpstarter
Length of output: 1937
🏁 Script executed:
# Check the CRD schema for the Exporter resource to see if condition messages have maxLength constraints
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_exporters.yaml | grep -A 50 "conditions:" | head -80Repository: jumpstarter-dev/jumpstarter
Length of output: 3099
🏁 Script executed:
# Search more specifically for message field constraints in the CRD
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_exporters.yaml | grep -B 5 -A 5 "message"Repository: jumpstarter-dev/jumpstarter
Length of output: 1852
🏁 Script executed:
# Check if there are any validation rules or constraints on StatusMessage field
rg -n "StatusMessage|maxLength" controller/internal/api/v1alpha1/exporter_types.goRepository: jumpstarter-dev/jumpstarter
Length of output: 256
🏁 Script executed:
# Look at the Exporter CRD type definition
cat -n controller/internal/api/v1alpha1/exporter_types.go | head -100Repository: jumpstarter-dev/jumpstarter
Length of output: 248
Bound the unregister reason before copying it into status conditions.
req.GetReason() is copied into StatusMessage, then syncOnlineConditionWithStatus copies it into the Online condition message. The condition message field has a hard limit of 32768 characters in the CRD schema. If the reason exceeds this limit, the Status().Patch() call will fail with validation error and the exporter will remain not marked offline/unregistered.
🛡️ Proposed guard
- if reason := req.GetReason(); reason != "" {
- exporter.Status.StatusMessage = reason
- } else {
- exporter.Status.StatusMessage = "Exporter unregistered"
- }
+ statusMessage := req.GetReason()
+ if statusMessage == "" {
+ statusMessage = "Exporter unregistered"
+ }
+ const maxConditionMessageLength = 32768
+ statusMessageRunes := []rune(statusMessage)
+ if len(statusMessageRunes) > maxConditionMessageLength {
+ statusMessage = string(statusMessageRunes[:maxConditionMessageLength])
+ }
+ exporter.Status.StatusMessage = statusMessage📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if reason := req.GetReason(); reason != "" { | |
| exporter.Status.StatusMessage = reason | |
| } else { | |
| exporter.Status.StatusMessage = "Exporter unregistered" | |
| } | |
| syncOnlineConditionWithStatus(exporter) | |
| statusMessage := req.GetReason() | |
| if statusMessage == "" { | |
| statusMessage = "Exporter unregistered" | |
| } | |
| const maxConditionMessageLength = 32768 | |
| statusMessageRunes := []rune(statusMessage) | |
| if len(statusMessageRunes) > maxConditionMessageLength { | |
| statusMessage = string(statusMessageRunes[:maxConditionMessageLength]) | |
| } | |
| exporter.Status.StatusMessage = statusMessage | |
| syncOnlineConditionWithStatus(exporter) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@controller/internal/service/controller_service.go` around lines 328 - 333,
The reason from req.GetReason() is being assigned to
exporter.Status.StatusMessage without validation, and since this message is
later copied into the Online condition message field by
syncOnlineConditionWithStatus, it must be bounded to respect the CRD schema
limit of 32768 characters. Before assigning the reason to
exporter.Status.StatusMessage, check if the reason string exceeds 32768
characters and truncate it to that limit if necessary to prevent validation
failures during the Status().Patch() call.
|
What is the advantage of this? I am trying to remember why we had Unregistered < Offline. Unregistered is an indication that the exporter may come back soon, so when you have very few exporters of one type and an exporter is ending a lease, while it is restarting the sub-child, still the server will lease it. The problem comes when the exporter goes offline longer than expected (image rotation), ... then the client fails to connect. We want to keep the "Unregistered" as "Online" for a while for this reason. You don't see it in our lab because we have plenty, but when you have 1-2... the user hits that problem and then the lease is instantly rejected because all exporters are "offline" sometimes. |
mangelajo
left a comment
There was a problem hiding this comment.
See the comment on the main thread, that's the reason for the difference.
We should just do client retries, and eventual failure.
unless i misunderstood you, we already have that retry behavior with: the lease is not rejected |
Hmmmm, you're probably right. Ok, let me do a couple of tests tomorrow, but this would be complementary to the client-retry logic, while the window would be smaller, it'd still be possible. Thanks for double checking @bennyz I was so sure... |
yeah we still need client retry because we still might have a race
yeah it does not replace client reconnection as there is still a chance for race |
bkhizgiy
left a comment
There was a problem hiding this comment.
I think it's also worth adding some test coverage for this as part of the change.
| meta.SetStatusCondition(&exporter.Status.Conditions, metav1.Condition{ | ||
| Type: string(jumpstarterdevv1alpha1.ExporterConditionTypeRegistered), | ||
| Status: metav1.ConditionFalse, | ||
| ObservedGeneration: exporter.Generation, | ||
| Reason: "Unregister", |
There was a problem hiding this comment.
Should we include the Message filed to make it consistent with how other statuses are being handled?
|
so this is complimentary to the client retry I'm working on 🤔 I'll take a look hopefully tomorrow to see how it would behave in conjunction with my changes |
instead of relying on the heartbeat