Skip to content

fix: mark exporter offline when unregistering#828

Open
bennyz wants to merge 1 commit into
jumpstarter-dev:mainfrom
bennyz:worktree-fix-unregister-race
Open

fix: mark exporter offline when unregistering#828
bennyz wants to merge 1 commit into
jumpstarter-dev:mainfrom
bennyz:worktree-fix-unregister-race

Conversation

@bennyz

@bennyz bennyz commented Jun 23, 2026

Copy link
Copy Markdown
Member

instead of relying on the heartbeat

instead of relying on the heartbeat

Signed-off-by: Benny Zlotnik <bzlotnik@redhat.com>
@bennyz bennyz requested review from evakhoni and mangelajo June 23, 2026 14:34
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

The unregister flow now updates exporter status fields and conditions to explicitly mark the exporter offline, stores a reason-based status message, sets the Registered condition to false, patches the status, and logs that the exporter was marked offline.

Changes

Exporter unregister status update

Layer / File(s) Summary
Offline status marking
controller/internal/service/controller_service.go
Unregister now sets exporter status to Offline, fills StatusMessage from the request reason or a default, synchronizes the deprecated Online condition, sets Registered to false with reason Unregister, then patches status and updates the log message.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I tugged the cord on a quiet machine,
and marked it offline, tidy and clean.
A reason now rides in the status note,
while conditions agree in a well-mended coat.
Hop, patch, and log — away I bound.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: marking exporters offline during unregistration instead of relying on heartbeat.
Description check ✅ Passed The description is related to the changeset, explaining the rationale for the change (marking offline immediately instead of waiting for heartbeat).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@bennyz

bennyz commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

@mangelajo @evakhoni let me know if it makes sense

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controller/internal/service/controller_service.go`:
- Around line 328-333: The reason from req.GetReason() is being assigned to
exporter.Status.StatusMessage without validation, and since this message is
later copied into the Online condition message field by
syncOnlineConditionWithStatus, it must be bounded to respect the CRD schema
limit of 32768 characters. Before assigning the reason to
exporter.Status.StatusMessage, check if the reason string exceeds 32768
characters and truncate it to that limit if necessary to prevent validation
failures during the Status().Patch() call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3f18095c-9c91-4b94-bc66-b4f7c65310d2

📥 Commits

Reviewing files that changed from the base of the PR and between c7c0ec9 and 83575d6.

📒 Files selected for processing (1)
  • controller/internal/service/controller_service.go

Comment on lines +328 to +333
if reason := req.GetReason(); reason != "" {
exporter.Status.StatusMessage = reason
} else {
exporter.Status.StatusMessage = "Exporter unregistered"
}
syncOnlineConditionWithStatus(exporter)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Confirm condition message limits and whether UnregisterRequest.reason is already bounded.

set -euo pipefail

# Inspect generated YAML schemas for condition/status message constraints.
mapfile -t yamls < <(fd -i '\.ya?ml$' .)
if ((${`#yamls`[@]})); then
  rg -n -C4 'maxLength:\s*32768|conditions:|statusMessage:' "${yamls[@]}"
fi

# Inspect protobuf/API definition for UnregisterRequest.reason validation or comments.
rg -n -C5 'message\s+UnregisterRequest|reason\s*=' .

Repository: jumpstarter-dev/jumpstarter

Length of output: 217


🏁 Script executed:

# Fix: avoid mapfile, use fd directly with rg
# Search for UnregisterRequest protobuf definition
find . -type f \( -name "*.proto" -o -name "*.go" \) -print0 | xargs -0 grep -l "UnregisterRequest" | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 400


🏁 Script executed:

# Search for the Unregister method and syncOnlineConditionWithStatus function
rg -n "func.*Unregister|syncOnlineConditionWithStatus" --type go controller/

Repository: jumpstarter-dev/jumpstarter

Length of output: 2767


🏁 Script executed:

# Examine the controller_service.go file around lines 328-333 and the syncOnlineConditionWithStatus function
cat -n controller/internal/service/controller_service.go | sed -n '320,340p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1030


🏁 Script executed:

# Search for syncOnlineConditionWithStatus implementation
rg -n -A 15 "func.*syncOnlineConditionWithStatus" --type go

Repository: jumpstarter-dev/jumpstarter

Length of output: 165


🏁 Script executed:

# Read the syncOnlineConditionWithStatus implementation
cat -n controller/internal/service/controller_service.go | sed -n '476,510p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 2140


🏁 Script executed:

# Read the protobuf definition for UnregisterRequest
cat -n protocol/proto/jumpstarter/v1/jumpstarter.proto | grep -A 10 "message UnregisterRequest"

Repository: jumpstarter-dev/jumpstarter

Length of output: 543


🏁 Script executed:

# Search for CRD/schema definitions that might define condition message limits
find . -type f \( -name "*crd*.yaml" -o -name "*schema*.yaml" -o -name "*exporter*.yaml" \) | head -10

Repository: jumpstarter-dev/jumpstarter

Length of output: 825


🏁 Script executed:

# Check if there are any validation or constraint comments in controller_service.go around the Unregister method
cat -n controller/internal/service/controller_service.go | sed -n '303,350p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1937


🏁 Script executed:

# Check the CRD schema for the Exporter resource to see if condition messages have maxLength constraints
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_exporters.yaml | grep -A 50 "conditions:" | head -80

Repository: jumpstarter-dev/jumpstarter

Length of output: 3099


🏁 Script executed:

# Search more specifically for message field constraints in the CRD
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_exporters.yaml | grep -B 5 -A 5 "message"

Repository: jumpstarter-dev/jumpstarter

Length of output: 1852


🏁 Script executed:

# Check if there are any validation rules or constraints on StatusMessage field
rg -n "StatusMessage|maxLength" controller/internal/api/v1alpha1/exporter_types.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 256


🏁 Script executed:

# Look at the Exporter CRD type definition
cat -n controller/internal/api/v1alpha1/exporter_types.go | head -100

Repository: jumpstarter-dev/jumpstarter

Length of output: 248


Bound the unregister reason before copying it into status conditions.

req.GetReason() is copied into StatusMessage, then syncOnlineConditionWithStatus copies it into the Online condition message. The condition message field has a hard limit of 32768 characters in the CRD schema. If the reason exceeds this limit, the Status().Patch() call will fail with validation error and the exporter will remain not marked offline/unregistered.

🛡️ Proposed guard
-	if reason := req.GetReason(); reason != "" {
-		exporter.Status.StatusMessage = reason
-	} else {
-		exporter.Status.StatusMessage = "Exporter unregistered"
-	}
+	statusMessage := req.GetReason()
+	if statusMessage == "" {
+		statusMessage = "Exporter unregistered"
+	}
+	const maxConditionMessageLength = 32768
+	statusMessageRunes := []rune(statusMessage)
+	if len(statusMessageRunes) > maxConditionMessageLength {
+		statusMessage = string(statusMessageRunes[:maxConditionMessageLength])
+	}
+	exporter.Status.StatusMessage = statusMessage
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if reason := req.GetReason(); reason != "" {
exporter.Status.StatusMessage = reason
} else {
exporter.Status.StatusMessage = "Exporter unregistered"
}
syncOnlineConditionWithStatus(exporter)
statusMessage := req.GetReason()
if statusMessage == "" {
statusMessage = "Exporter unregistered"
}
const maxConditionMessageLength = 32768
statusMessageRunes := []rune(statusMessage)
if len(statusMessageRunes) > maxConditionMessageLength {
statusMessage = string(statusMessageRunes[:maxConditionMessageLength])
}
exporter.Status.StatusMessage = statusMessage
syncOnlineConditionWithStatus(exporter)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controller/internal/service/controller_service.go` around lines 328 - 333,
The reason from req.GetReason() is being assigned to
exporter.Status.StatusMessage without validation, and since this message is
later copied into the Online condition message field by
syncOnlineConditionWithStatus, it must be bounded to respect the CRD schema
limit of 32768 characters. Before assigning the reason to
exporter.Status.StatusMessage, check if the reason string exceeds 32768
characters and truncate it to that limit if necessary to prevent validation
failures during the Status().Patch() call.

@mangelajo

Copy link
Copy Markdown
Member

What is the advantage of this? I am trying to remember why we had Unregistered < Offline.

Unregistered is an indication that the exporter may come back soon, so when you have very few exporters of one type and an exporter is ending a lease, while it is restarting the sub-child, still the server will lease it.

The problem comes when the exporter goes offline longer than expected (image rotation), ... then the client fails to connect.

We want to keep the "Unregistered" as "Online" for a while for this reason.

You don't see it in our lab because we have plenty, but when you have 1-2... the user hits that problem and then the lease is instantly rejected because all exporters are "offline" sometimes.

@mangelajo mangelajo left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment on the main thread, that's the reason for the difference.

We should just do client retries, and eventual failure.

@bennyz

bennyz commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

What is the advantage of this? I am trying to remember why we had Unregistered < Offline.

Unregistered is an indication that the exporter may come back soon, so when you have very few exporters of one type and an exporter is ending a lease, while it is restarting the sub-child, still the server will lease it.

The problem comes when the exporter goes offline longer than expected (image rotation), ... then the client fails to connect.

We want to keep the "Unregistered" as "Online" for a while for this reason.

You don't see it in our lab because we have plenty, but when you have 1-2... the user hits that problem and then the lease is instantly rejected because all exporters are "offline" sometimes.

unless i misunderstood you, we already have that retry behavior with:

⠼ Waiting for lease: While there are 1 available exporters (i.e. ), none of them are online (0:00:01)

the lease is not rejected

@mangelajo

Copy link
Copy Markdown
Member

What is the advantage of this? I am trying to remember why we had Unregistered < Offline.
Unregistered is an indication that the exporter may come back soon, so when you have very few exporters of one type and an exporter is ending a lease, while it is restarting the sub-child, still the server will lease it.
The problem comes when the exporter goes offline longer than expected (image rotation), ... then the client fails to connect.
We want to keep the "Unregistered" as "Online" for a while for this reason.
You don't see it in our lab because we have plenty, but when you have 1-2... the user hits that problem and then the lease is instantly rejected because all exporters are "offline" sometimes.

unless i misunderstood you, we already have that retry behavior with:

⠼ Waiting for lease: While there are 1 available exporters (i.e. ), none of them are online (0:00:01)

the lease is not rejected

Hmmmm, you're probably right. Ok, let me do a couple of tests tomorrow, but this would be complementary to the client-retry logic, while the window would be smaller, it'd still be possible.

Thanks for double checking @bennyz I was so sure...

@mangelajo mangelajo self-requested a review June 23, 2026 16:48
@bennyz

bennyz commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

Hmmmm, you're probably right. Ok, let me do a couple of tests tomorrow, but this would be complementary to the client-retry logic, while the window would be smaller, it'd still be possible.

yeah we still need client retry because we still might have a race

What is the advantage of this? I am trying to remember why we had Unregistered < Offline.
Unregistered is an indication that the exporter may come back soon, so when you have very few exporters of one type and an exporter is ending a lease, while it is restarting the sub-child, still the server will lease it.
The problem comes when the exporter goes offline longer than expected (image rotation), ... then the client fails to connect.
We want to keep the "Unregistered" as "Online" for a while for this reason.
You don't see it in our lab because we have plenty, but when you have 1-2... the user hits that problem and then the lease is instantly rejected because all exporters are "offline" sometimes.

unless i misunderstood you, we already have that retry behavior with:

⠼ Waiting for lease: While there are 1 available exporters (i.e. ), none of them are online (0:00:01)

the lease is not rejected

Hmmmm, you're probably right. Ok, let me do a couple of tests tomorrow, but this would be complementary to the client-retry logic, while the window would be smaller, it'd still be possible.

Thanks for double checking @bennyz I was so sure...

yeah it does not replace client reconnection as there is still a chance for race

@bkhizgiy bkhizgiy left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's also worth adding some test coverage for this as part of the change.

Comment on lines +334 to +338
meta.SetStatusCondition(&exporter.Status.Conditions, metav1.Condition{
Type: string(jumpstarterdevv1alpha1.ExporterConditionTypeRegistered),
Status: metav1.ConditionFalse,
ObservedGeneration: exporter.Generation,
Reason: "Unregister",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include the Message filed to make it consistent with how other statuses are being handled?

@mangelajo mangelajo added this to the 0.9.0 milestone Jun 24, 2026
@evakhoni

Copy link
Copy Markdown
Member

so this is complimentary to the client retry I'm working on 🤔 I'll take a look hopefully tomorrow to see how it would behave in conjunction with my changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants