CBG-5444 Close terminator instead of stopProcess when polling detects StatusNotRunning#8353
CBG-5444 Close terminator instead of stopProcess when polling detects StatusNotRunning#8353adamcfraser wants to merge 2 commits into
Conversation
… StatusNotRunning Avoids having other nodes overwrite completed status with stopped if they are not the first to update the status to completed.
There was a problem hiding this comment.
Pull request overview
This PR adjusts multi-node background manager behavior so that when the polling loop detects the cluster status is no longer running, it terminates local execution without calling stopProcess, to avoid persisting a “stopping/stopped” state over a terminal cluster state (e.g., completed). It also adds explanatory notes to the long-running distributed resync dev-time test about cbgt feed/dest behavior when a second node comes online.
Changes:
- In multi-node status polling, close the terminator directly instead of calling
stopProcesswhen the cluster status indicates the process is not running. - Add detailed commentary to
TestDistributedResyncdescribing cbgt feed rebalance/destKey behavior and its effect on what the test is actually validating.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
db/background_mgr.go |
Changes the multi-node polling termination behavior when the cluster status doc indicates a terminal state. |
rest/adminapitest/resync_test.go |
Adds clarifying comments about cbgt behavior impacting distributed resync dev-time testing. |
| err := b.updateMultiNodeClusterAwareStatus(ctx, backgroundManagerStatusUpdate) | ||
| if err != nil { | ||
| if errors.Is(err, errBackgroundManagerStatusNotRunning) { | ||
| b.stopProcess(ctx) | ||
| b.terminator.Close() | ||
| return |
| if err != nil { | ||
| if errors.Is(err, errBackgroundManagerStatusNotRunning) { | ||
| b.stopProcess(ctx) | ||
| b.terminator.Close() | ||
| return |
There was a problem hiding this comment.
I think this comment is correct - I think what has to happen is that we should transition the local status to match whatever the cluster status was - and then call terminate()
There was a problem hiding this comment.
This code is safe because of this block: https://github.com/couchbase/sync_gateway/blob/main/db/background_mgr.go#L723-L733
In the case of the persistClusterStatusCallback, this is defined as b.UpdateStatusClusterAware which uses UpdateStatusClusterAware which uses backgroundManagerStatusUpdate.
In the case of the transition at the end of the process, b.UpdateStatusClusterAware is also called.
CBG-5444
Avoids having nodes overwrite completed status with stopped if they are not the first to persist the completed status state.