feat!: make queue and reconciliation leader-loss-aware #1130
Draft
kimpenhaus wants to merge 32 commits into
- stops reconciliation on leadership loss

closes #784
- Introduced `ValidateRegistrations` setting in `OperatorSettings` to enable validation of DI registrations on host startup.
- Added `OperatorRegistrationValidator` to ensure required components are registered for each managed entity, preventing silent misconfigurations.
- Implemented `OperatorRegistrationRegistry` to track managed entities and their associated services.
- Updated documentation with usage details and examples for registration validation.
- Added comprehensive unit tests to cover all validation scenarios.
…plication cache consistency
- Added checks to preserve deduplication cache state if enqueue fails due to leadership loss.
- Introduced tests for drop scenarios: updates, deletions, and retry behavior.
- Updated `EntityQueueBackgroundService` to use the correct cancellation token for error retries.
- Improved logging to trace dropped enqueues.
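
The cache-preservation check above can be illustrated with a minimal sketch. It is not the code from this PR: `DedupingEnqueuer`, `OnEvent`, and the `tryEnqueue` delegate are hypothetical names, and the real deduplication cache may key on something other than a UID and generation. The idea shown is only that the cache is updated after a successful enqueue, so a dropped enqueue does not make a later event look like a duplicate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a watcher-side deduplication cache (not the PR's types).
public sealed class DedupingEnqueuer
{
    private readonly Dictionary<string, long> _lastSeenGeneration = new();
    private readonly Func<string, bool> _tryEnqueue;

    // tryEnqueue returns false when the queue drops the item, e.g. after leadership loss.
    public DedupingEnqueuer(Func<string, bool> tryEnqueue) => _tryEnqueue = tryEnqueue;

    // Returns true only when the entity was actually handed to the queue.
    public bool OnEvent(string uid, long generation)
    {
        if (_lastSeenGeneration.TryGetValue(uid, out var seen) && seen >= generation)
        {
            return false; // duplicate: this generation was already enqueued
        }

        if (!_tryEnqueue(uid))
        {
            // Dropped enqueue: leave the cache untouched so the same
            // generation is accepted once enqueuing works again.
            return false;
        }

        _lastSeenGeneration[uid] = generation;
        return true;
    }
}
```

With this ordering, an event dropped while the instance is not the leader is enqueued normally when it is observed again after leadership is regained.
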
…isposal
- Made `StartAsync` idempotent to avoid duplicate processing loops under concurrent leadership signals.
- Added lifecycle lock to synchronize start/stop state transitions.
- Fixed `Dispose` and `DisposeAsync` to unsubscribe from leadership elector callbacks.
- Updated `DisposeAsync` to follow the asynchronous disposal pattern and release shared resources.
- Introduced additional tests to validate idempotency and proper disposal behavior.
…ership flaps
- Updated `EntityQueueBackgroundService` to assign a fresh `CancellationTokenSource` for each processing loop, ensuring proper disposal only after the loop ends.
- Refactored `_cts` handling to avoid disposing a token source still observed by a previously running loop.
- Enhanced DI validation to correctly handle open-generic service registrations with generic constraints.
- Added unit tests for leadership flap scenarios and DI validation improvements.
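
A minimal sketch of the per-loop token source idea follows. It assumes nothing about the actual `EntityQueueBackgroundService` internals: `RestartableLoop`, `Start`, `Stop`, and `StartedLoops` are hypothetical names. Each loop owns its own `CancellationTokenSource` and disposes it only in its own `finally`, so a stop/start flap never disposes a source that an older loop is still observing. The sketch tracks only the most recent loop, which is simpler than managing several active loops.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; names are not taken from the PR.
public sealed class RestartableLoop
{
    private readonly object _gate = new();
    private readonly Func<CancellationToken, Task> _body;
    private CancellationTokenSource? _cts;
    private Task _loop = Task.CompletedTask;

    public RestartableLoop(Func<CancellationToken, Task> body) => _body = body;

    public int StartedLoops { get; private set; }

    public void Start()
    {
        lock (_gate)
        {
            if (_cts is not null) return;          // idempotent: a loop is already running
            var cts = new CancellationTokenSource(); // fresh source for this loop only
            _cts = cts;
            StartedLoops++;
            _loop = RunAsync(cts);
        }
    }

    // Requests cancellation and returns the task of the loop being stopped.
    public Task Stop()
    {
        CancellationTokenSource? cts;
        Task loop;
        lock (_gate)
        {
            cts = _cts;
            loop = _loop;
            _cts = null;
        }

        try { cts?.Cancel(); }
        catch (ObjectDisposedException) { /* loop already ended and disposed its source */ }
        return loop;
    }

    private async Task RunAsync(CancellationTokenSource cts)
    {
        try { await Task.Run(() => _body(cts.Token)).ConfigureAwait(false); }
        catch (OperationCanceledException) { /* expected on stop */ }
        finally
        {
            lock (_gate)
            {
                if (ReferenceEquals(_cts, cts)) _cts = null; // allow restart after a natural exit
            }
            cts.Dispose(); // disposed only after this loop has finished with it
        }
    }
}
```

Calling `Stop()` followed immediately by `Start()` leaves the old loop draining on its own token while the new loop runs on a different one.
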
…on cancellation on leadership loss
… prevent token disposal during in-flight reconciliations
- Made `ReconcileAsync` fully asynchronous in multiple integration tests to align with updated queue behavior.
- Refactored `EntityQueueBackgroundService` to manage multiple active processing loops, ensuring proper disposal and cancellation.
- Introduced safeguards against `ObjectDisposedException` when a token is accessed during in-flight reconciliations.
- Added timeout to drain in-flight reconciliations during disposal to prevent indefinite blocking.
… lifecycle management
- Replaced duplicated lifecycle handling logic in `EntityQueueBackgroundService` and `ResourceWatcher` with the new `RestartableHostedService` base class.
- Simplified start/stop mechanics by centralizing idempotent loop execution and cancellation handling in `RestartableHostedService`.
- Updated disposal methods to align with the asynchronous disposal pattern, ensuring proper resource cleanup.
- Adjusted integration tests to accommodate changes in background service behavior.
… `LeaderElectionType`
- Updated documentation to explain how `LeaderElectionType` affects the queue-consumer service configuration, including behavior for `None`, `Single`, and `Custom` types.
- Clarified scheduling state management and leadership-loss protection mechanisms.
…ElectionSubscription`
- Introduced `LeaderElectionSubscription` to manage leadership callbacks consistently across services.
- Simplified elector subscription/unsubscription logic in `LeaderAwareResourceWatcher` and `EntityQueueBackgroundService`.
- Updated `RestartableHostedService` to support non-blocking stop behavior (`RequestStopAsync`).
- Enhanced async disposal flow to ensure handlers are unsubscribed, preventing lingering references.
- Added tests for leadership transitions, idle draining, and disposal correctness.
…handling
- Fixed `EntityQueueBackgroundService` to reset `_running` state for proper restart after unexpected loop exits.
- Added handling for unexpected loop faults with explicit logging via `OnLoopFaulted`.
- Enhanced `LeaderAwareEntityQueueBackgroundService` to record reconciliation metrics for leader-elected consumers using `OperatorMetrics`.
- Introduced comprehensive tests for loop restarting and metrics recording.
… handling
- Updated `EntityCache` logic to scope removal by entity type, preserving unrelated entries during leadership loss.
- Improved `RestartableHostedService` to restart loops with exponential backoff on faults, preventing silent service failures.
- Refactored `LeaderElectionBackgroundService` to cancel backoff promptly on shutdown, ensuring graceful disposal.
- Added new tests for loop restart behavior, entity-specific cache clearing, and backoff timing during shutdown.
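
The restart-with-backoff behavior can be sketched as a small delay calculator. This is an assumption-laden illustration rather than the `RestartableHostedService` implementation: `RestartBackoff`, `NextDelay`, and the three constructor parameters are hypothetical, and the actual base delay, cap, and reset criterion used in the PR are not stated here. The delay doubles with each consecutive fault up to a cap, and the fault counter resets once a run has lasted long enough to count as healthy.

```csharp
using System;

// Hypothetical helper; parameter names and reset rule are illustrative assumptions.
public sealed class RestartBackoff
{
    private readonly TimeSpan _initial;
    private readonly TimeSpan _max;
    private readonly TimeSpan _healthyAfter;
    private int _consecutiveFaults;

    public RestartBackoff(TimeSpan initial, TimeSpan max, TimeSpan healthyAfter)
    {
        _initial = initial;
        _max = max;
        _healthyAfter = healthyAfter;
    }

    // Call when a loop run ends with a fault; returns the delay before the next restart.
    public TimeSpan NextDelay(TimeSpan lastRunDuration)
    {
        // A sufficiently long run counts as healthy and resets the escalation.
        if (lastRunDuration >= _healthyAfter) _consecutiveFaults = 0;

        var exponent = Math.Min(_consecutiveFaults, 30); // bound the exponent
        _consecutiveFaults++;

        var milliseconds = Math.Min(
            _initial.TotalMilliseconds * Math.Pow(2, exponent),
            _max.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

A caller would typically wait for the returned delay with a cancellable `Task.Delay` so that shutdown can interrupt the backoff promptly.
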
…lt tolerance
- Introduced startup validation to ensure FusionCache tagging remains enabled for resource watcher caches, preventing runtime failures due to misconfiguration.
- Enhanced cache cleanup logic during leadership loss to handle exceptions gracefully, ensuring safety-critical stops proceed unaffected.
- Improved error handling in leader-sensitive services to prevent propagation of faults into elector callbacks.
- Added comprehensive unit tests for tagging validation, cache cleanup behavior, and leadership fault tolerance.
…adership flap handling
- Added unit tests for `RestartableHostedService` to verify backoff escalation in crash loops and reset behavior after healthy runs.
- Introduced a test for leadership flap handling, ensuring concurrent loops drain gracefully without faults.
- Applied `[Trait("Area", "LeaderLoss")]` to categorize relevant tests.
…or` and fix warning format in docs
- Cleaned up comments in `OperatorRegistrationValidator`, removing redundant explanations and correcting phrasing.
- Fixed markdown formatting for warnings in caching documentation to ensure proper rendering.
…gistration logic
…g to prevent ungraceful backoff shutdowns
…ant comments in `RestartableHostedService` and `ResourceWatcher`
…OrUpdate` method
- Replaced positional arguments with named arguments to improve readability and maintainability.
- Adjusted log messages for consistency and conciseness.