rabbit_quorum_queue: Shrink batches of QQs in parallel #15081
+47
−34
Shrinking a member node off of a QQ can be parallelized. The operation involves

- `ra:remove_member/3`
- `rabbit_amqqueue:update/2`
- `ra:force_delete_server/2` if the node can be reached

All of these operations are I/O bound. Updating the cluster membership and the metadata store involves appending commands to those logs and replicating them. Writing commands to Ra synchronously in serial is fairly slow; sending many commands in parallel is much more efficient. By parallelizing these steps we can write larger chunks of commands to the WAL(s).
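The batching idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the module name `shrink_batches_sketch`, the helpers `chunk/2` and `pmap/2`, and the batch size are all hypothetical, and `ShrinkFun` stands in for whatever performs the three steps listed above for a single queue.

```erlang
-module(shrink_batches_sketch).
-export([chunk/2, pmap/2, shrink_in_batches/3]).

%% Hypothetical helper: split a list into sublists of at most N
%% elements, preserving order.
chunk(List, N) when is_integer(N), N > 0 ->
    chunk(List, N, []).

chunk([], _N, Acc) ->
    lists:reverse(Acc);
chunk(List, N, Acc) when length(List) =< N ->
    lists:reverse([List | Acc]);
chunk(List, N, Acc) ->
    {Batch, Rest} = lists:split(N, List),
    chunk(Rest, N, [Batch | Acc]).

%% Hypothetical helper: run Fun on every item in its own monitored
%% process and collect the results in input order. A crashing worker
%% yields {error, Reason} instead of taking down the caller.
pmap(Fun, Items) ->
    Workers = [spawn_monitor(fun() -> exit({ok, Fun(Item)}) end)
               || Item <- Items],
    [receive
         {'DOWN', MRef, process, Pid, {ok, Result}} -> Result;
         {'DOWN', MRef, process, Pid, Reason} -> {error, Reason}
     end || {Pid, MRef} <- Workers].

%% Process the queues one batch at a time. Within a batch, all calls
%% to ShrinkFun run concurrently; the next batch starts only after
%% the previous one has finished.
shrink_in_batches(ShrinkFun, Queues, BatchSize) ->
    lists:append([pmap(ShrinkFun, Batch)
                  || Batch <- chunk(Queues, BatchSize)]).
```

Bounding the batch size keeps the number of in-flight commands limited while still letting each batch issue its writes concurrently.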
`ra:force_delete_server/2` benefits from parallelizing if the node being shrunk off is no longer reachable, for example in some hardware failures. The underlying `rpc:call/4` will attempt to auto-connect to the node, and this can take some time to time out. When the calls run in parallel, each `rpc:call/4` reuses the same underlying distribution entry, and all calls fail together once the connection fails to establish.

Discussed in #15003