# Production Example

This is an example of a good starting point for a production RabbitMQ deployment.
It deploys a 3-node cluster with sufficient resources to handle 1 billion messages per day with an 8 kB payload and a replication factor of three.
The remaining workload details are outlined in the monthly cost savings calculator at https://rabbitmq.com/tanzu.

Please keep in mind that:

1. It may not be suitable for **your** production deployment.
   The official [RabbitMQ Production Checklist](https://www.rabbitmq.com/production-checklist.html) covers the deployment considerations you should review for your environment.

2. While it is important to deploy a RabbitMQ cluster correctly for production workloads, it is equally important for your applications to use RabbitMQ correctly.
   The [Production Checklist](https://www.rabbitmq.com/production-checklist.html) covers some common issues, such as connection churn and polling consumers.
   This example was tested with [Quorum Queues](https://www.rabbitmq.com/quorum-queues.html), which provide excellent data safety for workloads that require message replication.

Before you can deploy this RabbitMQ cluster, you will need a multi-zone Kubernetes cluster with at least 3 nodes, 12 CPUs, 30Gi RAM and 1.5Ti of available disk space, as well as a `storageClass` named `ssd`.
You can adjust these values to suit your environment.
This example includes [a GKE-specific example](ssd-gke.yaml) of the `ssd` storage class, which needs to be adjusted for other environments.
Read more about the expected disk performance [in the Google Cloud documentation](https://cloud.google.com/compute/docs/disks/performance#ssd_persistent_disk).
Note that disk write throughput is the limiting factor for persistent messages with an 8 kB payload.

To deploy this RabbitMQ cluster, run the following:

```shell
kubectl apply -f rabbitmq.yaml
kubectl apply -f pod-disruption-budget.yaml
```
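Before running the commands above, it can help to check the node, zone and storage class prerequisites against your current `kubectl` context. The following is a minimal sketch and not part of this example's manifests. The function names `count_distinct_zones` and `preflight` are hypothetical. It assumes your nodes carry the well-known `topology.kubernetes.io/zone` label. It also assumes that kubectl's JSONPath accepts backslash-escaped dots in the label key. Only `preflight` talks to the cluster; `count_distinct_zones` is a plain text filter.

```shell
# Hypothetical pre-flight helpers (not part of this example's manifests).
# Source this file, then run `preflight` against your current kubectl context.

# Reads one zone name per line on stdin and prints the number of
# distinct, non-empty zone names.
count_distinct_zones() {
  awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Compares the current cluster with the node, zone and storage class
# prerequisites. Assumes nodes carry the topology.kubernetes.io/zone label.
# CPU, memory and disk capacity are NOT checked here.
preflight() {
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  zones=$(kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}' \
    | count_distinct_zones)
  echo "nodes: ${nodes}, zones: ${zones}"

  [ "${nodes}" -ge 3 ] || echo "WARNING: fewer than 3 nodes" >&2
  [ "${zones}" -ge 2 ] || echo "WARNING: nodes do not span multiple zones" >&2

  # This example expects a storage class named "ssd".
  kubectl get storageclass ssd >/dev/null 2>&1 \
    || echo "WARNING: storageClass 'ssd' not found" >&2
}
```

To check CPU, memory and disk capacity as well, `kubectl describe nodes` lists each node's allocatable resources.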
## Q & A

### Is 4 CPUs per RabbitMQ node the minimum?

No. The absolute minimum is 2 CPUs.

For the workload this example was sized for (1 billion messages per day with an 8 kB payload and a replication factor of three), 4 CPUs is the minimum.
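To get a sense of scale, the workload can be converted into per-second rates with some back-of-the-envelope arithmetic. This is an estimate, not a measurement. It assumes traffic is spread evenly over 24 hours (real traffic has peaks) and treats 8 kB as 8,000 bytes. It also counts payload bytes only, ignoring protocol and storage overhead.

```math
\begin{aligned}
\text{message rate} &= \frac{10^{9}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 11\,574\ \text{messages/s} \\
\text{ingress payload} &\approx 11\,574\ \text{messages/s} \times 8\,000\ \text{B} \approx 92.6\ \text{MB/s} \\
\text{replicated writes} &\approx 3 \times 92.6\ \text{MB/s} \approx 278\ \text{MB/s (cluster-wide)} \\
\text{writes per node} &\approx 278\ \text{MB/s} \div 3\ \text{nodes} \approx 92.6\ \text{MB/s}
\end{aligned}
```

With three replicas on a 3-node cluster, each node writes a copy of every message. Every node therefore has to sustain roughly 93 MB/s of payload writes before any overhead. This is consistent with disk write throughput being the limiting factor for this workload.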
### Will RabbitMQ work with 1 CPU?

Yes. It will work, but poorly, which is why we cannot recommend it for production workloads.
A RabbitMQ node with fewer than 2 full CPUs cannot be considered production-ready.

### Can I assign less than 1 CPU to RabbitMQ?

Yes, Kubernetes allows fractional CPU allocations, such as `500m`.
Be prepared for unexplained periods of unresponsiveness.
The kernel will work against the optimisations in RabbitMQ's runtime, and anything can happen.
A RabbitMQ node with fewer than 2 full CPUs cannot be considered production-ready.
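To catch sub-2-CPU allocations before they reach production (for example, in a CI step), a small shell check can compare a Kubernetes CPU quantity against the 2-CPU minimum. This is a sketch under stated assumptions. The function names are hypothetical, and the check handles only plain (`2`, `1.5`) and millicore (`500m`) quantities. In the commented usage, `CLUSTER_NAME` is a placeholder for your `RabbitmqCluster`'s name, and the `spec.resources.limits.cpu` path assumes CPU limits are set in the cluster spec.

```shell
# Hypothetical helpers; names are illustrative, not part of any RabbitMQ tool.

# Converts a Kubernetes CPU quantity to millicores.
# Handles only plain values ("2", "1.5") and millicore values ("500m").
cpu_to_millicores() {
  awk -v q="$1" 'BEGIN {
    if (q ~ /m$/) {
      sub(/m$/, "", q)                 # "500m" -> "500"
      printf "%d\n", q
    } else {
      printf "%d\n", q * 1000 + 0.5    # "1.5" -> 1500 (rounded)
    }
  }'
}

# Succeeds (exit status 0) when the quantity is at least 2 full CPUs,
# the minimum this document considers production.
meets_cpu_minimum() {
  [ "$(cpu_to_millicores "$1")" -ge 2000 ]
}

# Example (CLUSTER_NAME is a placeholder for your RabbitmqCluster name):
#   limit=$(kubectl get rabbitmqcluster "$CLUSTER_NAME" -o jsonpath='{.spec.resources.limits.cpu}')
#   meets_cpu_minimum "$limit" || echo "CPU limit '$limit' is below 2 full CPUs"
```

An unset limit converts to 0 millicores and fails the check, which is probably the safer default for a guard like this.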
### Does CPU clock speed matter for message throughput?

Yes. Each queue is single-threaded, so a CPU with a higher clock speed runs more cycles per second, and the queue process can perform more operations per second.
This does not hold when disk or network is the limiting factor, but in benchmarks with sufficient network and disk capacity, faster CPUs translate to higher message throughput.

### Are vCPUs (virtual CPUs) OK?

Yes. The workload behind this production starting point ran on Google Cloud on 2 physical CPU cores with 2 hyper-threads each, which is 4 vCPUs.
While we would recommend physical CPUs without hyper-threading, we also operate in the cloud and use vCPUs by default, including for our benchmarks.
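To see how vCPUs map to physical cores on a Linux host or VM, `lscpu` reports both the logical CPU count (`CPU(s):`) and `Thread(s) per core:`. The hypothetical helper below divides one by the other. For the configuration described above (4 vCPUs with 2 hyper-threads per core), it yields 2 physical cores. It assumes `lscpu`'s usual English `Field: value` output. Inside a container, `lscpu` typically reports the host's CPUs rather than the pod's CPU limit.

```shell
# Hypothetical helper: reads `lscpu` output on stdin and prints the number of
# physical cores, computed as logical CPUs divided by threads per core.
physical_cores() {
  awk -F: '
    {
      key = $1
      sub(/^[ \t]+/, "", key)          # newer lscpu versions indent some fields
    }
    key == "CPU(s)"             { cpus = $2 + 0 }
    key == "Thread(s) per core" { tpc  = $2 + 0 }
    END {
      if (tpc > 0) print int(cpus / tpc)
      else print "unknown"
    }'
}

# Example usage on a Linux host or VM:
#   lscpu | physical_cores
```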