Batched Handler
This image shows the basic structure of the handler. It's a common design pattern: requests come in the front, pass through some kind of relay (realized or just logical), and are serviced by a pool of backend connections.

The handler frontend, the bit that is synchronous with the external connection, is a relatively simple component. For every request that comes in, it submits the request to the relay to be sent on one of the pooled connections and then blocks on the response channel embedded in that request.
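The submit-and-block pattern can be sketched as follows. The `request` type and field names here are illustrative, not the actual types from the codebase:

```go
package main

import "fmt"

// request pairs the payload with the response channel the frontend
// will block on. Names are assumptions for illustration.
type request struct {
	key  string
	resp chan string
}

func main() {
	relay := make(chan request)

	// Stand-in for a pooled backend connection: receive from the
	// relay and reply on the channel embedded in the request.
	go func() {
		for req := range relay {
			req.resp <- "value-for-" + req.key
		}
	}()

	// Frontend: submit the request, then block on its response channel.
	req := request{key: "foo", resp: make(chan string, 1)}
	relay <- req
	fmt.Println(<-req.resp) // value-for-foo
}
```

Because each request carries its own response channel, the backend never needs to know which frontend goroutine is waiting; it just sends on the channel it was handed.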

The relay selects a backend connection randomly for each incoming request. To avoid lock contention, each external connection has an independent random number generator seeded by true random data from the OS. This means each external connection can independently choose which backend connection it will submit each request to. A submission is just a channel send.
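A minimal sketch of a per-connection generator seeded from the OS entropy source, so no locked global source is shared between frontends (function names here are assumptions, not the project's API):

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newRNG returns a rand.Rand seeded from the OS entropy source.
// Each external connection gets its own, so backend selection
// never contends on a shared lock.
func newRNG() *rand.Rand {
	var seed int64
	binary.Read(crand.Reader, binary.LittleEndian, &seed)
	return rand.New(rand.NewSource(seed))
}

func main() {
	// Stand-ins for the pooled backend connections.
	backends := []chan string{
		make(chan string, 1),
		make(chan string, 1),
	}
	rng := newRNG() // one per external connection

	// A submission is just a channel send to the chosen backend.
	backends[rng.Intn(len(backends))] <- "get foo"
	fmt.Println("submitted")
}
```

The global `math/rand` source is safe for concurrent use but guarded by a mutex internally; giving each connection its own `rand.Rand` sidesteps that contention entirely.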
On the side, each relay has a monitor that constantly checks on the load that each backend connection is taking and adds a connection if it's needed. This happens on a regular schedule, every few seconds or so. It currently uses a high water mark algorithm, meaning it only expands the pool and does not contract it.
The monitor pulls the average batch size from each pooled connection to calculate the overall load factor and will add a connection if it is above the configured ratio. It will also scale up if some connections have a disproportionately high load during the interval, the ratio of which is also configurable. This is to avoid situations where one or two connections in a small pool are getting overloaded but the overall load is not above the load factor limit.
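The monitor's decision could be sketched roughly as below. The function name and thresholds are assumptions for illustration, not the project's actual configuration:

```go
package main

import "fmt"

// needsConnection sketches the monitor's check: compare the overall
// average batch size against a load limit, and also expand if any one
// connection's average is disproportionately above the pool's average.
// Parameter names and thresholds are illustrative assumptions.
func needsConnection(avgBatch []float64, loadLimit, imbalanceLimit float64) bool {
	var sum, max float64
	for _, a := range avgBatch {
		sum += a
		if a > max {
			max = a
		}
	}
	avg := sum / float64(len(avgBatch))
	// Expand when overall load is high, or when one connection is
	// much hotter than the pool average (the small-pool case).
	return avg > loadLimit || (avg > 0 && max/avg > imbalanceLimit)
}

func main() {
	fmt.Println(needsConnection([]float64{2, 2, 2}, 5, 3))           // false: balanced, low load
	fmt.Println(needsConnection([]float64{0.5, 0.5, 0.5, 10}, 5, 3)) // true: one hot connection
}
```

The second check is what catches the case described above: the overall load factor is under the limit, but one connection is carrying far more than its share.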
It should be noted here that there are no locks in the hot path to making a request except the ones around the channel sends and receives. Adding a connection, choosing one to send to, and reading statistics from the pooled connections are all done with lock-free methods.
Maxima used to be collected to calculate some heuristics, but since they were unreliable as a scaling signal, averages are used instead. Strategies that were tried but did not pan out include reacting to multiple maxed-out batches in a row, notifying the monitor to run asynchronously, and simply using the max batch sizes in the load factor calculation. In local testing, none of these represented actual conditions as accurately as the average did.

Each pooled connection is actually a connection to the external store (Memcached) plus two goroutines: one for batching up requests and writing them out, and the other for reading responses off the connection, matching them with their response channels, and sending those responses back. This also handles cases like multi-gets, where multiple responses may go to the same response channel.
In the picture below you can think of the construction of each pooled connection as three asynchronously running domains, very similar to how network hardware works. At the top is the domain that includes the constructs dedicated to each external connection (a bunch of goroutines connected to clients), on the left is a goroutine that batches requests up and writes them out, and on the right is a goroutine that reads responses off the socket and sends the responses back to the goroutines in the external connection domain. The reader knows what to expect because the batcher sends over a map of request ID -> response channel for each batch it sends.
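The handoff between the batcher and the reader can be sketched as below: the batcher sends a map of request ID to response channel alongside each batch, and the reader uses it to route responses as they come off the socket. All names here are illustrative, not the actual types from the codebase:

```go
package main

import "fmt"

// response stands in for a parsed reply coming off the Memcached
// connection, tagged with the ID of the request it answers.
type response struct {
	id      int
	payload string
}

func main() {
	// The batcher hands the reader a map of request ID -> response
	// channel for each batch it writes, so the reader knows what
	// to expect and where each reply should go.
	batchMeta := make(chan map[int]chan string, 1)
	wire := make(chan response, 2) // stands in for the socket

	pending := map[int]chan string{
		1: make(chan string, 1),
		2: make(chan string, 1),
	}
	batchMeta <- pending

	// "Reader" goroutine: match each response with its channel.
	go func() {
		batch := <-batchMeta
		for i := 0; i < len(batch); i++ {
			r := <-wire
			batch[r.id] <- r.payload
		}
	}()

	// Simulate replies arriving off the connection, out of order.
	wire <- response{id: 2, payload: "bar"}
	wire <- response{id: 1, payload: "foo"}

	fmt.Println(<-pending[1], <-pending[2]) // foo bar
}
```

Because routing is keyed by request ID rather than arrival order, the reader handles out-of-order replies, and multiple IDs can map to the same channel for multi-get responses.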
