rpc: add DRPC stream multiplexing behind an env gate#170248
rpc: add DRPC stream multiplexing behind an env gate#170248shubhamdhama wants to merge 4 commits into
Conversation
Adds a small helper that bumps the DRPC replace directive in go.mod to a specific commit. It takes a GitHub commit URL (cockroachdb or shubhamdhama fork), resolves the pseudo-version with `go list`, rewrites the replace line, runs `go mod tidy`, and regenerates the Bazel files via `./dev generate bazel --mirror`. Bumping DRPC was a multi-step ritual; the script removes the chance of forgetting one of the steps. Release note: None Epic: none
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
🚨 Remove
|
Bumps DRPC to pick up stream multiplexing from cockroachdb/drpc#58, where a single transport connection can carry multiple concurrent streams. The multiplexing dial path is guarded by COCKROACH_EXPERIMENTAL_DRPC_MUX_ENABLED, defaulted to false. With the gate off, DialDRPC keeps using drpcpool, which checks out a connection per active stream and dials a new one when none is idle. The upgraded multiplexing-capable library is therefore exercised on every connection, but the multiplexing capability itself stays unused: every stream still ends up on its own underlying connection, matching the prior behavior. This keeps the initial scope small. The dial-mux path is in place so it can be turned on for validation, with a follow-up to flip the default once we are confident. Both paths now share the same dial-option setup, so the mux path also picks up the client metrics and the request-recording gate that the pool path was already attaching. Release note: None Epic: none
b4179ad to
cc1edf7
Compare
Nukitt
left a comment
There was a problem hiding this comment.
LGTM! approving since you're already tracking the do-not-merge items.
| log.Dev.Infof(ctx, "dialing DRPC mux connection to %s", target) | ||
| return dialDRPCMux(ctx, target, drpcDialOptions) | ||
| } | ||
| log.Dev.Infof(ctx, "dialing DRPC non-mux connection to %s", target) |
There was a problem hiding this comment.
Won't this and the one above this make the logs a lot noisier since we'll be logging every dial. Maybe lets have some verbosity flag if thats possible.
There was a problem hiding this comment.
Oh yes. I'll remove these. They were better for debugging.
|
Multiplexing off |
|
Multiplexing disabled Multiplexing enabled |
|
Here is my analysis of understanding the (expected) regression (let me know if something isn't clear here), benchmark results
TLDR Turning on DRPC stream multiplexing costs us about 2.5% on the parallel On send, the old path flushed inline: On receive, the old path assembled packets in the reader and handed them up Most of the transport CPU shifted rather than being added: inline-flush work Analysis Here is the normalized CPU diff, so we get a fair comparison between old and OLD=benchdiff/15db668/artifacts/profiles/merged
NEW=benchdiff/3918e6f/artifacts/profiles/merged
go tool pprof -top -flat -nodecount=40 -normalize -diff_base=$OLD/cpu.prof $NEW/cpu.profThe functions that grow sit on the new send and receive paths: The functions that shrink are the old single-stream paths the new code replaced, Lined up against each other, most of the transport CPU is the same work For memory, we look at the allocation diff on the parallel variant: go tool pprof -top -flat -nodecount=25 -normalize -sample_index=alloc_space \
-diff_base=$OLD/mem_ParallelSysbench_SQL_3node_oltp_read_write.pb.gz \
$NEW/mem_ParallelSysbench_SQL_3node_oltp_read_write.pb.gzThe result is even more concentrated. A single function dominates: Everything else in the diff is negative or noise. The ring buffer allocates on The reason is the ring's shape. Each stream owns its own 256-slot ring, and the The clearest single picture is the differential flame graph, where red marks go tool pprof -http=:8081 -normalize -diff_base=$OLD/cpu.prof $NEW/cpu.profIn flame graph view the |
Bumps DRPC to pick up stream multiplexing from cockroachdb/drpc#58,
where a single transport connection can carry multiple concurrent
streams.
The multiplexing dial path is guarded by
COCKROACH_EXPERIMENTAL_DRPC_MUX_ENABLED, defaulted to false. With the
gate off, DialDRPC keeps using drpcpool, which checks out a connection
per active stream and dials a new one when none is idle. The upgraded
multiplexing-capable library is therefore exercised on every connection,
but the multiplexing capability itself stays unused: every stream still
ends up on its own underlying connection, matching the prior behavior.
This keeps the initial scope small. The dial-mux path is in place so
it can be turned on for validation, with a follow-up to flip the
default once we are confident.
Both paths now share the same dial-option setup, so the mux path also
picks up the client metrics and the request-recording gate that the
pool path was already attaching.
Release note: None
Epic: none