client/v3: fix double resolver update in EtcdManualResolver.Build #21662
BootstrapperSBL wants to merge 1 commit
EtcdManualResolver.Build called the embedded manual.Resolver.Build first and then pushed the endpoints and the round_robin ServiceConfig through a follow-up updateState. gRPC therefore saw an initial resolver state without the ServiceConfig and then a second update carrying it, which forced it to switch balancers mid-connection and tear down an in-flight SubChannel. The resulting "grpc: addrConn.createTransport failed ... operation was canceled" warnings are noisy, and every new grpc.ClientConn pays for a throwaway TCP dial and TLS handshake.

Seed the initial state via manual.Resolver.InitialState before calling manual.Resolver.Build so the first and only UpdateState dispatched to gRPC already carries both the endpoints and the ServiceConfig. The endpoint-to-state conversion is factored into a small buildState helper so SetEndpoints keeps sharing the same code path.

Fixes etcd-io#21660

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>
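The ordering change in the commit message can be illustrated with a small stdlib-only model. This is a sketch, not the etcd or gRPC code: State, recordingConn, manualResolver, buildOld and buildNew are hypothetical stand-ins, and the model assumes (following the description above) that the embedded Build dispatches whatever initial state it holds, which is empty unless InitialState seeded it first.

```go
package main

import "fmt"

// State is a hypothetical stand-in for gRPC's resolver.State.
type State struct {
	Endpoints     []string
	ServiceConfig string // empty string models "no ServiceConfig attached"
}

// recordingConn is a hypothetical stand-in for resolver.ClientConn that
// records every state pushed to it.
type recordingConn struct{ updates []State }

func (c *recordingConn) UpdateState(s State) { c.updates = append(c.updates, s) }

// manualResolver models the embedded resolver. Assumption taken from the
// commit message: Build dispatches the initial state it holds, which is the
// zero State unless InitialState seeded it first.
type manualResolver struct {
	initial State
	cc      *recordingConn
}

func (m *manualResolver) InitialState(s State) { m.initial = s }

func (m *manualResolver) Build(cc *recordingConn) {
	m.cc = cc
	cc.UpdateState(m.initial)
}

// buildState converts endpoints into a full state, mirroring the role of the
// helper named in the commit message.
func buildState(endpoints []string) State {
	return State{Endpoints: endpoints, ServiceConfig: "round_robin"}
}

// buildOld models the previous ordering: Build first, then a follow-up update.
func buildOld(endpoints []string) []State {
	m, cc := &manualResolver{}, &recordingConn{}
	m.Build(cc)
	cc.UpdateState(buildState(endpoints))
	return cc.updates
}

// buildNew models the fixed ordering: seed the initial state, then Build.
func buildNew(endpoints []string) []State {
	m, cc := &manualResolver{}, &recordingConn{}
	m.InitialState(buildState(endpoints))
	m.Build(cc)
	return cc.updates
}

func main() {
	eps := []string{"127.0.0.1:2379"}
	fmt.Println("old ordering updates:", len(buildOld(eps)))
	fmt.Println("new ordering updates:", len(buildNew(eps)))
}
```

In this model the old ordering records two updates, the first without a ServiceConfig, while the new ordering records a single complete one. Whether the real manual.Resolver behaves exactly this way on an unseeded Build is not established here; the model only mirrors the sequence the commit message describes.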
/ok-to-test
Codecov Report
✅ All modified and coverable lines are covered by tests.
... and 149 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21662      +/-   ##
==========================================
+ Coverage   61.81%   68.47%   +6.66%
==========================================
  Files         418      432      +14
  Lines       34276    35408    +1132
==========================================
+ Hits        21188    24247    +3059
+ Misses      11522     9756    -1766
+ Partials     1566     1405     -161
Fixes #21660.
Background

EtcdManualResolver.Build used to invoke the embedded manual.Resolver.Build first and only then push the endpoints plus the round_robin ServiceConfig via a follow-up updateState call. gRPC therefore observed an initial resolver state without the ServiceConfig, followed shortly by a second update that carried it. That forced gRPC to switch load balancers mid-connection and tear down the in-flight SubChannel. The warning visible in client logs was:

"grpc: addrConn.createTransport failed ... operation was canceled"

Every new grpc.ClientConn (every newClient, every per-endpoint maintenance.Status / HashKV / etc.) paid for a throwaway TCP dial and TLS handshake on top of the noise in the logs. The root cause was flagged in the issue by @zyriljamez and the proposed ordering change was endorsed by @ahrtr.

Fix
Seed the initial resolver state via manual.Resolver.InitialState before calling manual.Resolver.Build. That way the underlying resolver dispatches the first UpdateState to the ClientConn itself, already carrying the endpoints and the round_robin ServiceConfig as one atomic update, and the trailing updateState call is no longer needed on the Build path.

A small buildState helper is extracted so that updateState (used by SetEndpoints) keeps sharing the exact same state construction.

Before / after
Before: two updates from Build (empty initial + ServiceConfig second); gRPC balancer switch; addrConn.createTransport ... operation was canceled warnings; one wasted TCP dial + TLS handshake per client.
After: a single update from Build, already carrying endpoints and ServiceConfig; no balancer switch; no warnings; no wasted dial.

Tests
Added client/v3/internal/resolver/resolver_test.go with:
TestBuildSendsSingleUpdateWithServiceConfig: asserts Build forwards exactly one UpdateState call and that the call already carries both the endpoints and the ServiceConfig.
TestSetEndpointsAfterBuild: asserts SetEndpoints keeps propagating updates through the shared buildState path.

cc @ahrtr for review. The approach here matches the sequencing change you endorsed on the issue.
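For readers who want a feel for the first assertion described under Tests, the check could be shaped roughly as below. This is a hedged, stdlib-only sketch and not the contents of resolver_test.go: recordedUpdate and checkSingleUpdate are hypothetical names, and a real test would record calls through a fake resolver.ClientConn rather than a plain slice.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// recordedUpdate is a hypothetical record of one state pushed to gRPC.
type recordedUpdate struct {
	Endpoints        []string
	HasServiceConfig bool
}

// checkSingleUpdate returns nil only when exactly one update was recorded
// and that update carries both the expected endpoints and a ServiceConfig.
func checkSingleUpdate(got []recordedUpdate, want []string) error {
	if len(got) != 1 {
		return fmt.Errorf("got %d updates, want exactly 1", len(got))
	}
	if !got[0].HasServiceConfig {
		return errors.New("the update is missing the ServiceConfig")
	}
	if !reflect.DeepEqual(got[0].Endpoints, want) {
		return fmt.Errorf("got endpoints %v, want %v", got[0].Endpoints, want)
	}
	return nil
}

func main() {
	want := []string{"127.0.0.1:2379"}

	// Shape of the old behavior: a bare update followed by the complete one.
	old := []recordedUpdate{{}, {Endpoints: want, HasServiceConfig: true}}
	fmt.Println("old:", checkSingleUpdate(old, want))

	// Shape of the fixed behavior: one complete update.
	fixed := []recordedUpdate{{Endpoints: want, HasServiceConfig: true}}
	fmt.Println("fixed:", checkSingleUpdate(fixed, want))
}
```

Checking the count before the contents keeps the failure message specific: a regression to the two-update sequence is reported as a count mismatch rather than as a missing ServiceConfig on the first update.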