[Bug/Question] RpcException: timeout when waiting for send fragments RPC (exec_plan_fragment_prepare) #59561
Unanswered
fukai321
asked this question in
A - General / Q&A
Replies: 0 comments
Describe the bug
When executing a query, the FE fails with a java.util.concurrent.TimeoutException. It seems the FE waited for 5000ms but didn't receive a response from the BE while sending the execution plan fragment (exec_plan_fragment_prepare).
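To make the failure mode concrete, here is a minimal sketch of a bounded wait on an RPC future, which is the pattern the stack trace points at (`AbstractFuture.get` called from `Coordinator.waitRpc`). It is not the Doris implementation: `WaitRpcSketch`, `SketchRpcException`, and the `waitRpc` signature are hypothetical names chosen for illustration, and the message format only loosely imitates the one in the log.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class WaitRpcSketch {
    // Hypothetical exception type; stands in for an RPC-level error.
    static class SketchRpcException extends RuntimeException {
        SketchRpcException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Waits at most timeoutMs for the reply. If the reply has not arrived by
    // then, the low-level TimeoutException is wrapped in an RPC-level error
    // that names the host, similar in spirit to the message in fe.log.
    static <T> T waitRpc(Future<T> reply, long timeoutMs, String host) {
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true); // stop waiting for a reply that is too late
            throw new SketchRpcException(
                    "timeout when waiting for send fragments RPC. Wait(ms): "
                            + timeoutMs + ", host: " + host, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new SketchRpcException("interrupted while waiting, host: " + host, e);
        } catch (ExecutionException e) {
            throw new SketchRpcException("rpc failed, host: " + host, e.getCause());
        }
    }

    public static void main(String[] args) {
        // A future that is never completed simulates a BE that does not answer.
        Future<String> neverAnswers = new CompletableFuture<>();
        try {
            waitRpc(neverAnswers, 100, "192.168.130.8");
        } catch (SketchRpcException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception only says the reply did not arrive within the window. It does not say whether the request was delayed on the sending side, in the network, or on the receiving side.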
Environment
Doris Version: 1.2.1
Cluster Scale: 1 FE (16G32G), 3 BE (16C64G)
Error Log
The following error was captured in fe.log:
Caused by: java.util.concurrent.TimeoutException: Waited 5000 milliseconds (plus 35 milliseconds, 423986 nanoseconds delay) for io.grpc.stub.ClientCalls$GrpcFuture@2eafe466[status=PENDING, info=[GrpcFuture{clientCall={delegate=ClientCallImpl{method=MethodDescriptor{fullMethodName=doris.PBackendService/exec_plan_fragment_prepare, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@2b90f55c, responseMarshaller=io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@20aab861, schemaDescriptor=org.apache.doris.proto.PBackendServiceGrpc$PBackendServiceMethodDescriptorSupplier@2a53d3aa}}}}]]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:506) ~[spark-dpp-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.Coordinator.waitRpc(Coordinator.java:716) ~[doris-fe.jar:1.0-SNAPSHOT]
... 13 more
2026-01-04 23:44:15,017 WARN (mysql-nio-pool-193363|1797346) [StmtExecutor.execute():591] execute Exception. stmt[429323510, c363048331a44cfb-b3b40b967e6e7e65]
org.apache.doris.rpc.RpcException: timeout when waiting for send fragments RPC. Wait(sec): 5, host: 192.168.130.8
at org.apache.doris.qe.Coordinator.waitRpc(Coordinator.java:749) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.Coordinator.sendFragment(Coordinator.java:677) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.Coordinator.exec(Coordinator.java:552) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.sendResult(StmtExecutor.java:1140) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.handleQueryStmt(StmtExecutor.java:1120) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:520) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:407) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:322) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:463) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:690) ~[doris-fe.jar:1.0-SNAPSHOT]
at org.apache.doris.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_92]
Additional Context
Frequency: This issue is intermittent, occurring about 10+ times per day.
Performance: Most of the time, executing the exact same SQL on the FE returns results very quickly. This suggests that the plan fragment distribution is not consistently slow, but rather fails due to sporadic RPC timeouts.
Impact: It causes occasional, unpredictable query failures. I would like help identifying whether this is caused by gRPC connection pooling issues or by transient BE thread exhaustion in version 1.2.1.
Any guidance on troubleshooting or relevant parameters to tune would be greatly appreciated. Thank you!
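As a first troubleshooting step, it may help to check whether the timeouts cluster on one BE or are spread across all three. The sketch below tallies the `host:` field of the `RpcException` message in fe.log. The class name `FragmentTimeoutTally` and the regular expression are my own assumptions, derived only from the single log line quoted above, so the pattern may need adjusting if other lines are formatted differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FragmentTimeoutTally {
    // Assumed message shape, copied from the one fe.log line in this report:
    // "timeout when waiting for send fragments RPC. Wait(sec): 5, host: 192.168.130.8"
    private static final Pattern TIMEOUT_LINE = Pattern.compile(
            "timeout when waiting for send fragments RPC\\. Wait\\(sec\\): \\d+, host: (\\S+)");

    // Returns the number of matching timeout lines per BE host, sorted by host.
    static Map<String, Integer> countByHost(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = TIMEOUT_LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FragmentTimeoutTally.java <path-to-fe.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByHost(lines).forEach((host, n) -> System.out.println(host + "\t" + n));
    }
}
```

If nearly all timeouts name the same host, that BE is the place to look first. If they are spread evenly, the FE side or the network between the nodes becomes the more likely suspect. This is only a heuristic for narrowing the search, not a diagnosis.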