
Clean up dead RPC thread config and use node-specific selectorNum #17551

Open
JackieTien97 wants to merge 10 commits into master from worktree-rpc-thread-opt

Conversation


JackieTien97 (Contributor) commented Apr 27, 2026

Summary

This PR cleans up dead/misconfigured RPC thread parameters, ensures ConfigNode and DataNode each use their own selectorNumOfClientManager value when constructing async client pools, and adds maxIdleClientNumForEachNode support for client pool eviction control.

1. Fix config naming mismatch

  • Rename dn_selector_thread_count_of_client_manager to dn_selector_thread_nums_of_client_manager in both the properties template and IoTDBDescriptor, matching the name already used in CommonDescriptor (see the snippet below).
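For illustration, the user-facing change in the properties template looks like this (a sketch; the file name and the 0 default follow section 5 below, and the old value shown is a placeholder):

```properties
# iotdb-system.properties (sketch of the rename)
# Old key, no longer read by IoTDBDescriptor:
# dn_selector_thread_count_of_client_manager=1
# New key, matching what CommonDescriptor already parses:
dn_selector_thread_nums_of_client_manager=0
```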

2. Remove dead config parameters

  • dn_rpc_selector_thread_count: Parsed and stored in IoTDBConfig.rpcSelectorThreadCount but never actually consumed by any server component. Removed the field, getter, setter, template entry, and loading code in IoTDBDescriptor.
  • coordinator_write_executor_size: Created a writeOperationExecutor thread pool in Coordinator that was passed to TreeModelPlanner but never referenced in any of its methods — completely dead code. Removed the config field, the thread pool, the TreeModelPlanner constructor parameter, and the MPP_COORDINATOR_WRITE_EXECUTOR thread name.

3. Fix ClientPoolProperty.Builder hardcoded default

  • The Builder.maxClientNumForEachNode was initialized to the compile-time constant DefaultProperty.MAX_CLIENT_NUM_FOR_EACH_NODE (1000), ignoring any runtime override from dn_max_client_count_for_each_node_in_client_manager or cn_max_client_count_for_each_node_in_client_manager. It now reads from CommonDescriptor.getInstance().getConfig().getMaxClientNumForEachNode(), so callers that don't explicitly call setMaxClientNumForEachNode() still respect the configured value (see the sketch below).
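A minimal sketch of the Builder change, using the accessor names quoted above; this is an illustration, not the verbatim IoTDB diff:

```java
import org.apache.iotdb.commons.conf.CommonDescriptor;

public class ClientPoolPropertySketch {

  public static class Builder {
    // Before: a compile-time constant, so runtime overrides of
    // dn_/cn_max_client_count_for_each_node_in_client_manager were ignored.
    // private int maxClientNumForEachNode =
    //     DefaultProperty.MAX_CLIENT_NUM_FOR_EACH_NODE; // 1000

    // After: the default follows the loaded config, so callers that never
    // invoke setMaxClientNumForEachNode() still pick up the configured value.
    private int maxClientNumForEachNode =
        CommonDescriptor.getInstance().getConfig().getMaxClientNumForEachNode();

    public Builder setMaxClientNumForEachNode(int maxClientNumForEachNode) {
      this.maxClientNumForEachNode = maxClientNumForEachNode;
      return this;
    }
  }
}
```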

4. Use node-specific selectorNum for async client pools

Previously, AsyncDataNodeInternalServiceClientPoolFactory always read selectorNumOfClientManager from CommonConfig, which is shared across node types. ConfigNode and DataNode, however, may configure different values via cn_selector_thread_nums_of_client_manager and dn_selector_thread_nums_of_client_manager respectively, so the shared read could apply the wrong node's setting.

Changes (a sketch of the wiring follows the list):

  • AsyncDataNodeInternalServiceClientPoolFactory: Removed the no-arg constructor; the only constructor now requires an explicit selectorNumOfAsyncClientManager int.
  • Coordinator: Passes IoTDBConfig.getSelectorNumOfClientManager() when constructing the factory.
  • DataNodeInternalServiceRequestManager: Added a final int selectorNumOfAsyncClientManager field and a protected constructor so each subclass supplies its own value.
  • DnToDnInternalServiceAsyncRequestManager: Reads from IoTDBConfig.getSelectorNumOfClientManager().
  • CnToDnInternalServiceAsyncRequestManager: Reads from ConfigNodeConfig.getSelectorNumOfClientManager().
  • ConfigNodeConfig: Added selectorNumOfClientManager field with getter/setter.
  • ConfigNodeDescriptor: Loads cn_selector_thread_nums_of_client_manager into ConfigNodeConfig.
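To make the wiring concrete, here is a compact sketch of the pattern described above. Class names and the config getters follow the summary; the descriptor accessors (getConfig()/getConf()) and all surrounding plumbing are assumptions, so treat this as an illustration rather than the verbatim diff:

```java
import org.apache.iotdb.confignode.conf.ConfigNodeDescriptor;
import org.apache.iotdb.db.conf.IoTDBDescriptor;

// Factory: the no-arg constructor is gone, so every call site must state
// which node's configured selector count it wants.
class AsyncDataNodeInternalServiceClientPoolFactory {
  private final int selectorNumOfAsyncClientManager;

  AsyncDataNodeInternalServiceClientPoolFactory(int selectorNumOfAsyncClientManager) {
    this.selectorNumOfAsyncClientManager = selectorNumOfAsyncClientManager;
  }
}

// Shared request-manager base: each subclass supplies its own value.
abstract class DataNodeInternalServiceRequestManager {
  protected final int selectorNumOfAsyncClientManager;

  protected DataNodeInternalServiceRequestManager(int selectorNumOfAsyncClientManager) {
    this.selectorNumOfAsyncClientManager = selectorNumOfAsyncClientManager;
  }
}

// DataNode -> DataNode traffic uses the dn_ value...
class DnToDnInternalServiceAsyncRequestManager extends DataNodeInternalServiceRequestManager {
  DnToDnInternalServiceAsyncRequestManager() {
    super(IoTDBDescriptor.getInstance().getConfig().getSelectorNumOfClientManager());
  }
}

// ...while ConfigNode -> DataNode traffic uses the cn_ value.
class CnToDnInternalServiceAsyncRequestManager extends DataNodeInternalServiceRequestManager {
  CnToDnInternalServiceAsyncRequestManager() {
    super(ConfigNodeDescriptor.getInstance().getConf().getSelectorNumOfClientManager());
  }
}
```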

5. Auto-compute selector threads and add idle client eviction control

  • selector_thread_nums default changed to 0 (auto-compute): Both cn_selector_thread_nums_of_client_manager and dn_selector_thread_nums_of_client_manager now default to 0 in the properties template. When <= 0, the system auto-computes the value as max(1, CPU cores / 4). The descriptor files (CommonDescriptor, ConfigNodeDescriptor, IoTDBDescriptor) only update the config when the parsed value is > 0 (see the sketch after this list).
  • ThriftClientProperty default updated: Changed from hardcoded 1 to max(1, availableProcessors() / 4) to match the auto-compute logic.
  • ConfigNodeConfig / IoTDBConfig defaults updated: Both now use max(1, availableProcessors() / 4) as the default selectorNumOfClientManager.
  • maxIdleClientNumForEachNode support added: New config parameter cn_max_idle_client_count_for_each_node_in_client_manager / dn_max_idle_client_count_for_each_node_in_client_manager (default 0, meaning no idle eviction limit). Added the field and getter/setter to CommonConfig, ConfigNodeConfig, and IoTDBConfig. ClientPoolProperty.Builder now reads this value from CommonDescriptor and uses it to set maxIdlePerKey (previously hardcoded to maxClientNumForEachNode). Negative values in the properties file are ignored.
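The defaulting rules above are simple enough to state in code. The helper class below is hypothetical (no such class exists in the PR); it only restates section 5's rules:

```java
import java.util.Optional;

// Hypothetical helper illustrating the defaulting rules from section 5.
final class SelectorThreadDefaults {

  private SelectorThreadDefaults() {}

  // Default used when the property is absent or <= 0: max(1, CPU cores / 4).
  static int autoComputedSelectorNum() {
    return Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
  }

  // Descriptor-side loading: only a parsed value > 0 overrides the default.
  static int resolveSelectorNum(Optional<String> rawProperty, int currentDefault) {
    int parsed = rawProperty.map(String::trim).map(Integer::parseInt).orElse(0);
    return parsed > 0 ? parsed : currentDefault;
  }

  // Idle-client cap: negative values in the properties file are ignored,
  // while 0 keeps the "no idle eviction limit" behaviour.
  static int resolveMaxIdleClientNum(int parsed, int currentDefault) {
    return parsed >= 0 ? parsed : currentDefault;
  }
}
```

With the shipped default of 0, an 8-core node therefore lands on max(1, 8 / 4) = 2 selector threads per async client manager.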

Test plan

  • Verify the cluster starts successfully with default config (1C1D)
  • Verify custom dn_selector_thread_nums_of_client_manager / cn_selector_thread_nums_of_client_manager values take effect (only when > 0)
  • Verify dn_max_client_count_for_each_node_in_client_manager is respected by client pools that don't explicitly set it
  • Verify dn_max_idle_client_count_for_each_node_in_client_manager / cn_max_idle_client_count_for_each_node_in_client_manager controls idle client eviction (0 = no limit, negative values ignored)
  • Verify selector thread count defaults to max(1, CPU/4) when set to 0

1. Fix dn_selector_thread_count_of_client_manager naming mismatch:
   rename to dn_selector_thread_nums_of_client_manager to match
   CommonDescriptor
2. Remove unused dn_rpc_selector_thread_count config and its
   backing field rpcSelectorThreadCount in IoTDBConfig
3. Remove unused coordinator_write_executor_size config and the
   dead writeOperationExecutor in Coordinator/TreeModelPlanner
4. Fix ClientPoolProperty.Builder to read maxClientNumForEachNode
   from CommonConfig instead of hardcoded constant
…nt pool

Add a parameterized constructor to AsyncDataNodeInternalServiceClientPoolFactory
so that callers can override the selector thread count. The no-arg constructor
still defaults to CommonConfig.getSelectorNumOfClientManager(). Coordinator now
passes IoTDBConfig.getSelectorNumOfClientManager() to use the DataNode-specific
value.
Add selectorNumOfAsyncClientManager field to DataNodeInternalServiceRequestManager
so each subclass can supply the correct selector thread count:
- CnToDnInternalServiceAsyncRequestManager reads from ConfigNodeConfig
- DnToDnInternalServiceAsyncRequestManager reads from IoTDBConfig

Also add selectorNumOfClientManager to ConfigNodeConfig and load
cn_selector_thread_nums_of_client_manager in ConfigNodeDescriptor.

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 49.63504% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.21%. Comparing base (f72b3ee) to head (6eecd1a).
⚠️ Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...apache/iotdb/commons/client/ClientPoolFactory.java | 14.28% | 24 Missing ⚠️ |
| ...he/iotdb/confignode/conf/ConfigNodeDescriptor.java | 0.00% | 12 Missing ⚠️ |
| ...ion/config/executor/ClusterConfigTaskExecutor.java | 0.00% | 8 Missing ⚠️ |
| ...apache/iotdb/confignode/conf/ConfigNodeConfig.java | 54.54% | 5 Missing ⚠️ |
| ...sync/CnToCnInternalServiceAsyncRequestManager.java | 0.00% | 3 Missing ⚠️ |
| ...t/cn/DnToCnInternalServiceAsyncRequestManager.java | 0.00% | 2 Missing ⚠️ |
| ...dn/DataNodeExternalServiceAsyncRequestManager.java | 0.00% | 2 Missing ⚠️ |
| ...tocol/client/dn/DataNodeIntraHeartbeatManager.java | 0.00% | 2 Missing ⚠️ |
| ...ient/dn/DataNodeMPPServiceAsyncRequestManager.java | 0.00% | 2 Missing ⚠️ |
| ...t/dn/DnToDnInternalServiceAsyncRequestManager.java | 0.00% | 2 Missing ⚠️ |

... and 5 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17551      +/-   ##
============================================
+ Coverage     40.05%   40.21%   +0.15%     
  Complexity     2554     2554              
============================================
  Files          5176     5177       +1     
  Lines        348528   348712     +184     
  Branches      44558    44599      +41     
============================================
+ Hits         139595   140225     +630     
+ Misses       208933   208487     -446     

☔ View full report in Codecov by Sentry.

…idate idle client count

- selector_thread_nums_of_client_manager defaults to 0 (auto: max(1, CPU/4))
- max_idle_client_count_for_each_node_in_client_manager ignores negative values
- ThriftClientProperty default changed from hardcoded 1 to CPU/4
- Added maxIdleClientNumForEachNode support in ClientPoolProperty

sonarqubecloud Bot commented May 6, 2026

Quality Gate failed

Failed conditions
5.2% Duplication on New Code (required ≤ 5%)

See analysis details on SonarQube Cloud

