Improve TopologyService and HeartbeatService scalability for large clusters #17595

Draft
CRZbulabula wants to merge 6 commits into master from upgrade-heartbeat-service

Conversation

CRZbulabula (Contributor) commented May 5, 2026

Summary

This PR improves the scalability of TopologyService, HeartbeatService, and related components for clusters with a large number of DataNodes.

TopologyService — probing scalability

  • √N sampling with batch rotation: Each probing cycle selects only ceil(√N) DataNodes as probers instead of all N, rotating across cycles for full coverage (see the sketch after this list). This reduces per-cycle RPC fan-out from O(N) to O(√N) and total connection tests from O(N²) to O(N√N).
  • Batch-scoped failure detection: The failure detector runs only on heartbeat pairs from the current prober batch. Non-prober pairs carry forward their previous reachable state from heartbeat history.
  • Only probe Running DataNodes: Uses LoadManager.getNodeStatus() to filter, replacing the manually maintained startingDataNodes list.
  • Configurable interval and timeout: topology_probing_base_interval_in_ms (default 5000) and topology_probing_timeout_ratio (default 0.5) replace hardcoded constants.
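
A minimal sketch of the √N sampling and rotation idea, assuming the probers are picked from a plain list of running DataNode ids; the class and method names (ProberSampler, selectProbers) are illustrative, not the PR's actual TopologyService code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of ceil(sqrt(N)) prober sampling with batch rotation.
public class ProberSampler {

  private int rotationOffset = 0;

  /** Selects the next batch of prober DataNode ids, rotating across cycles for full coverage. */
  public synchronized List<Integer> selectProbers(List<Integer> runningDataNodeIds) {
    int n = runningDataNodeIds.size();
    if (n == 0) {
      return new ArrayList<>();
    }
    int batchSize = (int) Math.ceil(Math.sqrt(n));
    List<Integer> batch = new ArrayList<>(batchSize);
    for (int i = 0; i < batchSize; i++) {
      batch.add(runningDataNodeIds.get((rotationOffset + i) % n));
    }
    // Advance the offset so successive cycles eventually visit every node.
    rotationOffset = (rotationOffset + batchSize) % n;
    return batch;
  }
}
```

With N = 9, for example, each cycle probes 3 nodes and the rotation covers all 9 after 3 cycles.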

TopologyService — topology distribution

  • New updateClusterTopology Thrift RPC: Added TUpdateClusterTopologyReq struct and updateClusterTopology service method to datanode.thrift, fully decoupled from the heartbeat interface.
  • Dedicated PUSH_TOPOLOGY request type: Topology updates are pushed via CnToDnInternalServiceAsyncRequestManager with PUSH_TOPOLOGY async request type, calling updateClusterTopology on DataNodes.
  • Per-DataNode topology: Each DataNode receives only its own reachable set plus the full dataNodes location map, reducing push payload from O(N²) to O(N) per node.
  • Incremental push: lastPushedTopology tracks what was last sent; only DataNodes whose reachable set has changed receive a push, and lastPushedTopology is updated only on successful delivery (in the response callback). A sketch of this check follows the list.
  • Removed topology from heartbeat: HeartbeatService.genHeartbeatReq() no longer sets topology or dataNodes fields. Topology handling removed from getDataNodeHeartBeat handler on DataNode side.
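
A hedged sketch of the per-DataNode incremental push decision: compare each node's current reachable set against what was last successfully pushed, and record the new set only in the success callback. Field and method names here are assumptions, not the PR's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of incremental, per-DataNode topology pushes.
public class IncrementalTopologyPusher {

  // DataNode id -> reachable DataNode ids that were last successfully pushed to it.
  private final Map<Integer, Set<Integer>> lastPushedTopology = new HashMap<>();

  /** Returns true if this DataNode needs a push because its reachable set changed. */
  public boolean needsPush(int dataNodeId, Set<Integer> currentReachable) {
    return !currentReachable.equals(lastPushedTopology.get(dataNodeId));
  }

  /** Called from the async response callback: record the pushed set only on success. */
  public void onPushSuccess(int dataNodeId, Set<Integer> pushedReachable) {
    lastPushedTopology.put(dataNodeId, pushedReachable);
  }
}
```

Recording only on success means a failed or timed-out push is naturally retried on the next cycle, since the diff against lastPushedTopology stays non-empty.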

ClusterTopology (DataNode side)

  • Simplified to per-node view: Stores only myReachableNodes (this node's reachable set) instead of the full O(N²) topology map. isPartitioned is computed from myReachableNodes.size() != dataNodes.size().
  • Graceful handling of unknown topology: When TopologyService hasn't probed yet (myReachableNodes is empty), getValidatedReplicaSet, getReachableCandidates, and filterReachableCandidates return full replica sets instead of empty results.
  • Simplified getReachableCandidates: Filters by this node's own reachable set instead of brute-force searching across all nodes' views, reducing complexity from O(N×R) to O(R). A sketch of the per-node view follows.
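
A minimal sketch of the per-node view, assuming nodes are tracked by DataNode id; names and signatures are illustrative rather than the actual ClusterTopology API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical per-node topology view: filter by this node's own reachable set in O(R).
public class PerNodeTopologyView {

  private volatile Set<Integer> myReachableNodes; // refreshed by updateClusterTopology pushes

  /** Filters replica candidates by this node's reachable set. */
  public List<Integer> filterReachableCandidates(List<Integer> replicaDataNodeIds) {
    Set<Integer> reachable = myReachableNodes;
    if (reachable == null || reachable.isEmpty()) {
      // Topology not probed yet: fall back to the full replica set instead of an empty result.
      return replicaDataNodeIds;
    }
    List<Integer> result = new ArrayList<>();
    for (int id : replicaDataNodeIds) {
      if (reachable.contains(id)) {
        result.add(id);
      }
    }
    return result;
  }
}
```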

Timeout protection and thread isolation

  • Bounded CountDownLatch.await: testAllDataNodeConnectionInHeartbeatChannel and all 4 service-type test methods (submitTestConnectionTask) now use sendAsyncRequestWithTimeoutInMs(timeout) instead of unbounded await().
  • Dedicated probing thread pool: submitInternalTestConnectionTask offloads blocking work to TOPOLOGY_PROBING_EXECUTOR (sized max(1, cores/4)) with Future.get(timeout), keeping the DataNodeInternalRPCService thread pool free for sendFragmentInstance and other critical RPCs.
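
A sketch of the bounded, thread-isolated probing pattern, assuming the blocking connection test is handed over as a Runnable; apart from the executor name and its sizing, the surrounding class and method are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: run blocking connection tests on a dedicated pool and bound the wait,
// so RPC service threads are never stuck longer than the configured timeout.
public class ProbingExecutorSketch {

  private static final ExecutorService TOPOLOGY_PROBING_EXECUTOR =
      Executors.newFixedThreadPool(Math.max(1, Runtime.getRuntime().availableProcessors() / 4));

  /** Offloads the test and waits at most timeoutMs; returns false on timeout or failure. */
  public static boolean testConnectionWithTimeout(Runnable connectionTest, long timeoutMs) {
    Future<?> future = TOPOLOGY_PROBING_EXECUTOR.submit(connectionTest);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // don't leave a probing thread stuck on an unreachable peer
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (Exception e) {
      return false;
    }
  }
}
```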

HeartbeatService scalability

  • Increased selector threads: Heartbeat client pools use a dedicated heartbeat_selector_num_of_client_manager config (default 0 = auto: max(1, cores/4)), up from the general default of 1.
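
A small sketch of how the 0 = auto default could resolve to a concrete selector-thread count (class and method names are assumptions for illustration only):

```java
// Hypothetical resolver for the heartbeat_selector_num_of_client_manager setting.
public class SelectorNumResolver {

  /** 0 (the default) means auto: max(1, available cores / 4); any positive value is used as-is. */
  static int resolveHeartbeatSelectorNum(int configuredValue) {
    return configuredValue > 0
        ? configuredValue
        : Math.max(1, Runtime.getRuntime().availableProcessors() / 4);
  }
}
```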

Bug fixes

  • Topology diff logging: LoadCache.updateTopology() and ClusterTopology.updateTopology() had copy-paste bugs where originReachable read from latestTopology instead of the old topology, making the diff log dead code.
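
A hedged sketch of the ordering the fix restores: snapshot the old reachable set before overwriting it, so the "now unreachable" diff can actually be computed. Field names and the log statement are illustrative only:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the diff-logging fix.
public class TopologyDiffSketch {

  private Set<Integer> latestReachable = new HashSet<>();

  public void updateTopology(Set<Integer> newReachable) {
    // Fix: read the previous set BEFORE replacing it (the bug read the freshly assigned one,
    // so the diff below was always empty).
    Set<Integer> originReachable = new HashSet<>(latestReachable);
    latestReachable = new HashSet<>(newReachable);
    for (int nodeId : originReachable) {
      if (!newReachable.contains(nodeId)) {
        System.out.printf("[Topology] DataNode %d is now unreachable%n", nodeId);
      }
    }
  }
}
```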

New configuration parameters

Parameter                                  Default    Description
topology_probing_base_interval_in_ms       5000       Base interval (ms) between topology probing cycles
topology_probing_timeout_ratio             0.5        Probing timeout as a ratio of the probing interval
heartbeat_selector_num_of_client_manager   0 (auto)   Selector threads for heartbeat client pools; 0 = max(1, cores/4)
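
For reference, a hedged example of setting these in the ConfigNode properties file (the exact file name and section depend on the IoTDB version):

```properties
# Probe topology every 5 seconds; each probe times out at 50% of the interval.
topology_probing_base_interval_in_ms=5000
topology_probing_timeout_ratio=0.5
# 0 = auto: max(1, cores / 4) selector threads for heartbeat client pools.
heartbeat_selector_num_of_client_manager=0
```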

Test plan

  • Deploy a 3-node cluster and confirm topology probing selects ceil(√3) = 2 probers per cycle via [Topology] debug logs
  • Scale to 9+ DataNodes, confirm prober rotation covers all nodes over √N cycles
  • Stop one DataNode, confirm other DataNodes' internal RPC threads are not blocked beyond dnConnectionTimeoutInMS
  • Trigger a network partition, confirm [Topology] DataNode X is now unreachable to myself(Y) log entries appear on the affected DataNode
  • Verify queries work correctly before TopologyService has completed its first probing cycle (empty topology → full replicas returned)
  • Run existing TopologyService and HeartbeatService integration tests

🤖 Generated with Claude Code

codecov bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 17.73050% with 116 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.05%. Comparing base (f72b3ee) to head (b0e2df7).

Files with missing lines Patch % Lines
...nfignode/manager/load/service/TopologyService.java 7.27% 51 Missing ⚠️
...ol/thrift/impl/DataNodeInternalRPCServiceImpl.java 17.24% 24 Missing ⚠️
...che/iotdb/db/queryengine/plan/ClusterTopology.java 4.16% 23 Missing ⚠️
...he/iotdb/confignode/conf/ConfigNodeDescriptor.java 0.00% 8 Missing ⚠️
...apache/iotdb/confignode/conf/ConfigNodeConfig.java 25.00% 6 Missing ⚠️
...apache/iotdb/commons/client/ClientPoolFactory.java 50.00% 2 Missing ⚠️
...iotdb/confignode/manager/load/cache/LoadCache.java 0.00% 1 Missing ⚠️
...otdb/db/pipe/agent/task/PipeDataNodeTaskAgent.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #17595   +/-   ##
=========================================
  Coverage     40.05%   40.05%           
  Complexity     2554     2554           
=========================================
  Files          5176     5176           
  Lines        348528   348594   +66     
  Branches      44558    44557    -1     
=========================================
+ Hits         139595   139626   +31     
- Misses       208933   208968   +35     

☔ View full report in Codecov by Sentry.

sonarqubecloud bot commented May 5, 2026

Quality Gate failed

Failed conditions
Reliability Rating on New Code: C (required ≥ A)

See analysis details on SonarQube Cloud

- Only run failure detector on current prober batch instead of all pairs
- Push topology via dedicated PUSH_TOPOLOGY request type (not heartbeat)
- Update lastPushedTopology only on successful push
- Simplify onNodeStatisticsChanged to only clean up on removal/Removing
- ClusterTopology returns full replicas when topology not yet probed
- Add Javadoc to ClusterTopology
CRZbulabula force-pushed the upgrade-heartbeat-service branch from bfbda1a to 431e2b3 on May 6, 2026 at 06:35
- Add TUpdateClusterTopologyReq struct and updateClusterTopology RPC to datanode.thrift
- Implement updateClusterTopology handler in DataNodeInternalRPCServiceImpl
- Change PUSH_TOPOLOGY action to call updateClusterTopology instead of getDataNodeHeartBeat
- Use DataNodeTSStatusRPCHandler (default) instead of custom TopologyPushRPCHandler
- Remove topology handling from heartbeat handler (getDataNodeHeartBeat)
- Delete TopologyPushRPCHandler (no longer needed)