Improve TopologyService and HeartbeatService scalability for large clusters #17595
Draft
CRZbulabula wants to merge 6 commits into master from
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## master #17595 +/- ##
=========================================
Coverage 40.05% 40.05%
Complexity 2554 2554
=========================================
Files 5176 5176
Lines 348528 348594 +66
Branches 44558 44557 -1
=========================================
+ Hits 139595 139626 +31
- Misses 208933 208968 +35
- Only run failure detector on current prober batch instead of all pairs
- Push topology via dedicated PUSH_TOPOLOGY request type (not heartbeat)
- Update lastPushedTopology only on successful push
- Simplify onNodeStatisticsChanged to only clean up on removal/Removing
- ClusterTopology returns full replicas when topology not yet probed
- Add Javadoc to ClusterTopology
Branch updated from bfbda1a to 431e2b3.
- Add TUpdateClusterTopologyReq struct and updateClusterTopology RPC to datanode.thrift
- Implement updateClusterTopology handler in DataNodeInternalRPCServiceImpl
- Change PUSH_TOPOLOGY action to call updateClusterTopology instead of getDataNodeHeartBeat
- Use DataNodeTSStatusRPCHandler (default) instead of custom TopologyPushRPCHandler
- Remove topology handling from heartbeat handler (getDataNodeHeartBeat)
- Delete TopologyPushRPCHandler (no longer needed)

Summary
This PR improves the scalability of TopologyService, HeartbeatService, and related components for clusters with a large number of DataNodes.

TopologyService: probing scalability
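A minimal sketch of the rotating ceil(√N) prober selection summarized in this section, assuming DataNodes are identified by int IDs kept in a sorted list and that "rotating" means a round-robin cursor advancing by one batch per cycle. ProberRotation, batchSize, and nextBatch are hypothetical names, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of rotating ceil(sqrt(N)) prober selection.
 * ASSUMPTION: node IDs are ints in a sorted list and rotation is round-robin.
 */
final class ProberRotation {
  // Start index (into the sorted node list) of the next prober batch.
  private int cursor = 0;

  /** Number of probers per cycle: ceil(sqrt(n)), or 0 for an empty cluster. */
  static int batchSize(int n) {
    if (n <= 0) {
      return 0;
    }
    return (int) Math.ceil(Math.sqrt(n));
  }

  /** Returns the next batch, wrapping around so every node eventually acts as a prober. */
  List<Integer> nextBatch(List<Integer> sortedNodeIds) {
    int n = sortedNodeIds.size();
    int k = batchSize(n);
    List<Integer> batch = new ArrayList<>(k);
    for (int i = 0; i < k; i++) {
      batch.add(sortedNodeIds.get((cursor + i) % n));
    }
    // Advance by one full batch so the next cycle starts where this one ended.
    cursor = (n == 0) ? 0 : (cursor + k) % n;
    return batch;
  }
}
```

With N = 10 the batch size is 4, so the batches are [0, 1, 2, 3], [4, 5, 6, 7], and [8, 9, 0, 1]: every node has probed after ceil(10/4) = 3 cycles. Each prober still tests its connections to the other N − 1 nodes, which is where the O(N√N) per-cycle figure comes from.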
- Each cycle selects ceil(√N) DataNodes as probers instead of all N, rotating the selection across cycles for full coverage. This reduces the per-cycle RPC fan-out from O(N) to O(√N) and the number of connection tests per cycle from O(N²) to O(N√N).
- Uses LoadManager.getNodeStatus() to filter DataNodes by status, replacing the manually maintained startingDataNodes list.
- The new topology_probing_base_interval_in_ms (default 5000) and topology_probing_timeout_ratio (default 0.5) parameters replace hardcoded constants.

TopologyService: topology distribution
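The change-detection rule for pushes can be sketched as follows, assuming the topology is a map from DataNode ID to the set of IDs that node can reach. IncrementalTopologyPusher and the synchronous send predicate are hypothetical stand-ins; in the PR, success is observed in the PUSH_TOPOLOGY response callback rather than returned directly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch of incremental topology push. The "send" predicate stands in for the
 * PUSH_TOPOLOGY RPC and returns true only when the DataNode acknowledged the update.
 */
final class IncrementalTopologyPusher {
  // What each DataNode is known to hold: its reachable set from the last successful push.
  private final Map<Integer, Set<Integer>> lastPushedTopology = new HashMap<>();

  /** Pushes to nodes whose reachable set changed; returns the IDs a push was attempted for. */
  Set<Integer> push(
      Map<Integer, Set<Integer>> latestTopology, BiPredicate<Integer, Set<Integer>> send) {
    Set<Integer> attempted = new HashSet<>();
    for (Map.Entry<Integer, Set<Integer>> e : latestTopology.entrySet()) {
      int nodeId = e.getKey();
      Set<Integer> reachable = e.getValue();
      if (reachable.equals(lastPushedTopology.get(nodeId))) {
        continue; // unchanged since the last successful push: skip the RPC
      }
      attempted.add(nodeId);
      // The payload is only this node's own reachable set (O(N)), not the whole O(N^2) map.
      if (send.test(nodeId, reachable)) {
        // Record only after successful delivery, so a failed push is retried next cycle.
        lastPushedTopology.put(nodeId, Set.copyOf(reachable));
      }
    }
    return attempted;
  }
}
```

Because lastPushedTopology is written only on success, a DataNode that missed a push still differs from the recorded state and is picked up again on the next cycle.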
- New updateClusterTopology Thrift RPC: added the TUpdateClusterTopologyReq struct and the updateClusterTopology service method to datanode.thrift, fully decoupled from the heartbeat interface.
- Registered PUSH_TOPOLOGY as an async request type in CnToDnInternalServiceAsyncRequestManager; it calls updateClusterTopology on DataNodes.
- Each push carries only the target DataNode's reachable set plus the dataNodes location map, reducing the push payload from O(N²) to O(N) per node.
- lastPushedTopology tracks what was last sent; only DataNodes whose reachable set has changed receive a push. It is updated only on successful delivery (in the response callback).
- HeartbeatService.genHeartbeatReq() no longer sets the topology or dataNodes fields, and topology handling has been removed from the getDataNodeHeartBeat handler on the DataNode side.

ClusterTopology (DataNode side)
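A sketch of the DataNode-side view, assuming int node IDs and a plain list of replica IDs per region; LocalTopologyView and its methods are hypothetical names. It keeps only the local reachable set, derives isPartitioned from the two set sizes, and falls back to the full replica list while nothing has been probed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a DataNode's local topology view.
 * ASSUMPTION: nodes and replicas are plain int IDs; the reachable set includes this node.
 */
final class LocalTopologyView {
  private final Set<Integer> allDataNodes = new HashSet<>();
  private final Set<Integer> myReachableNodes = new HashSet<>();

  /** Applies a pushed update: the known DataNodes plus this node's reachable set. */
  void update(Set<Integer> dataNodes, Set<Integer> reachable) {
    allDataNodes.clear();
    allDataNodes.addAll(dataNodes);
    myReachableNodes.clear();
    myReachableNodes.addAll(reachable);
  }

  /** True until a reachable set has been received. */
  boolean notProbed() {
    return myReachableNodes.isEmpty();
  }

  /** Partitioned when this node cannot reach every known DataNode. */
  boolean isPartitioned() {
    return myReachableNodes.size() != allDataNodes.size();
  }

  /** O(R) filter over R replicas; returns the full replica list before probing. */
  List<Integer> reachableCandidates(List<Integer> replicas) {
    if (notProbed()) {
      return new ArrayList<>(replicas);
    }
    List<Integer> result = new ArrayList<>();
    for (Integer replica : replicas) {
      if (myReachableNodes.contains(replica)) { // O(1) hash lookup per replica
        result.add(replica);
      }
    }
    return result;
  }
}
```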
- Stores only myReachableNodes (this node's reachable set) instead of the full O(N²) topology map.
- isPartitioned is computed as myReachableNodes.size() != dataNodes.size().
- Before the topology has been probed (myReachableNodes is empty), getValidatedReplicaSet, getReachableCandidates, and filterReachableCandidates return full replica sets instead of empty results.
- getReachableCandidates filters by this node's own reachable set instead of brute-force searching across all nodes' views, reducing complexity from O(N×R) to O(R), where R is the number of replicas.

Timeout protection and thread isolation
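A minimal sketch of a bounded wait on a dedicated pool, using only java.util.concurrent. BoundedProbe, PROBING_EXECUTOR, and probeWithTimeout are hypothetical names standing in for TOPOLOGY_PROBING_EXECUTOR and submitInternalTestConnectionTask; treating a timeout as "unreachable" and cancelling the stuck task are assumptions of this sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: run a blocking connectivity probe on a small dedicated pool and wait at
 * most timeoutMs, so the calling RPC handler thread is never blocked indefinitely.
 */
final class BoundedProbe {
  static final ExecutorService PROBING_EXECUTOR =
      Executors.newFixedThreadPool(
          Math.max(1, Runtime.getRuntime().availableProcessors() / 4),
          runnable -> {
            Thread t = new Thread(runnable, "topology-probing");
            t.setDaemon(true); // idle probe threads must not keep the JVM alive
            return t;
          });

  /** Returns the probe result, or false if it throws or does not finish within timeoutMs. */
  static boolean probeWithTimeout(Callable<Boolean> probe, long timeoutMs) {
    Future<Boolean> future = PROBING_EXECUTOR.submit(probe);
    try {
      return Boolean.TRUE.equals(future.get(timeoutMs, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck probe and treat the target as unreachable
      return false;
    } catch (ExecutionException e) {
      return false; // the probe itself threw
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return false;
    }
  }
}
```

The calling thread waits at most timeoutMs per probe, and cancel(true) frees the pool thread so later probes are not queued behind a hung one.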
- No more unbounded CountDownLatch.await: testAllDataNodeConnectionInHeartbeatChannel and all 4 service-type test methods (submitTestConnectionTask) now use sendAsyncRequestWithTimeoutInMs(timeout) instead of an unbounded await().
- submitInternalTestConnectionTask offloads blocking work to TOPOLOGY_PROBING_EXECUTOR (sized max(1, cores/4)) and waits with Future.get(timeout), keeping the DataNodeInternalRPCService thread pool free for sendFragmentInstance and other critical RPCs.

HeartbeatService scalability
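The auto-sizing rule for the new selector parameter fits in a small resolver. HeartbeatSelectorConfig and resolveSelectorNum are hypothetical names, and treating any non-positive value as "auto" is an assumption (the PR only documents 0).

```java
/** Hypothetical sketch of resolving heartbeat_selector_num_of_client_manager. */
final class HeartbeatSelectorConfig {
  /**
   * A positive configured value is used as-is; 0 (the default) means auto: max(1, cores / 4).
   * ASSUMPTION: negative values are also treated as auto.
   */
  static int resolveSelectorNum(int configured, int availableCores) {
    if (configured > 0) {
      return configured;
    }
    return Math.max(1, availableCores / 4);
  }
}
```

In practice availableCores would come from Runtime.getRuntime().availableProcessors(); on a 16-core host the auto value is 4, and on hosts with fewer than 8 cores it stays at the floor of 1.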
- The new heartbeat_selector_num_of_client_manager config sets the number of selector threads for the heartbeat client manager (default 0 = auto: max(1, cores/4)), up from the general default of 1.

Bug fixes
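The corrected diff logic amounts to comparing a node's old reachable set against its new one. The sketch below uses hypothetical names (TopologyDiff, newlyUnreachable) and int IDs; the comment notes why reading originReachable from latestTopology made the result always empty.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the reachability diff that drives the "now unreachable" log. */
final class TopologyDiff {
  /** Returns the nodes that nodeId could reach in oldTopology but cannot reach in latestTopology. */
  static Set<Integer> newlyUnreachable(
      Map<Integer, Set<Integer>> oldTopology, Map<Integer, Set<Integer>> latestTopology, int nodeId) {
    // The bug: originReachable was read from latestTopology, so origin minus latest was always
    // empty and the diff log never fired. It must come from the old topology.
    Set<Integer> originReachable = oldTopology.getOrDefault(nodeId, Set.of());
    Set<Integer> latestReachable = latestTopology.getOrDefault(nodeId, Set.of());
    Set<Integer> lost = new HashSet<>(originReachable);
    lost.removeAll(latestReachable);
    return lost;
  }
}
```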
- LoadCache.updateTopology() and ClusterTopology.updateTopology() had copy-paste bugs where originReachable was read from latestTopology instead of the old topology, which made the diff log dead code.

New configuration parameters
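A sketch of reading the new parameters with their documented defaults via java.util.Properties. TopologyProbingConfig is a hypothetical name, and deriving the probing timeout as base interval × ratio is an assumption based on the parameter names, not something the PR states.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of loading the new parameters with the documented defaults. */
final class TopologyProbingConfig {
  final long baseIntervalMs;
  final double timeoutRatio;
  final int heartbeatSelectorNum;

  TopologyProbingConfig(Properties props) {
    baseIntervalMs =
        Long.parseLong(props.getProperty("topology_probing_base_interval_in_ms", "5000"));
    timeoutRatio =
        Double.parseDouble(props.getProperty("topology_probing_timeout_ratio", "0.5"));
    // 0 means auto-size the heartbeat selector threads.
    heartbeatSelectorNum =
        Integer.parseInt(props.getProperty("heartbeat_selector_num_of_client_manager", "0"));
  }

  /** ASSUMPTION: the probing timeout is the base interval scaled by the ratio. */
  long probingTimeoutMs() {
    return (long) (baseIntervalMs * timeoutRatio);
  }

  /** Parses properties text; missing keys fall back to the defaults above. */
  static TopologyProbingConfig parse(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new TopologyProbingConfig(props);
  }
}
```

Under that assumption, the defaults give a 2500 ms probing timeout.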
| Parameter | Default |
|---|---|
| topology_probing_base_interval_in_ms | 5000 |
| topology_probing_timeout_ratio | 0.5 |
| heartbeat_selector_num_of_client_manager | 0 (auto: max(1, cores/4)) |

Test plan
- [Topology] debug logs
- dnConnectionTimeoutInMS
- [Topology] DataNode X is now unreachable to myself(Y) log entries appear on the affected DataNode

🤖 Generated with Claude Code