Between December 25 and December 29, 2017, Huawei’s FusionInsight HD platform was tested by the China Academy of Information and Communications Technology (CAICT), which was commissioned by the Data Center Alliance. Following the Methods for Testing the Hadoop Platform’s Basic Capabilities 2.0, CAICT tested FusionInsight’s functionality, O&M, multi-tenancy, availability, security, compatibility, scalability, and ease of use on a cluster of 5001 nodes. FusionInsight passed every test case and became the first Big Data platform to pass this test.
Huawei encountered many technical challenges in developing a platform that could accommodate ultra-large clusters, including limitations on equipment room capacities, network switching layers, cluster management, and file system capabilities. The FusionInsight HD platform addresses these challenges in the following ways:
I. Superior Scheduler (advantages for large cluster management)
The open-source scheduler ties scheduling periods to heartbeats, which creates scalability and scheduling performance bottlenecks as a cluster grows. FusionInsight’s Superior Scheduler removes these bottlenecks by using dedicated scheduling threads that separate scheduling from heartbeat processing. After receiving the heartbeat information reported by each NodeManager, the scheduler stores the resource information in memory so that it can track the cluster’s overall resource usage. Superior Scheduler uses push scheduling for increased accuracy and efficiency, which greatly improves resource utilization in large clusters. It performs well even when the interval between NodeManager heartbeats is long, which helps prevent heartbeat storms in large clusters. Because it matches each job against a global view of cluster resources, scheduling decisions are more accurate. Compared with the open-source scheduler, Superior Scheduler delivers better system throughput, resource usage, and data affinity.
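The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FusionInsight’s implementation: the names ClusterView, Job, schedule_once, and the preferred_node locality hint are all hypothetical, and memory in MB stands in for every resource type. Heartbeats only update an in-memory view, while a separate scheduling pass (which a real scheduler would run in its own thread) matches pending jobs against that global view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ClusterView:
    """In-memory view of free resources, updated only by heartbeats."""
    free_mb: Dict[str, int] = field(default_factory=dict)

    def on_heartbeat(self, node: str, free_mb: int) -> None:
        # A heartbeat only records state; it never triggers scheduling.
        self.free_mb[node] = free_mb


@dataclass
class Job:
    name: str
    need_mb: int
    preferred_node: str = ""  # hypothetical data-locality hint


def schedule_once(view: ClusterView, pending: List[Job]) -> List[Tuple[str, str]]:
    """Run one scheduling pass against the global view.

    Returns (job name, node) assignments; jobs that do not fit stay in `pending`.
    """
    assignments: List[Tuple[str, str]] = []
    still_pending: List[Job] = []
    for job in pending:
        # Try the node holding the job's data first, then nodes with the most free memory.
        candidates = sorted(
            view.free_mb,
            key=lambda n: (n != job.preferred_node, -view.free_mb[n]),
        )
        target = next((n for n in candidates if view.free_mb[n] >= job.need_mb), None)
        if target is None:
            still_pending.append(job)
            continue
        view.free_mb[target] -= job.need_mb  # reserve the resources in the view
        assignments.append((job.name, target))
    pending[:] = still_pending
    return assignments
```

Because the pass reads only the in-memory view, how often it runs is independent of how often nodes report, which is the property the section attributes to Superior Scheduler.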
II. Dynamic heartbeat mechanism
In YARN, each NodeManager and ApplicationMaster reports periodic heartbeats to ResourceManager, which works well in most scenarios. However, if a cluster has more than 1000 nodes, for example, the overloaded ResourceManager cannot process the heartbeat information in time, resulting in heartbeat congestion. Simply lengthening the heartbeat interval degrades scheduling performance: tasks wait longer to obtain cluster resources, so the cluster’s computing resource usage is low. FusionInsight HD solves the heartbeat congestion problem with a dynamic heartbeat mechanism (Throttle Heartbeat), in which ResourceManager determines the interval before the next heartbeat based on its current load. Additionally, NodeManager and ApplicationMaster can trigger event-based heartbeats when an urgent event occurs, significantly improving a cluster’s resource utilization.
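As a rough illustration of the idea, the sketch below derives the next heartbeat interval from the current load and lets an urgent event bypass the wait. The source does not give the actual Throttle Heartbeat formula, so the linear scaling, the 0.0 to 1.0 load measure, the 1000 ms and 10000 ms bounds, and both function names are assumptions made for this example only.

```python
def next_heartbeat_interval_ms(load: float, min_ms: int = 1000, max_ms: int = 10000) -> int:
    """Return how long a node should wait before its next heartbeat.

    `load` is a hypothetical 0.0-1.0 measure of ResourceManager load;
    the linear scaling between min_ms and max_ms is an assumption.
    """
    load = min(max(load, 0.0), 1.0)  # clamp out-of-range readings
    return int(min_ms + (max_ms - min_ms) * load)


def should_send_heartbeat(elapsed_ms: int, interval_ms: int, urgent_event: bool) -> bool:
    """Send when the assigned interval has elapsed, or at once on an urgent event."""
    return urgent_event or elapsed_ms >= interval_ms
```

Under these assumptions an idle ResourceManager asks for a heartbeat every second, a saturated one stretches the interval to ten seconds, and an event-based heartbeat is still sent immediately.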
III. Powerful management capabilities
- Provides cluster installation and deployment tools and supports template-based installation, allowing large numbers of nodes to be deployed quickly.
- Provides best-in-class fault tolerance for large clusters: all maintenance operations are re-entrant and can be retried, installation and deployment tolerate faults based on the number of instances, and faulty hosts can be isolated so that the failure of a single host does not shut down the cluster.
- Establishes large cluster environments for heterogeneous hardware and supports instance group management, enabling hosts with different hardware specifications to have different configurations.
- Provides clusters with elastic scalability and can configure clusters of different sizes with different system configurations, enhancing system resource utilization.
- Provides rapid cluster recovery capabilities and supports host reinstallation, cluster recovery, and IP address changes.
- Processes the large amounts of monitoring data generated by large clusters.
These key technologies enable Huawei’s FusionInsight Big Data platform to host ultra-large clusters with more than 5000 nodes and to provide customers with large amounts of storage space, extensive data sharing capabilities, robust scalability, and high availability.
The post Huawei’s FusionInsight Becomes First 5000+ Node Cluster Big Data Platform appeared first on Huawei Enterprise Blog.