As we covered in our previous post, ScaleIO can easily be configured to deliver 6-9's of availability or higher using only 2 replicas, saving 33% of the cost compared to other solutions while providing very high performance. In this blog we will discuss the facts of availability using math and demystify ScaleIO's high availability. For data loss (DL) or data unavailability (DU) to occur in a system with two replicas of data (such as ScaleIO), there must be two concurrent failures, or a second failure must occur before the system recovers from the first. Therefore one of the following four scenarios must occur:
- Two drive failures on different nodes
- A node failure followed by a drive failure
- A drive failure followed by a node failure
- Two node failures
Let us choose two popular ScaleIO configurations and derive the availability of each.
Note: ScaleIO best practices recommend a maximum of 300 drives in a storage pool; therefore, for the first configuration we will configure two storage pools with 240 drives in each pool. To calculate the availability of a ScaleIO system we will leverage a couple of well-known academic publications:
We will adjust the formulas in these publications to the ScaleIO architecture and model the different failures.

Two Drive Failures

We will use the following formula to calculate the MTBF of a ScaleIO system for a two-drive-failure scenario:

Where:
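As a rough back-of-the-envelope illustration (a simplified stand-in, not ScaleIO's published model), a two-concurrent-drive-failure MTBF can be approximated as shown below; the drive counts, drive MTBF, and rebuild-time inputs are hypothetical placeholders:

```python
# Rough MTBF estimate for a double-drive failure in a two-replica pool.
# Simplified approximation: the first drive fails at rate N / MTBF_drive, and
# data is at risk only if a drive on *another* node fails within the rebuild
# window (ScaleIO never places both replicas on the same node).

HOURS_PER_YEAR = 8766

def mtbf_two_drive_failures(n_drives, drives_per_node, mtbf_drive_h, mttr_h):
    """Approximate pool MTBF (hours) for two overlapping drive failures."""
    first_failure_rate = n_drives / mtbf_drive_h          # failures per hour
    # Probability that a drive on *another* node fails during the rebuild.
    vulnerable_drives = n_drives - drives_per_node
    p_second_in_window = vulnerable_drives * mttr_h / mtbf_drive_h
    return 1.0 / (first_failure_rate * p_second_in_window)

# Hypothetical inputs: 240-drive pool, 24 drives per node,
# 2,000,000-hour drive MTBF, 1.5-minute (0.025-hour) rebuild.
mtbf_h = mtbf_two_drive_failures(240, 24, 2_000_000, 1.5 / 60)
print(f"~{mtbf_h / HOURS_PER_YEAR:,.0f} years between double-drive events")
```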
Note: This formula assumes that two drives that fail in the same ScaleIO SDS (server) will not cause DU/DL, as the ScaleIO architecture guarantees that replicas of the same data will NEVER reside on the same physical node.

Let's consider two scenarios: in the first, the rebuild process is constrained by network bandwidth; in the second, it is constrained by drive bandwidth.

Network Bound

In this case we assume that the rebuild time/performance is limited by the available network bandwidth. This will be the case if you deploy a dense configuration such as Dell EMC PowerEdge R740xd servers with a large number of SSDs in a single server. In this case, the MTTR function is:

Where:
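As a simplified illustration (not the exact model above), a network-bound rebuild time can be approximated as the data to re-protect divided by the aggregate network bandwidth available for rebuild traffic; the bandwidth share and inputs below are assumptions, so the result will not match the exact figures quoted next:

```python
# Network-bound rebuild-time sketch: data to re-protect divided by the
# aggregate network bandwidth the cluster can devote to rebuild traffic.

def mttr_network_bound_minutes(data_tb, nodes, links_per_node, link_gbps,
                               rebuild_share=0.5):
    """Rebuild time in minutes when the network is the bottleneck.
    rebuild_share is the (assumed) fraction of bandwidth usable for rebuild."""
    agg_gbps = nodes * links_per_node * link_gbps * rebuild_share
    agg_tb_per_min = agg_gbps / 8 / 1000 * 60   # Gb/s -> TB/min
    return data_tb / agg_tb_per_min

# Hypothetical example: re-protect one 1.92 TB drive across 20 nodes,
# each with 4 x 10GbE, assuming half the bandwidth serves rebuild I/O.
print(f"{mttr_network_bound_minutes(1.92, 20, 4, 10):.1f} minutes")
```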
Plugging the relevant values into the formula above, we get an MTTR of ~1.5 minutes for the 20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network connections configuration (two storage pools w/ 240 drives per pool). The 20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE network connections configuration provides an MTTR of ~2 minutes. These MTTR values reflect the superiority of ScaleIO's declustered RAID architecture, which results in very fast rebuild times. In a later post we will show why these MTTR values are critical and how they impact system availability and operational efficiency.

SSD Drive Bound

In this case, the rebuild time/performance is bound by the number of SSD drives, and the rebuild time is a function of the number of drives available in the system. This will be the case if you deploy less dense configurations such as the 1U Dell EMC PowerEdge R640 servers. In this case, the MTTR function is:

Where:
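Similarly, a drive-bound rebuild can be sketched as the data to re-protect divided by the combined rebuild throughput of the surviving drives in the pool; the per-drive rebuild rate below is an assumed placeholder, not a ScaleIO specification:

```python
# Drive-bound rebuild-time sketch: the surviving drives in the pool share the
# rebuild work, so more drives means a shorter rebuild.

def mttr_drive_bound_minutes(data_tb, drives_in_pool, per_drive_mbps=200):
    """Rebuild time in minutes when SSD throughput is the bottleneck.
    per_drive_mbps is an assumed per-drive rebuild rate in MB/s."""
    agg_tb_per_min = drives_in_pool * per_drive_mbps / 1_000_000 * 60
    return data_tb / agg_tb_per_min

# Hypothetical example: re-protect one 1.92 TB drive across a 200-drive pool
# (10 drives x 20 nodes), at an assumed 200 MB/s of rebuild I/O per drive.
print(f"{mttr_drive_bound_minutes(1.92, 200):.1f} minutes")
```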
System availability is calculated by dividing the time that the system is available and running by the total time the system was running plus the restore time. For availability we will use the following formula:

Where:
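For example, translating an MTBF and a restore time into an availability figure (and a count of nines) looks roughly like this; the inputs are placeholders:

```python
import math

def availability(mtbf_hours, rto_hours):
    """Availability = uptime / (uptime + restore time)."""
    return mtbf_hours / (mtbf_hours + rto_hours)

def nines(avail):
    """Number of nines, e.g. 0.999999 -> 6."""
    return -math.log10(1.0 - avail)

# Hypothetical example: an MTBF of 3e9 hours and a 10-minute restore window.
a = availability(3e9, 10 / 60)
print(f"availability = {a:.10f} ({nines(a):.1f} nines)")
```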
Note: the only purpose of RTO is to translate MTBF into availability.

Node and Device Failure

Next, let's discuss the system's MTBF when a node failure is followed by a drive failure. For this scenario we will use the following model:

Where:
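A comparable back-of-the-envelope sketch of this sequence (again a simplified stand-in, with hypothetical node and drive MTBF figures) could look like this:

```python
# Node failure followed by a drive failure on another node before the lost
# node's data has been re-protected. Same structure as the double-drive case,
# but the first event is a node failure and the exposure window is the node
# rebuild time.

def mtbf_node_then_drive(nodes, n_drives, drives_per_node,
                         mtbf_node_h, mtbf_drive_h, node_mttr_h):
    """Approximate MTBF (hours) for a node failure followed by a drive failure."""
    node_failure_rate = nodes / mtbf_node_h
    vulnerable_drives = n_drives - drives_per_node   # drives on other nodes
    p_drive_in_window = vulnerable_drives * node_mttr_h / mtbf_drive_h
    return 1.0 / (node_failure_rate * p_drive_in_window)

# Hypothetical inputs: 20 nodes, 240-drive pool, 24 drives per node,
# 500,000-hour node MTBF, 2,000,000-hour drive MTBF, 30-minute node rebuild.
print(f"~{mtbf_node_then_drive(20, 240, 24, 500_000, 2_000_000, 0.5):,.0f} hours")
```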
In a similar way, one can develop the formulas for other failure sequences, such as a node failure after a drive failure and a second node failure after a first node failure.

Network Bound Rebuild Process

In this case we assume that the rebuild time/performance is constrained by network bandwidth. We will make similar assumptions as for the drive failure case. In this case, the MTTR function is:

Where:
Plugging the relevant values into the formula above, we get an MTTR of ~30 minutes for the 20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network connections configuration (two storage pools w/ 240 drives per pool). The 20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE network configuration provides an MTTR of ~20 minutes. During system recovery ScaleIO rebuilt about 48TB of data for the first configuration and about 21TB for the second configuration.

SSD Drive Bound

In this case we assume that the rebuild time/performance is SSD drive bound and the rebuild time is a function of the number of drives available in the system. Using the same assumptions as for drive failures, the MTTR function is:

Where:
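As a quick sanity check on those data volumes, the amount of data re-protected after a node failure is roughly the node's raw capacity (drives per node times drive size); the small sketch below ignores formatting and spare overheads, which presumably account for the slightly higher figures quoted above:

```python
# Approximate data to re-protect when a whole node is lost: one copy of
# everything that node held, i.e. roughly drives_per_node x drive_capacity.

def node_rebuild_tb(drives_per_node, drive_tb):
    return drives_per_node * drive_tb

print(node_rebuild_tb(24, 1.92))   # R740 config: ~46 TB (post quotes ~48 TB)
print(node_rebuild_tb(10, 1.92))   # R640 config: ~19 TB (post quotes ~21 TB)
```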
Based on the formulas above, let's calculate the availability of a ScaleIO system for the two configurations (a combined sketch follows the two configurations below):

20 x R740, 24 SSDs @ 1.92TB w/ 4 x 10GbE network (two storage pools w/ 240 drives per pool)
20 x R640, 10 SSDs @ 1.92TB w/ 2 x 25GbE:
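Combining the individual failure scenarios into a single availability number can be sketched by summing the scenario failure rates and applying the availability formula above; every input below is a hypothetical placeholder, not a measured or vendor-provided figure:

```python
import math

# Combine independent DU/DL scenarios: failure rates add, so the combined
# MTBF is the reciprocal of the summed per-scenario rates.

def combined_availability(scenario_mtbfs_h, rto_h):
    total_rate = sum(1.0 / m for m in scenario_mtbfs_h)   # events per hour
    combined_mtbf = 1.0 / total_rate
    return combined_mtbf / (combined_mtbf + rto_h)

# Hypothetical per-scenario MTBFs (hours): two drives, node+drive,
# drive+node, two nodes -- plus a 15-minute restore window.
scenarios = [3.0e9, 4.6e8, 4.6e8, 8.0e8]
a = combined_availability(scenarios, 0.25)
print(f"{a:.10f} ({-math.log10(1 - a):.1f} nines)")
```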
Since these calculations are complex, ScaleIO provides its customers with FREE online tools to build HW configurations and obtain availability numbers that include all possible failure scenarios. We advise customers to use these tools, rather than crunch complex math, to build system configurations based on desired availability targets. As you can see, yet again, we have proven that a ScaleIO system easily exceeds 6-9's of availability with just 2 replicas of the data. Unlike other vendors, neither extra data replicas nor erasure coding is required! So do you have to deploy three replica copies to achieve enterprise availability? No, you do not! The myth is BUSTED.
Source: Dell Blog