By Qin Xuan from Huawei
In recent years, flash has become a hot topic. All-flash arrays are critical for enterprises to cope with explosive data growth and to accelerate mission-critical applications. According to data from IT Brand Pulse, the market share of enterprise-level HDDs has been declining since 2015. In 2017, the market share of SSDs exceeded that of traditional HDDs. It is expected that all-flash arrays will soon become mainstream data center storage and be used for key enterprise business.
Today, let’s talk about the performance of all-flash arrays. We all know that all-flash arrays deliver high performance, and storage vendors promote all-flash performance heavily in mainstream media. All-flash arrays can offer anywhere from hundreds of thousands of IOPS to millions of IOPS or even more. To understand what those figures really mean, we must ask four questions.
What IOPS and latency can all-flash arrays provide?
Vendors should state both IOPS and latency when promoting a product.
Generally, an enterprise-class 10,000 rpm SAS disk delivers about 200 to 300 random read/write IOPS. In comparison, a single SSD can reach thousands or even tens of thousands of IOPS, depending on the SSD type (SLC, MLC, or TLC). Once the load on an all-flash array exceeds a certain threshold, latency rises accordingly. Therefore, when a vendor claims very high performance, we must ask one question: at what latency? 0.5 ms, 1 ms, or 5 ms?
Huawei OceanStor Dorado V3 excels in both performance and latency. The following figure shows the performance data of OceanStor Dorado6000 V3, which was tested by the Storage Performance Council. The average latency is just 0.38 milliseconds, and when performance reaches the maximum IOPS of 1 million, its latency is less than 0.5 ms.
In addition, we must check whether the quoted IOPS figure is sustained (steady-state) performance or only peak performance.
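The link between IOPS and latency discussed above can be made concrete with Little's Law, which says that the average number of I/Os in flight equals throughput multiplied by average latency. The short Python sketch below is only an illustration; the function names are ours, and the queue depth in the usage note is a hypothetical value, not a figure from any vendor.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average I/Os in flight = throughput x average latency."""
    return iops * (latency_ms / 1000.0)


def max_iops(queue_depth: float, latency_ms: float) -> float:
    """Upper bound on IOPS for a fixed number of I/Os in flight."""
    return queue_depth / (latency_ms / 1000.0)


def bandwidth_mib_s(iops: float, io_size_kib: float) -> float:
    """Convert an IOPS figure into bandwidth for a given I/O size."""
    return iops * io_size_kib / 1024.0


if __name__ == "__main__":
    # 1 million IOPS at 0.5 ms implies roughly 500 I/Os in flight on average.
    print(outstanding_ios(1_000_000, 0.5))
    # A host that keeps only 32 I/Os in flight (hypothetical) cannot exceed
    # this many IOPS at 0.5 ms, whatever the array is capable of.
    print(max_iops(32, 0.5))
    # 1 million IOPS of 8 KiB I/Os expressed as MiB/s.
    print(bandwidth_mib_s(1_000_000, 8))
```

The same relationship explains why an IOPS number without a latency figure says little: with enough I/Os queued, almost any array can post a larger number at the cost of longer response times.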
What are application scenarios and workloads?
The performance of all-flash arrays varies with application scenarios, application models, and workloads. Workloads we are discussing here include:
- I/O size (8 KB, 32 KB, 64 KB, or 128 KB)
- Ratio of reads to writes (that is, the proportion of read I/Os and write I/Os among host I/O requests)
- Access pattern (sequential or random I/Os)
- Other factors (such as cache hit ratio)
Any change to one of these parameters (for example, 8 KB I/Os versus 64 KB I/Os, 100% random versus sequential reads/writes, or 7:3 mixed reads/writes versus 100% reads) can greatly affect performance. Therefore, when you see very high performance figures, ask the vendor what workload was used to achieve them.
In most real-world production environments, database OLTP is a typical application scenario. Therefore, the workload model commonly used to evaluate all-flash arrays is 8 KB I/Os, 100% random access, and a read/write ratio of 7:3.
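As a minimal sketch, the OLTP-style workload model above can be written down as a small Python structure and turned into a command line for the open-source fio load generator. The article does not name a test tool, so the use of fio is our assumption, as are the queue depth, the runtime, the job name, and the device path `/dev/example-lun`, which is purely hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    io_size_kb: int = 8      # 8 KB I/Os, as in the OLTP model
    read_pct: int = 70       # read/write ratio of 7:3
    random: bool = True      # 100% random access
    queue_depth: int = 32    # assumption: not specified in the article
    runtime_s: int = 600     # assumption: not specified in the article


def fio_args(w: Workload, target: str) -> List[str]:
    """Build an fio argument list that describes the workload."""
    mixed = 0 < w.read_pct < 100
    if w.read_pct == 100:
        rw = "randread" if w.random else "read"
    elif w.read_pct == 0:
        rw = "randwrite" if w.random else "write"
    else:
        rw = "randrw" if w.random else "rw"
    args = [
        "fio",
        "--name=oltp-like",
        f"--filename={target}",
        "--ioengine=libaio",
        "--direct=1",            # bypass the host page cache
        f"--rw={rw}",
        f"--bs={w.io_size_kb}k",
        f"--iodepth={w.queue_depth}",
        "--time_based",
        f"--runtime={w.runtime_s}",
    ]
    if mixed:
        args.append(f"--rwmixread={w.read_pct}")
    return args


if __name__ == "__main__":
    # Hypothetical device path; writing to a real device destroys its data.
    print(" ".join(fio_args(Workload(), "/dev/example-lun")))
```

Writing the workload down explicitly like this makes it easy to see which parameter changed when two vendors quote different numbers.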
What is the performance of all-flash arrays when value-added functions such as deduplication and compression are enabled?
For all-flash arrays, inline deduplication and compression not only extend the service life of SSDs, but also reduce space and power consumption, further cutting costs. IDC reports show that data reduction functions such as inline deduplication and compression are key technologies for reducing the costs of all-flash arrays.
To keep up with this trend, most traditional storage vendors have launched all-flash array models or versions. However, because of inherent weaknesses in their architecture, the performance of these arrays is seriously affected when deduplication, compression, or snapshots are enabled, even though the functions are nominally supported.
Take vendor H’s all-flash product as an example. Configured with two controllers, 48 x 480 GB SSDs, and RAID 10 groups, the array provides about 150,000 IOPS (workload: random 8 KB I/Os, read/write ratio 7:3, and 1 ms latency). With RAID 5 groups, it offers only about 130,000 IOPS. After deduplication is enabled, performance is halved to only 60,000 to 70,000 IOPS. If other value-added features (such as snapshots, replication, or active-active) are enabled, performance drops much further.
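To put the vendor H figures side by side, the following Python sketch computes the relative drop at each step. It assumes that the "halved" figure is measured against the RAID 5 result and uses 65,000 IOPS, the midpoint of the quoted 60,000 to 70,000 range; both choices are ours.

```python
def relative_drop(baseline_iops: float, measured_iops: float) -> float:
    """Fraction of performance lost relative to the baseline."""
    return (baseline_iops - measured_iops) / baseline_iops


if __name__ == "__main__":
    raid10 = 150_000        # RAID 10, value-added functions off
    raid5 = 130_000         # RAID 5, value-added functions off
    raid5_dedup = 65_000    # assumed midpoint of 60,000 to 70,000 IOPS

    print(f"RAID 10 -> RAID 5:         {relative_drop(raid10, raid5):.1%}")
    print(f"RAID 5  -> RAID 5 + dedup: {relative_drop(raid5, raid5_dedup):.1%}")
    print(f"RAID 10 -> RAID 5 + dedup: {relative_drop(raid10, raid5_dedup):.1%}")
```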
Therefore, we should focus on maintaining excellent performance while value-added functions are enabled. This is much closer to real-life application environments.
What is the long-term steady performance when capacity usage reaches 80%?
For example, at a customer’s POC site, vendor E’s all-flash arrays showed outstanding performance to begin with. However, as the amount of test data increased, performance deteriorated. The root cause of this was that the garbage collection mechanism started running in the background to handle increasing data volumes. The mechanism’s running mode and efficiency have a major impact on all-flash array performance.
A simple way to simulate a real-life application scenario is to fill the array to around 80% of its capacity (or perform one full pass of random overwrites) before starting the official performance test. By doing this, you can easily test the long-term steady performance of all-flash arrays.
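A rough Python sketch of planning such a test follows: how much data to write in advance, roughly how long that takes, and a simple check for whether measured IOPS has settled. The usable capacity, the write bandwidth, and the 10% tolerance are hypothetical values chosen for illustration, not figures from the article or from any standard.

```python
from typing import Sequence


def prefill_bytes(usable_capacity_bytes: int, target_fill_pct: int = 80) -> int:
    """Amount of data to write before the official test starts."""
    if not 0 < target_fill_pct <= 100:
        raise ValueError("target_fill_pct must be in (0, 100]")
    return usable_capacity_bytes * target_fill_pct // 100


def prefill_seconds(bytes_to_write: int, write_bandwidth_bytes_s: float) -> float:
    """Estimated time needed to write the prefill data."""
    return bytes_to_write / write_bandwidth_bytes_s


def is_steady(iops_samples: Sequence[float], tolerance: float = 0.10) -> bool:
    """True if every sample lies within +/- tolerance of the mean."""
    mean = sum(iops_samples) / len(iops_samples)
    return all(abs(s - mean) <= tolerance * mean for s in iops_samples)


if __name__ == "__main__":
    usable = 10 * 10**12                  # hypothetical: 10 TB usable capacity
    to_write = prefill_bytes(usable)      # 80% of usable capacity
    print(to_write, "bytes to prefill")
    print(prefill_seconds(to_write, 2 * 10**9), "seconds at 2 GB/s (hypothetical)")
    print(is_steady([100_000, 102_000, 98_000, 101_000]))   # small variation
    print(is_steady([100_000, 60_000, 100_000, 100_000]))   # one large dip
```

Only results recorded after the prefill, and after the samples have settled, should be compared with a vendor's quoted figures.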
The OceanStor Dorado V3 uses the global garbage collection mechanism, which is triggered as a background task when the ratio of idle blocks reaches a specified threshold. The OceanStor Dorado V3 uses system-level end-to-end I/O priority to ensure that I/Os generated by garbage collection have the lowest possible impact on host I/O performance, thereby ensuring the stable latency of host I/Os.
- On-demand start and stop of garbage collection: Controls matching between garbage collection bandwidth and front-end write I/O bandwidth, migrates data on demand, and reduces write amplification, minimizing the impact of garbage collection on host performance.
- Optimized cost-benefit policy used to select target data for garbage collection: Separately stores cold and hot data (separation of metadata and data, frequently accessed data blocks and seldom accessed data blocks), moves invalid data to idle blocks based on the cost of releasing space, and batch reclaims data written at the same point in time. This reduces unnecessary data migrations and minimizes the impact of garbage collection on application performance.
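To illustrate what a cost-benefit policy for garbage collection can look like, here is a minimal Python sketch based on the classic formula from log-structured file systems, where a block's score is (1 - u) * age / (1 + u) and u is the fraction of the block that still holds valid data. This is a generic textbook technique and is not Huawei's actual algorithm; the `Block` type, the field names, and the 20% idle-block threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    valid_fraction: float   # u: share of the block that is still valid data
    age_s: float            # time since the block was last written


def cost_benefit(block: Block) -> float:
    """Classic log-structured score: space freed, weighted by age, over copy cost."""
    u = block.valid_fraction
    return (1.0 - u) * block.age_s / (1.0 + u)


def pick_victims(blocks: List[Block], count: int) -> List[Block]:
    """Choose the blocks with the highest cost-benefit score for reclamation."""
    return sorted(blocks, key=cost_benefit, reverse=True)[:count]


def gc_needed(idle_blocks: int, total_blocks: int, threshold: float = 0.20) -> bool:
    """Start background collection when the share of idle blocks is at or below a threshold."""
    return idle_blocks / total_blocks <= threshold


if __name__ == "__main__":
    blocks = [
        Block(0, valid_fraction=0.9, age_s=1000),  # old, mostly valid
        Block(1, valid_fraction=0.2, age_s=1000),  # old, mostly invalid
        Block(2, valid_fraction=0.2, age_s=10),    # recently written, mostly invalid
    ]
    if gc_needed(idle_blocks=15, total_blocks=100):
        for b in pick_victims(blocks, 2):
            print(b.block_id, round(cost_benefit(b), 1))
```

The age term is what favors cold data: a recently written block is likely to be invalidated further on its own, so collecting it early wastes data migrations.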
The following figure shows a performance comparison between Huawei OceanStor Dorado5000 V3 and other mainstream all-flash arrays under strict test conditions (two controllers, random 8 KB I/Os, read/write ratio 7:3, 80% capacity usage, inline deduplication and compression enabled, and 0.5 ms latency). We can see that Huawei’s product provides superior performance compared to the other storage arrays.
Summary
Equipped with native, flash-architecture design capabilities, Huawei is the only vendor that develops SSD chips, SSDs, and all-flash systems. Huawei enables disk and system collaboration to make the most of all-flash features, and Huawei all-flash arrays deliver the highest performance and reliability in the industry by using operating systems specially designed for all-flash arrays, unique FlashLink technology, and dedicated performance acceleration chips.
We hope that this article helps you look past dazzling promotions and understand the true performance capabilities of all-flash arrays.
Huawei has demonstrated its performance in multiple all-flash competition tests. Thanks to strong product competitiveness, Huawei’s all-flash arrays are growing rapidly. In the first three quarters of 2017, Huawei’s all-flash storage revenue recorded the fastest growth in the world, and Huawei began serving many famous enterprises worldwide such as Vodafone, Saudi Arabia Aramco, Russian Post, Brazil’s Caixa, and China Pacific Insurance.
The post Four Questions to Understand All-Flash Performance appeared first on Huawei Enterprise Blog.