OceanStor 9000 incorporates many SMR-oriented optimizations that remove the technical barriers to adopting SMR disks. In addition to adapting to the new interfaces of SMR disks, OceanStor 9000 uses Redirect on Write (RoW) to convert modifications into sequential writes. Efficient garbage collection (GC), separation of hot and cold data, and write cache optimization maintain the read and write performance of SMR disks, while fast data reconstruction protects data when a large-capacity SMR disk fails. Next, let’s take a closer look at the I/O optimization technologies that OceanStor 9000 uses to support SMR disks.
1. RoW
RoW technical diagram
When data is modified, RoW allocates new space for the newly written data and redirects the pointer to that new space, leaving the original data block unchanged. As shown in the preceding figure, when data blocks B and C of the file system are modified, the file system allocates two new blocks (B’ and C’) and writes the modified content into them. The pointers that pointed to blocks B and C are then redirected to B’ and C’. Blocks B and C themselves are not modified.
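The pointer redirection described above can be illustrated with a minimal Python sketch. It is a toy model, not the OceanStor 9000 implementation: the class name `RowStore`, the append-only list standing in for disk space, and the block names are all assumptions made for illustration.

```python
class RowStore:
    """Toy redirect-on-write store: physical space is only ever appended to."""

    def __init__(self):
        self.log = []        # append-only physical space (sequential writes only)
        self.pointers = {}   # logical block name -> index into self.log

    def write(self, name, data):
        # Always allocate new space at the tail, then redirect the pointer.
        # The previously written block, if any, is left untouched.
        self.log.append(data)
        self.pointers[name] = len(self.log) - 1

    def read(self, name):
        return self.log[self.pointers[name]]


store = RowStore()
for name in "ABCD":
    store.write(name, f"{name}-v1")

old_b = store.pointers["B"]   # remember where the original B lives
store.write("B", "B-v2")      # B' is written to new space
store.write("C", "C-v2")      # C' is written to new space
```

After the two modifications, `store.read("B")` returns the new content while `store.log[old_b]` still holds the original block B, which only becomes reclaimable space for GC.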
A traditional file system that modifies data in place (for example, ext2/3/4) needs only some small changes to support RoW and adapt to the sequential write model of SMR disks. The OceanStor 9000 file system is built on RoW, and its space allocation algorithm has been specifically optimized for the sequential write model of SMR disks.
2. Efficient GC
Principle of GC on an SMR disk
The continuous write area formed by contiguous tracks on an SMR disk is called a zone, and the SMR disk manages data by zone. A zone is usually 256 MB in size and can be rewritten only after it has been cleared, so GC is a mandatory background task on SMR disks. GC copies valid data out of a zone and migrates it elsewhere so that the zone can be cleared, and its efficiency greatly affects the performance of the entire system. As shown in the figure above, to release the space occupied by invalid data in Zone X, the valid data in Zone X must first be migrated to an idle Zone Y. Zone X can then be cleared.
OceanStor 9000 optimizes the file layout and uses an efficient GC algorithm that carefully chooses the GC objects (their size, location, and distance to the destination) as well as the timing, frequency, and concurrency of GC to maximize its benefit. The algorithm keeps GC overhead under control and greatly improves GC efficiency. In addition, the space reserved for GC accounts for only 1% of the SMR disk capacity on OceanStor 9000, which improves space utilization.
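The Zone X to Zone Y migration can be sketched in a few lines of Python. This is only an illustration of the principle under simplifying assumptions: the `Zone` class, the tiny capacity of four blocks, and the `collect` helper are hypothetical and do not reflect how OceanStor 9000 selects or schedules GC.

```python
class Zone:
    """Toy zone: blocks are appended sequentially and cleared only as a whole."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []   # (key, data) entries in write order
        self.valid = []    # parallel validity flags

    def append(self, key, data):
        if len(self.blocks) >= self.capacity:
            raise RuntimeError("zone full")
        self.blocks.append((key, data))
        self.valid.append(True)
        return len(self.blocks) - 1

    def invalidate(self, index):
        self.valid[index] = False

    def reset(self):
        # Clearing the zone is the only way to make it writable again.
        self.blocks.clear()
        self.valid.clear()


def collect(victim, target):
    """Migrate valid blocks from victim to target, then clear victim.

    Returns the number of blocks migrated, which is the cost of this GC pass.
    """
    moved = 0
    for (key, data), ok in zip(victim.blocks, victim.valid):
        if ok:
            target.append(key, data)
            moved += 1
    victim.reset()
    return moved


zone_x, zone_y = Zone(4), Zone(4)
for i, key in enumerate("abcd"):
    zone_x.append(key, f"data-{key}")
zone_x.invalidate(1)   # block b was overwritten elsewhere
zone_x.invalidate(3)   # block d was deleted

moved = collect(zone_x, zone_y)
```

Here two of the four blocks in Zone X are still valid, so reclaiming the zone costs two extra block writes into Zone Y. The fewer valid blocks a victim zone holds, the cheaper the pass.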
3. Separation of Hot and Cold Data
Because of GC, data on an SMR disk may be migrated many times during its life cycle. This phenomenon is called write amplification. An important cause of write amplification is that data with different life cycles (hot and cold) is stored in the same recycling unit, so the unit ends up containing both valid and invalid data, and the valid data must be migrated to other zones before the unit can be reclaimed. Separating hot and cold data is therefore critical for SMR disks: it reduces the write amplification coefficient and improves GC efficiency.
OceanStor 9000 stores the metadata and data of a file system separately. In a traditional log-structured file system, metadata and file data are stored together, so frequent metadata modifications seriously degrade file system performance and increase the write amplification coefficient. OceanStor 9000 stores metadata in the small conventional zone that SMR disks reserve for random writes (usually less than 1% of the total capacity), which improves the performance of metadata operations.
In addition, OceanStor 9000 can store hot and cold file data separately. Different types of file data may have different life cycles, so users can set the life cycle based on file type and directory, and OceanStor 9000 stores data with the same life cycle in the same zone. Different types of files are also modified at different frequencies. OceanStor 9000 can intelligently identify frequently modified files and, during GC, place them in different zones from rarely modified files, further reducing the write amplification coefficient.
These technologies work together, enabling OceanStor 9000 to reduce the write amplification coefficient by more than 50% and keep the write amplification under good control.
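To see why separating hot and cold data lowers write amplification, consider the following small Python sketch. It is a toy calculation, not a measurement of OceanStor 9000: the zone size of four blocks, the block names, and the definition of the coefficient as (user writes + GC migrations) / user writes are assumptions chosen for illustration.

```python
def gc_cost(zones, invalid):
    """Count the valid blocks GC must migrate to reclaim every zone
    that contains at least one invalid block.

    zones   -- list of zones, each a list of block names
    invalid -- set of block names that have been overwritten or deleted
    """
    migrated = 0
    for zone in zones:
        dead = [b for b in zone if b in invalid]
        if dead:                              # zone holds reclaimable space
            migrated += len(zone) - len(dead) # survivors must be copied out
    return migrated


def write_amplification(user_writes, migrated):
    # Total blocks physically written divided by blocks the user wrote.
    return (user_writes + migrated) / user_writes


# Four hot blocks (h*) are soon invalidated; four cold blocks (c*) stay valid.
invalid = {"h1", "h2", "h3", "h4"}
mixed = [["h1", "c1", "h2", "c2"], ["h3", "c3", "h4", "c4"]]
separated = [["h1", "h2", "h3", "h4"], ["c1", "c2", "c3", "c4"]]

wa_mixed = write_amplification(8, gc_cost(mixed, invalid))
wa_separated = write_amplification(8, gc_cost(separated, invalid))
```

With mixed placement, each zone holds two cold survivors that must be copied before the zone can be cleared, giving a coefficient of 1.5. With separated placement, the hot zone can be cleared without copying anything and the cold zone needs no GC at all, so the coefficient stays at 1.0.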
4. Fast Reconstruction
The capacity of an SMR disk is several times larger than that of a traditional disk, so quickly reconstructing data of an SMR disk is a big challenge.
Quick data recovery after a disk failure
OceanStor 9000 supports distributed erasure coding (EC) for data protection. Data is fragmented and distributed across different storage nodes and hard disks. As shown in the figure above, when Disk 2 on Node 3 fails, the lost data is recalculated from the redundant data on other nodes and restored concurrently to multiple different hard disks. Data recovery is more than 10 times faster than with conventional RAID.
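The idea of recalculating a lost fragment from the surviving ones can be shown with a minimal Python sketch. It uses a single XOR parity fragment as a stand-in for a real erasure code. The source does not say which EC scheme or ratio OceanStor 9000 uses, so the scheme, the fragment contents, and the function names here are assumptions.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(fragments):
    """Return the parity fragment for a list of equal-length data fragments."""
    return reduce(xor_bytes, fragments)


def reconstruct(surviving):
    """Rebuild the single missing fragment from all surviving fragments
    (remaining data fragments plus parity)."""
    return reduce(xor_bytes, surviving)


# Three data fragments, imagined to live on three different nodes.
data = [b"node1---", b"node2---", b"node3---"]
parity = encode(data)   # stored on a fourth node

# The disk holding data[2] fails; rebuild it from the other nodes.
rebuilt = reconstruct([data[0], data[1], parity])
```

Because every stripe can be rebuilt independently from fragments on other nodes, different stripes can be restored to different target disks in parallel, which is what makes distributed reconstruction faster than rebuilding onto a single spare disk.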
5. Write Cache Optimization
An SMR disk usually has over 256 MB of write cache. Enabling the write cache greatly improves file system performance, but any data in the cache that has not yet been flushed may be lost if the power suddenly fails. OceanStor 9000 relies on distributed EC to quickly reconstruct data lost from the cache and on a data transaction mechanism to keep the entire system consistent. In addition, OceanStor 9000 properly handles write cache errors on SMR disks, delivering both high performance and reliability.
6. Application Analysis
OceanStor 9000 with SMR disks is mainly used in video surveillance, archiving, and backup scenarios, which feature massive data volumes, sequential writes of large files, and write-once-read-many access. Compared with the mainstream solution using 8 TB disks, OceanStor 9000 with 14 TB SMR disks helps reduce equipment room footprint and power consumption by over 40%.
In the video surveillance project at Hamad Airport in Qatar, the storage system needed to store 13,000 channels of 2 Mbit/s video for 30 days, which requires about 9.36 PB of available capacity (with 10% reserved). With 8 TB disks, 45 4U storage nodes (36 disk slots per node) would have been needed (about 12.9 PB of capacity in total, allowing for redundancy). At 500 W per node, the total power consumption would be 22.5 kW. In comparison, only 26 OceanStor 9000 nodes (4U, 36 slots, 14 TB SMR disks) were needed. A 14 TB SMR disk consumes almost the same power as a traditional 8 TB or 10 TB disk, so each node still consumes about 500 W, and the total power consumption is about 13 kW. Overall, with 26 nodes instead of 45, both the footprint and the power consumption are reduced by about 42%.
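The sizing figures above can be rechecked with a short Python calculation. The sketch assumes decimal units (1 PB = 10^9 MB, 1 PB = 1000 TB) and interprets "10% reserved" as dividing the raw video volume by 0.9. Both are assumptions, as the source does not state them.

```python
channels = 13_000
bitrate_mbit_s = 2
days = 30
reserve = 0.10          # assumed meaning: 10% of capacity is kept free

# Raw video volume: Mbit/s -> MB/s -> MB over 30 days -> PB (decimal units).
raw_pb = channels * bitrate_mbit_s / 8 * 86_400 * days / 1e9
required_pb = raw_pb / (1 - reserve)

slots = 36
nodes_8tb, nodes_14tb = 45, 26
node_power_w = 500

cap_8tb_pb = nodes_8tb * slots * 8 / 1000      # total raw disk capacity
cap_14tb_pb = nodes_14tb * slots * 14 / 1000
power_8tb_kw = nodes_8tb * node_power_w / 1000
power_14tb_kw = nodes_14tb * node_power_w / 1000

# Node power is the same in both cases, so footprint and power
# shrink by the same ratio as the node count.
saving = 1 - nodes_14tb / nodes_8tb
```

Under these assumptions the required capacity comes to about 9.36 PB, the two configurations provide about 12.96 PB and 13.10 PB of raw disk capacity, total power is 22.5 kW versus 13 kW, and the reduction in node count and power is about 42%.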
TCO comparison between different storage solutions for the Hamad Airport
Summary
OceanStor 9000 adopts a variety of new technologies, such as RoW, efficient GC, hot and cold data separation, fast reconstruction, and write cache optimization, to eliminate the problems of using large-capacity SMR disks in enterprise storage. This helps enterprises cope with the data flood and carry out their digital transformation.
Huawei storage is dedicated to providing users with faster, better, and more cost-effective products and solutions through technical innovation and full-stack optimization, enabling the business success of its customers.
(Contributed by Cui Yuxiang)
The post Saving 40% Space and Power with Huawei’s New SMR Storage (2) appeared first on Huawei Enterprise Blog.