Room 02/03/05/06, F12, Tower D, Global Trade Center, No.36 North Third Ring Road, Dongcheng District, Beijing 100013, People's Republic of China
+86 10 5889 1666
WHITE PAPER
Enterprise Storage System: Secure and Trusted
Sponsored by: Huawei
William Zhang
July 2013
IDC OPINION
As the IT world shifts from being "information-centered" to "data-centered", more and more enterprises realize the value of their data. Corruption or loss of data can cause enormous damage to an enterprise and, in serious cases, even bankruptcy. Enterprises therefore prioritize reliability when choosing a data storage system, and they set stricter reliability requirements for storage that supports key applications. For this reason, they often select high-end storage systems for core applications. According to IDC's statistics, the disk storage market for systems priced above US$100,000 in China grew 43.3% in 2012. That is more than twice the growth rate of the overall disk storage market (19.8%).
Selecting a highly reliable storage system is therefore very important. IDC recommends that IT organizations purchase high-end storage systems that meet the following requirements:
Hardware architecture reliability: The storage system uses a fully redundant architecture. Key components such as controllers, power modules, fans, and networks also have redundant designs, so that a single point of failure cannot bring down the system.
Data storage reliability: The storage system provides end-to-end data protection. This means a full range of measures, including fault self-detection, fault pre-handling, quick fault rectification, and post-rectification data integrity checks.
Storage service reliability: The storage system can carry the load of multiple key applications and provides service priority monitoring and management. It also offers advanced features such as snapshot, mirroring, and remote replication, along with other solutions that ensure service continuity.
ABOUT THIS REPORT
In this report, we explore the secure and trusted architecture of a high-end storage product, the HUAWEI OceanStor 18000 Series Enterprise Storage System (hereafter HUAWEI 18000 series). We look at the overall design, system architecture, data storage, and service operation protection of HUAWEI 18000 series. We also examine how the series meets users' demand for secure and trusted storage and maximizes its value for them.
Market Overview
According to IDC's statistics, revenue in the China storage market reached US$1.4761 billion in 2012, up 19.8% from 2011. Storage systems priced above US$100,000 accounted for US$421.9 million, or 28.6% of the total market. The growth rate of this segment reached 43.3%.
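The market figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section. It confirms the 28.6% share and backs out the implied 2011 sizes. Note that the 2011 values are derived here and are not figures published by IDC.

```python
# Figures quoted in the Market Overview (China, 2012, US$ million).
TOTAL_2012 = 1476.1       # whole storage market: US$1.4761 billion
TOTAL_GROWTH = 0.198      # up 19.8% from 2011
HIGH_END_2012 = 421.9     # systems priced above US$100,000
HIGH_END_GROWTH = 0.433   # segment growth rate


def prior_year(size: float, growth: float) -> float:
    """Back out the previous year's value from this year's value and its growth rate."""
    return size / (1.0 + growth)


high_end_share = HIGH_END_2012 / TOTAL_2012
total_2011 = prior_year(TOTAL_2012, TOTAL_GROWTH)
high_end_2011 = prior_year(HIGH_END_2012, HIGH_END_GROWTH)
growth_ratio = HIGH_END_GROWTH / TOTAL_GROWTH

print(f"high-end share of 2012 revenue: {high_end_share:.1%}")           # 28.6%
print(f"implied 2011 total market: US${total_2011:.1f} million")         # 1232.1
print(f"implied 2011 high-end segment: US${high_end_2011:.1f} million")  # 294.4
print(f"high-end vs. overall growth: {growth_ratio:.2f}x")               # 2.19x
```

The growth ratio of about 2.19 is consistent with the statement in the IDC Opinion that the high-end segment grew more than twice as fast as the overall market.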
FIGURE 1 China disk storage market overview, 2012
Source: IDC, 2012
#CN12016V
©2013 IDC China
FIGURE 2 China disk storage market overview by price band, 2012
Source: IDC, 2012
HUAWEI OCEANSTOR 18000 SERIES ENTERPRISE STORAGE SYSTEM SOLUTIONS
Secure and Trusted System Architecture
Fully Redundant System Architecture Design
HUAWEI OceanStor 18000 Series Enterprise Storage System employs a fully redundant design. The storage system has redundant components, including power modules, fans, and controllers. It also has redundant network planes and fully redundant cross-bay service switching fabrics. This redundancy ensures that the storage system keeps operating normally when a single point of failure occurs on any component, module, or device.
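The redundancy rule above amounts to a simple property: removing any one unit must still leave at least one working unit for every function. The sketch below checks that property against a component inventory. The inventory and unit names are hypothetical and are not Huawei part identifiers.

```python
from itertools import chain

# Hypothetical component inventory: each function lists the redundant units able to serve it.
INVENTORY = {
    "controller": ["ctrl-A", "ctrl-B"],
    "power": ["psu-1", "psu-2"],
    "cooling": ["fan-1", "fan-2"],
    "network_plane": ["net-1", "net-2"],
    "switch_fabric": ["fabric-1", "fabric-2"],
}


def single_points_of_failure(inventory: dict[str, list[str]]) -> list[str]:
    """Return every unit whose loss alone leaves some function with no working unit."""
    spofs = []
    for failed in chain.from_iterable(inventory.values()):
        # A function is lost if every unit serving it is the failed unit.
        if any(all(unit == failed for unit in units) for units in inventory.values()):
            spofs.append(failed)
    return spofs


print(single_points_of_failure(INVENTORY))                                         # []
print(single_points_of_failure({"power": ["psu-1"], "cooling": ["fan-1", "fan-2"]}))  # ['psu-1']
```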
Smart Matrix Architecture
HUAWEI 18000 series employs the Smart Matrix architecture, as shown in Figure 3. Smart Matrix is a fully redundant, fully switched system architecture based on PCI-E 2.0. At its core, two mutually redundant planes of PCI-E high-speed switch modules handle data storage and switching. Failure of any PCI-E high-speed switch module does not affect data reads and writes, which ensures service continuity. PCI-E high-speed matrix switch modules are well suited to data switching: they significantly reduce protocol conversion latency, interconnect resources without blocking, and switch data more efficiently. The system offers up to 192 GB/s of bandwidth with a latency of only 300 ns, one-fifth of the latency of similar products under the same conditions. HUAWEI 18000 series scales out to 16 controllers.
HUAWEI 18000 series also supports global cache, a feature unique to the Smart Matrix architecture. On HUAWEI 18000 series, a host volume is partitioned into several LUNs that belong to different controllers. Each controller has its own cache that
is used to speed up data reads and writes, and all controllers share their caches to form a global cache. As a result, every controller's cache can help accelerate the same host volume. This raises the overall cache hit rate and minimizes serialization. The system therefore responds faster to service needs and processes data significantly faster. The global cache also protects service operation from problems found in traditional storage, such as one application monopolizing the cache or applications competing for cache resources. In this way, the global cache makes HUAWEI 18000 series more secure and trusted.
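To see why letting all controller caches serve one host volume raises the hit rate, consider a toy model. Here the volume's address space is split into segments owned by different controllers, and each controller keeps an LRU cache. The placement rule (segment i belongs to controller i mod n) and the block-level LRU model are assumptions for illustration. The source does not describe Huawei's actual placement or coherence logic.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache of block addresses (payloads omitted)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, lba: int) -> bool:
        """Return True on a hit; on a miss, cache the block and evict the LRU one if full."""
        if lba in self.blocks:
            self.blocks.move_to_end(lba)
            return True
        self.blocks[lba] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


class GlobalCache:
    """One host volume whose LBA segments are owned by different controllers.

    Hypothetical placement rule: segment i belongs to controller i % n_controllers,
    and a block is cached only by its owning controller.
    """

    def __init__(self, n_controllers: int, per_ctrl_capacity: int, segment_blocks: int) -> None:
        self.caches = [LRUCache(per_ctrl_capacity) for _ in range(n_controllers)]
        self.segment_blocks = segment_blocks

    def read(self, lba: int) -> bool:
        owner = (lba // self.segment_blocks) % len(self.caches)
        return self.caches[owner].read(lba)


def hit_rate(cache, working_set: int, passes: int = 3) -> float:
    """Cyclically read blocks 0..working_set-1 and report the fraction of hits."""
    hits = sum(cache.read(lba) for _ in range(passes) for lba in range(working_set))
    return hits / (passes * working_set)


single = LRUCache(capacity=1000)  # the volume can use only one controller's cache
shared = GlobalCache(n_controllers=4, per_ctrl_capacity=1000, segment_blocks=100)
print(f"single controller cache: {hit_rate(single, 4000):.2f}")  # 0.00
print(f"global cache, 4 ctrls:   {hit_rate(shared, 4000):.2f}")  # 0.67
```

A 4000-block working set overflows any single 1000-block cache, so a cyclic scan never hits. Spread over four controllers, however, each cache holds exactly its 1000-block share, and every pass after the first hits.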
FIGURE 3 Smart Matrix Architecture
Source: Huawei, 2013
Certification of Seismic Fortification Intensity 9
In terms of overall design, Huawei's storage has passed the seismic fortification intensity 9 certification issued by the Communication Equipment Seismic Fortification Performance Quality Supervision and Inspection Center of the Ministry of Information Industry. This makes Huawei the only professional storage manufacturer in China with this certification. With this certified overall design, HUAWEI 18000 series ensures that no data is lost during severe earthquakes. It can also effectively withstand more than 90% of the earthquakes expected in the next 50 years.
Secure and Trusted Data Storage
Extreme Virtual Engine (XVE) Storage Operating System
HUAWEI 18000 series runs the XVE storage operating system, whose core concept is full virtualization. This covers a fully virtualized kernel, fully virtualized RAID (RAID 2.0+), and a fully virtualized resource pool. XVE builds on virtualized underlying storage to balance resource allocation, avoid bottlenecks, and keep services running more stably. HUAWEI 18000 series also uses the Smart series, the Hyper series, and virtual volume management software for the virtualized resource pool. Together, these provide advanced data protection and better support for service disaster recovery. The fully virtualized design of XVE makes service operation more stable, and this makes HUAWEI 18000 series more secure and trusted.
RAID 2.0+: Making Data Reconstruction 20 Times Faster
Huawei's RAID 2.0+ technology enables fully virtualized storage management based on two layers of virtualization. The underlying layer handles disk reads and writes and protects basic data. The upper layer provides consolidated storage resource pools, smart resource scheduling, and advanced data protection.
As shown in Figure 4, HUAWEI 18000 series virtualizes the underlying storage media. Physical disks in the system form three storage tiers according to their performance characteristics. External storage connected to the system is likewise recognized as one large pool. Each disk in the system is divided into 64 MB chunks. Chunks from different disks are combined into chunk groups (CKGs) according to a RAID level, and each CKG is divided into extents. One or more extents form a volume or a file, as needed.
In the RAID 2.0+ mechanism, a RAID group is not made of fixed disks. Instead, the physical disks are virtualized into many logical chunks, and those chunks form RAID groups according to an algorithm. When one chunk is faulty, only the data on that chunk is reconstructed, so reconstruction completes within 3 seconds. When a physical disk fails, only chunks that contain data are reconstructed, and more disks participate in the reconstruction. As a result, RAID 2.0+ makes data reconstruction 20 times faster and cuts the reconstruction time for 1 TB of data to under 30 minutes. Because reconstruction is distributed, the load on each disk is reduced and the impact on the system is minimized.
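The chunk and CKG mechanism can be illustrated with a small model. Disks are split into chunks, each CKG draws its chunks from distinct disks, and a disk failure triggers reconstruction only for CKGs that hold data and include that disk. The sketch below assumes its own CKG placement policy (always draw from the disks with the most free chunks) and an illustrative 24-disk pool with 5-chunk CKGs. Only the 64 MB chunk size comes from the text.

```python
import random

CHUNK_MB = 64  # chunk size stated in the text


def build_ckgs(n_disks: int, chunks_per_disk: int, width: int, seed: int = 0) -> list[list[int]]:
    """Group chunks from `width` distinct disks into chunk groups (CKGs).

    Hypothetical placement policy: each CKG takes one free chunk from each of the
    `width` disks with the most free chunks, breaking ties randomly.
    """
    rng = random.Random(seed)
    free = {disk: chunks_per_disk for disk in range(n_disks)}
    ckgs = []
    while True:
        candidates = [disk for disk, n in free.items() if n > 0]
        if len(candidates) < width:
            return ckgs
        rng.shuffle(candidates)
        candidates.sort(key=lambda disk: free[disk], reverse=True)  # stable: ties stay shuffled
        members = candidates[:width]
        for disk in members:
            free[disk] -= 1
        ckgs.append(members)


def rebuild_plan(ckgs: list[list[int]], used: range, failed_disk: int) -> tuple[int, set[int]]:
    """Return (chunks to rebuild, surviving disks that supply data) after a disk failure.

    Only CKGs that hold data and include the failed disk need reconstruction.
    """
    affected = [ckgs[i] for i in used if failed_disk in ckgs[i]]
    sources = {disk for members in affected for disk in members if disk != failed_disk}
    return len(affected), sources


ckgs = build_ckgs(n_disks=24, chunks_per_disk=100, width=5)  # e.g. RAID 5 (4+1) CKGs
used = range(len(ckgs) * 2 // 5)                            # assume ~40% of CKGs hold data
chunks, sources = rebuild_plan(ckgs, used, failed_disk=0)
print(f"{len(ckgs)} CKGs; rebuild {chunks} chunks ({chunks * CHUNK_MB} MB) "
      f"using {len(sources)} surviving disks")
```

With this balanced placement, every disk contributes exactly 40 chunks to the first 192 CKGs. Losing disk 0 therefore means rebuilding 40 chunks (2560 MB), read from many surviving disks. A fixed five-disk RAID group, by contrast, would rebuild the whole disk from four peers.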
FIGURE 4 Logical structure of RAID 2.0+ two-layer virtualization: 20 times faster data reconstruction
Source: Huawei, 2013
As shown in Figure 5, the RAID 2.0+ reconstruction time test uses a storage pool of 64 x 2 TB nearline SAS disks, with 50 x 1 TB LUNs created in the pool. The LUNs are formatted and data is written to them. After one disk is removed, the 18000 series starts reconstruction and writes the data to the hot spare space of the storage pool.
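The test log reports a rebuild time of 22 minutes 15 seconds at an average of 1229 MB/s. Simple arithmetic relates these two values to the per-terabyte figure. The sketch assumes a constant average speed. Because the source does not say whether its MB and TB are decimal or binary units, both conventions are shown.

```python
# Figures from the RAID 2.0+ reconstruction test log quoted in the text.
REBUILD_SECONDS = 22 * 60 + 15   # 22 minutes 15 seconds
AVG_SPEED_MB_S = 1229            # average reconstruction speed

rebuilt_mb = AVG_SPEED_MB_S * REBUILD_SECONDS
print(f"data reconstructed in the test: about {rebuilt_mb / 1e6:.2f} TB")  # 1.64 TB

# Time to rebuild 1 TB at the same average speed; the result depends on the unit convention.
minutes_per_tb = {
    "decimal (1 TB = 10^6 MB)": 1_000_000 / AVG_SPEED_MB_S / 60,
    "binary (1 TiB = 2^20 MiB)": 1_048_576 / AVG_SPEED_MB_S / 60,
}
for convention, minutes in minutes_per_tb.items():
    print(f"{convention}: {minutes:.1f} min")  # 13.6 and 14.2 min
```

Under either convention, rebuilding 1 TB takes roughly 14 minutes. This is in line with the "about 15 minutes" derived from the log and well within the 30-minute figure claimed for RAID 2.0+.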
FIGURE 5 RAID 2.0+ data reconstruction test result
Source: Huawei, 2013
According to the log, data reconstruction took 22 minutes and 15 seconds, at an average speed of 1229 MB/s. At this speed, reconstructing 1 TB of data takes about 15 minutes.
RAID 2.0+ also virtualizes upper-layer resources. The upper layer sees the underlying storage resources as one large resource pool, and the volumes and files provided to host operating systems are created in this pool. Extents are units of logical address space and the basic units by which storage resources are mapped to hosts; in other words, resources in the pool are mapped to hosts extent by extent. Extents make storage allocation flexible and let users adjust allocation dynamically. They also free allocation from the limit imposed by the number of disks in a RAID group.
On this basis, the storage system uses the HUAWEI Smart series of data storage management software. This software delivers efficient and flexible data management, intelligent data flow within the storage resource pool, and automatic load balancing. It also maximizes disk and capacity utilization and makes storage management more efficient. The Smart series includes SmartThin (thin provisioning), SmartTier (intelligent storage tiering), SmartMotion (quick data migration), SmartVirtualization (storage virtualization for heterogeneous environments), SmartQoS (QoS control), and SmartPartition (cache partitioning).
When planning a storage deployment, an administrator only needs to calculate the total capacity and performance that current applications require and then add headroom for future expansion. When configuring the storage system, the administrator only needs to perform simple resource allocation. The system then automatically adjusts the number of extents for each application based on that application's actual capacity and performance requirements.
When storage resources run short because applications expand, the storage system prompts the administrator to add storage resources. The administrator only needs to insert disks and add them to the corresponding disk pools. The storage system then automatically redistributes extents in the background to achieve global balance.
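Extent-based allocation and background rebalancing can be pictured with a toy pool. Volumes grow extent by extent, and newly added disks are filled by migrating extents until the load is even. The class below is a hypothetical illustration and not Huawei's placement algorithm. It assumes that each extent lives on one disk, that new extents go to the least-loaded disk, and that "balanced" means per-disk extent counts differ by at most one.

```python
from collections import Counter


class ExtentPool:
    """Toy model of extent-based allocation across the disks of a pool.

    Assumptions (not from the source): each extent lives on one disk, new extents
    go to the least-loaded disk, and "balanced" means per-disk extent counts differ
    by at most one.
    """

    def __init__(self, n_disks: int) -> None:
        self.disks = list(range(n_disks))
        self.placement = {}  # (volume, extent index) -> disk

    def load(self) -> Counter:
        counts = Counter({disk: 0 for disk in self.disks})
        counts.update(self.placement.values())
        return counts

    def allocate(self, volume: str, n_extents: int) -> None:
        """Grow a volume by n_extents extents, each placed on the least-loaded disk."""
        start = sum(1 for vol, _ in self.placement if vol == volume)
        for index in range(start, start + n_extents):
            load = self.load()
            self.placement[(volume, index)] = min(self.disks, key=lambda d: load[d])

    def add_disks(self, n: int) -> int:
        """Add n empty disks, then migrate extents until balanced; return the move count."""
        self.disks.extend(range(len(self.disks), len(self.disks) + n))
        moves = 0
        while True:
            load = self.load()
            hot = max(self.disks, key=lambda d: load[d])
            cold = min(self.disks, key=lambda d: load[d])
            if load[hot] - load[cold] <= 1:
                return moves
            extent = next(key for key, disk in self.placement.items() if disk == hot)
            self.placement[extent] = cold
            moves += 1


pool = ExtentPool(n_disks=4)
pool.allocate("db_volume", 30)
pool.allocate("file_share", 10)
print(sorted(pool.load().values()))         # [10, 10, 10, 10]
moved = pool.add_disks(4)
print(moved, sorted(pool.load().values()))  # 20 [5, 5, 5, 5, 5, 5, 5, 5]
```

Doubling the disk count moves exactly the excess extents (20 of 40) and leaves every volume's extents in place logically. This mirrors the point that the administrator only inserts disks and the system does the redistribution.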
Data Self-Check and Self-Recovery
HUAWEI 18000 series offers end-to-end data protection to ensure data reliability. The measures include fault self-check, fault pre-handling, quick fault rectification, and post-rectification data integrity checks. The storage system periodically collects disk information and performs disk health analysis (DHA) based on disk run time, internal error count, and the disk I/O module. It then generates a report containing a health score, from which IT maintenance personnel can determine whether a disk is normal. For example, a disk scoring 60 to 100 is normal, while a disk scoring below 60 is abnormal and needs to be replaced. For an abnormal disk, the storage system starts data self-recovery. If there is a bad sector, the system automatically repairs it. If a disk is slow or about to fail, the system pre-copies its data to a healthy disk. When the pre-copy completes, the system automatically checks that the copy is consistent with the source data. With data self-check and self-recovery, HUAWEI 18000 series can predict faults and protect data in advance, which prevents faults and improves data security and reliability.
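The decision logic described above (a score threshold of 60, then bad sector repair or pre-copy with verification) can be written as a small rule function. The report fields and action names below are hypothetical labels for the steps in the text. The 0-100 score scale is also an assumption, inferred from the "60 to 100" range.

```python
from dataclasses import dataclass

NORMAL_MIN_SCORE = 60  # from the text: 60-100 is normal, below 60 is abnormal


@dataclass
class DhaReport:
    disk_id: str
    score: int                    # DHA health score; the 0-100 scale is an assumption
    bad_sectors: int = 0
    slow_or_failing: bool = False


def self_recovery_actions(report: DhaReport) -> list[str]:
    """Map a DHA report to the self-recovery steps the text describes (names hypothetical)."""
    if report.score >= NORMAL_MIN_SCORE:
        return []                                   # normal disk: nothing to do
    actions = []
    if report.bad_sectors:
        actions.append("repair_bad_sectors")
    if report.slow_or_failing:
        actions += ["precopy_to_healthy_disk", "verify_copy_against_source"]
    actions.append("schedule_replacement")          # abnormal disks need replacing
    return actions


print(self_recovery_actions(DhaReport("disk-07", score=85)))
# []
print(self_recovery_actions(DhaReport("disk-12", score=42, slow_or_failing=True)))
# ['precopy_to_healthy_disk', 'verify_copy_against_source', 'schedule_replacement']
```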
Data Integrity Check
In addition to data self-check, data self-recovery, and RAID 2.0+ quick reconstruction, HUAWEI 18000 series provides a data integrity check mechanism. This mechanism uses Protection Information (PI) technology, in compliance with the T10 standard, to protect data integrity. With PI, the storage system automatically adds an eight-byte data integrity field to each sector of data. It checks data as it travels from host bus adapters (HBAs) over storage area network (SAN) fiber links to disks, and it reads the integrity information back from disks. The integrity fields are forwarded, transmitted, and stored on storage media together with the corresponding user data. Before a host reads the user data again, the storage system uses PI to verify that the data is correct and complete. If it is not, the storage system recovers the data from redundancy (such as RAID), which protects user data reliability. The host can employ the same mechanism to protect data on the path from the application to the host HBA. The host can then automatically check for and rectify faults before users are affected. Deploying this mechanism on the host is called Data Integrity Extensions (DIX). DIX extends the reach of PI, and together PI and DIX achieve end-to-end (application-to-disk) data protection.
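The eight-byte T10 protection information field consists of a 2-byte guard tag, a 2-byte application tag, and a 4-byte reference tag. The guard tag is a CRC-16 over the sector data using polynomial 0x8BB7. The sketch below builds and checks such a field for one sector. It assumes 512-byte sectors and a Type 1 style reference tag (the low 32 bits of the LBA). The source does not say which PI type or sector size HUAWEI 18000 series uses.

```python
import struct


def crc16_t10dif(data: bytes) -> int:
    """CRC-16 with the T10-DIF polynomial 0x8BB7 (init 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def add_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Append an 8-byte PI field: guard tag, application tag, reference tag (big-endian)."""
    return sector + struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def check_pi(block: bytes, expected_lba: int) -> bool:
    """Verify the guard tag (detects corrupted data) and reference tag (detects misplaced data)."""
    sector, pi = block[:-8], block[-8:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (expected_lba & 0xFFFFFFFF)


block = add_pi(bytes(512), lba=1000)
print(len(block), check_pi(block, 1000))   # 520 True
print(check_pi(b"\x01" + block[1:], 1000))  # False: guard tag no longer matches the data
print(check_pi(block, 1001))                # False: block is not the one expected at LBA 1001
```

The guard tag catches corruption of the data itself. The reference tag catches intact data that was written to or read from the wrong location, which a checksum alone would miss.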
Secure and Trusted Storage Services
Ensuring Quality of Service (QoS) and Performance of Key Applications
SmartQoS keeps key applications running stably by prioritizing the LUNs they use. Those LUNs then get sufficient storage resources, such as CPU and memory, and their access requests are processed first. HUAWEI 18000 series can also ensure that key applications' requests are processed first by limiting the IOPS and bandwidth of LUNs used by other applications.
In the test shown in Figure 6, a storage resource pool of 192 disks contains two 2 TB LUNs owned by the same controller. A database service model test is run first, with 100% random I/O (80% reads and 20% writes) and I/O latency under 10 ms. The two LUNs deliver similar performance, 12,000 IOPS each. SmartQoS is then configured to limit the IOPS of LUN1 to 6000. After the configuration takes effect, the IOPS of LUN1 gradually drops to 6000 and that of LUN2 rises to 18,000.
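The source does not describe how SmartQoS enforces an IOPS limit. A token bucket is one common way to cap a request rate, so the sketch below uses it for illustration. TokenBucket and admitted_iops are hypothetical names. Requests over the limit are simply counted as not admitted, whereas a real array would queue or delay them.

```python
class TokenBucket:
    """Admit at most `rate` I/Os per second on average, with bursts up to `burst` I/Os."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def admit(self, now: float) -> bool:
        """Refill tokens for the time elapsed since the last call, then try to spend one."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admitted_iops(offered_iops: int, limit_iops: int, seconds: int = 10) -> float:
    """Offer evenly spaced I/Os to a capped LUN and measure the admitted rate."""
    bucket = TokenBucket(rate=limit_iops, burst=limit_iops / 100)
    arrivals = (i / offered_iops for i in range(offered_iops * seconds))
    return sum(bucket.admit(t) for t in arrivals) / seconds


print(round(admitted_iops(offered_iops=12_000, limit_iops=6_000)))  # about 6000
```

The limiter alone explains LUN1 settling at about 6000 IOPS. LUN2 rising to 18,000 additionally requires the array to hand the freed capacity to the other LUN.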
FIGURE 6 Database service model test with IOPS for one LUN limited by SmartQoS
Source: Huawei, 2013
In the test shown in Figure 7, a storage resource pool of 192 disks contains two 100 GB LUNs owned by the same controller. A database service model test is run first, with 100% random I/O (80% reads and 20% writes) and I/O latency under 10 ms. The two LUNs deliver similar performance, 48,000 IOPS each. A SmartQoS policy is then configured to prioritize LUN1. After the policy takes effect, LUN1 gets more storage system resources: its IOPS reaches 62,000, while that of LUN2 drops to 34,000.
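In both SmartQoS tests, the combined IOPS stays the same (24,000 in Figure 6 and 96,000 in Figure 7). This pattern matches a work-conserving share of a saturated array. The sketch below models that sharing with weights and optional per-LUN caps. It is an illustration only. Total capacity is assumed to equal the sum of both LUNs' observed IOPS. The 31:17 weights are back-solved from the reported result and are not SmartQoS parameters.

```python
from typing import Optional


def share_capacity(capacity: float, weights: dict[str, float],
                   caps: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Split a saturated array's IOPS among LUNs by weight, honoring per-LUN caps.

    Work-conserving: IOPS a capped LUN cannot use are re-shared among the others.
    """
    caps = caps or {}
    alloc = {lun: 0.0 for lun in weights}
    active = set(weights)
    remaining = capacity
    while active and remaining > 0:
        total_weight = sum(weights[lun] for lun in active)
        share = {lun: remaining * weights[lun] / total_weight for lun in active}
        capped = {lun for lun in active if lun in caps and share[lun] >= caps[lun]}
        if not capped:
            for lun in active:
                alloc[lun] = share[lun]
            break
        for lun in capped:  # pin capped LUNs at their limit, then re-share the rest
            alloc[lun] = caps[lun]
            remaining -= caps[lun]
        active -= capped
    return alloc


# Figure 6 scenario: ~24,000 IOPS in total, LUN1 capped at 6000.
print(share_capacity(24_000, {"LUN1": 1, "LUN2": 1}, caps={"LUN1": 6_000}))
# {'LUN1': 6000, 'LUN2': 18000.0}
# Figure 7 scenario: ~96,000 IOPS in total; equal weights, then LUN1 weighted up.
print(share_capacity(96_000, {"LUN1": 1, "LUN2": 1}))
# {'LUN1': 48000.0, 'LUN2': 48000.0}
print(share_capacity(96_000, {"LUN1": 31, "LUN2": 17}))
# {'LUN1': 62000.0, 'LUN2': 34000.0}
```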
FIGURE 7 Database service model test with one LUN prioritized by SmartQoS
Source: Huawei, 2013
SmartPartition divides the physical cache of HUAWEI 18000 series into multiple independent partitions. Users can assign dedicated cache partitions to LUNs used by key applications, and the storage system intelligently adjusts the concurrency of host requests and disk requests. In this way, SmartPartition ensures that low-priority applications do not compete with high-priority applications for cache resources during busy hours, which keeps key applications efficient and stable.
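The benefit of cache partitioning shows up clearly when a key application re-reads a hot data set while a low-priority job streams through new blocks. The sketch below is a hypothetical model and not SmartPartition's actual algorithm. Each LUN maps to one LRU partition and can evict only blocks in that partition. The partition sizes and workload mix are assumptions chosen for illustration.

```python
from collections import OrderedDict


class PartitionedCache:
    """LRU cache split into fixed-size partitions; each LUN is assigned to one partition.

    A LUN can evict only blocks in its own partition (a hypothetical model of cache
    partitioning, not SmartPartition's actual algorithm).
    """

    def __init__(self, sizes: dict[str, int], lun_to_partition: dict[str, str]) -> None:
        self.sizes = sizes
        self.lun_to_partition = lun_to_partition
        self.parts = {name: OrderedDict() for name in sizes}

    def read(self, lun: str, lba: int) -> bool:
        name = self.lun_to_partition[lun]
        part, key = self.parts[name], (lun, lba)
        if key in part:
            part.move_to_end(key)
            return True
        part[key] = None
        if len(part) > self.sizes[name]:
            part.popitem(last=False)  # evict the LRU block of this partition only
        return False


def key_app_hit_rate(cache: PartitionedCache, rounds: int = 5, hot_blocks: int = 500) -> float:
    """Key LUN re-reads a hot set while a batch LUN streams new blocks (2 per key I/O)."""
    hits, scan_lba = 0, 0
    for _round in range(rounds):
        for lba in range(hot_blocks):
            hits += cache.read("key_lun", lba)
            for _ in range(2):
                cache.read("batch_lun", scan_lba)
                scan_lba += 1
    return hits / (rounds * hot_blocks)


shared = PartitionedCache({"all": 1000}, {"key_lun": "all", "batch_lun": "all"})
split = PartitionedCache({"key": 600, "batch": 400}, {"key_lun": "key", "batch_lun": "batch"})
print(f"shared 1000-block cache: {key_app_hit_rate(shared):.2f}")  # 0.00
print(f"600/400 partitions:      {key_app_hit_rate(split):.2f}")   # 0.80
```

In the shared cache, the streaming job pushes the key application's hot blocks out before they are reused. With a dedicated 600-block partition, the 500-block hot set stays resident after the first pass.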
Disaster Recovery for Core Applications
HUAWEI 18000 series offers complete data disaster recovery (DR) solutions for a variety of applications such as Oracle, SAP, and ERP. The storage system also provides the Hyper series of advanced data protection features to ensure data consistency between sites. These include HyperSnap (snapshot), HyperClone (cloning), HyperCopy (LUN copy), HyperReplication/S (synchronous remote replication), and HyperReplication/A (asynchronous remote replication). In addition, the system uses the Ultra series of DR management programs to achieve one-click switchover between sites. The Ultra series frees users from complex DR task management, reduces the chance of misoperation, and greatly accelerates application recovery. These programs raise service continuity to 99.9999%, keeping the system secure and trusted. They also achieve a recovery point objective (RPO) of 0 to 5 seconds, the shortest in the industry. HUAWEI 18000 series also allows time stamps to be set in the cache so that cached data at the production center can be copied to the cache at the DR center. If a link is interrupted during DR and then recovers, the storage system automatically recovers and resynchronizes, which ensures that
data on the secondary LUN remains consistent with its state before the interruption. UltraAPM, a DR management platform developed by Huawei, is centered on customer applications rather than on storage. It manages DR work centrally through a graphical user interface, following a specified workflow, and helps customers build, maintain, and use their disaster recovery systems.
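The source says that the array resynchronizes automatically after a replication link recovers, but it does not describe how. A common approach is differential resynchronization: while the link is down, the array records which blocks change, then resends only those blocks once the link is back. The sketch below shows that idea with a hypothetical ReplicationPair class. A set stands in for a change bitmap. Replication while the link is up is simplified to immediate copying, which ignores the asynchronous, time-stamped mode mentioned above.

```python
class ReplicationPair:
    """Toy primary/secondary LUN pair with differential resynchronization.

    Assumption: while the link is down, changed block numbers are tracked in a set
    (standing in for a change bitmap) so only those blocks are resent on recovery.
    """

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.secondary: dict[int, bytes] = {}
        self.link_up = True
        self.dirty: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.link_up:
            self.secondary[block] = data  # simplified: replicate immediately
        else:
            self.dirty.add(block)         # remember the change for later resync

    def link_down(self) -> None:
        self.link_up = False

    def link_restored(self) -> int:
        """Resend only the blocks changed during the outage; return how many were sent."""
        sent = len(self.dirty)
        for block in sorted(self.dirty):
            self.secondary[block] = self.primary[block]
        self.dirty.clear()
        self.link_up = True
        return sent


pair = ReplicationPair()
for b in range(1000):
    pair.write(b, b"v1")
pair.link_down()
for b in (3, 7, 7, 42):
    pair.write(b, b"v2")
print(pair.link_restored(), pair.primary == pair.secondary)  # 3 True
```

After a 1000-block initial copy, only the three distinct blocks changed during the outage are resent, rather than the whole LUN.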
FIGURE 8 Oracle disaster recovery test using HUAWEI 18000 series
Source: Huawei, 2013
In this test, two HUAWEI RH5885 V2 high-performance servers with Oracle databases installed are deployed at the production end and the disaster recovery (DR) end. HUAWEI UltraAPM and the OceanStor 18000 Series Storage System are combined to run database DR tests. A storage fault at the production end is simulated to trigger a DR failover. During the UltraAPM DR process, data on the primary storage array at the production end is remotely replicated to the secondary storage array at the DR end. The LUNs on the secondary array become readable and writable and are mapped to the application server at the DR end, and the Oracle databases are started. The test shows that the Oracle database starts at the DR end. After the disaster-triggered switchover, the data at the DR end is the same as that at the production end. The DR switchover takes about five minutes.
UltraVR is DR management software that integrates with the virtualization architecture to configure and manage DR for virtual machines (VMs) in a virtual environment. UltraVR supports deep vCenter integration and is applicable to VMware environments. Working with the value-added functions of Huawei storage devices, it provides remote recovery, local VM recovery, failover, failback, DR rehearsal, one-click recovery, data verification, and planned migration. It turns manual operations into automatic ones that follow a specified process. UltraVR does not change the virtualization infrastructure and meets various DR
requirements of users. VMs are fully utilized at the application server software layer, which improves data protection efficiency. UltraVR currently supports the VMware virtualization platform and the HUAWEI FusionSphere virtualization platform. Support for other virtualization platforms, such as Xen and Hyper-V, is planned, so that users can manage DR for multiple virtualization platforms with one set of software.
FIGURE 9 VMware DR test using HUAWEI 18000 series
Source: Huawei, 2013
In this test, two RH5885 V2 high-performance servers are used as ESX servers. Multiple Windows Server 2008 R2 VMs are deployed on the ESX servers, with HUAWEI 18000 series providing the underlying storage, and Oracle databases are installed on the VMs. Remote replication is configured at the production end, and the storage system at the DR end is not mounted. A storage fault is simulated at the production end to force a service switchover. During the UltraVR DR process, remote replication is forcibly switched between the primary and secondary arrays. The LUNs on the secondary array become readable and writable and are mapped to the ESXi server at the DR end, and the VMs start automatically. The test shows that the VMs start automatically. After the DR switchover, the database data at the DR end is the same as that at the production end. The DR switchover takes about five minutes.
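Both DR tests follow the same ordered sequence: make the secondary LUNs writable, map them to the DR-end host, then start the application (the Oracle database or the VMs). Each step depends on the previous one. The sketch below encodes that ordering as a list of steps that stops at the first failure. DrSite and the step functions are hypothetical in-memory stand-ins, not UltraAPM or UltraVR interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DrSite:
    """In-memory stand-in for the DR-end array and host (all names hypothetical)."""
    lun_mode: str = "read-only"  # secondary LUNs are write-protected while replicating
    mapped_hosts: list[str] = field(default_factory=list)
    app_running: bool = False
    log: list[str] = field(default_factory=list)


def promote_secondary(site: DrSite) -> None:
    site.lun_mode = "read-write"  # stop replication; secondary LUNs become writable


def map_luns(site: DrSite, host: str = "dr-host-1") -> None:
    if site.lun_mode != "read-write":
        raise RuntimeError("LUNs must be writable before they are mapped for takeover")
    site.mapped_hosts.append(host)


def start_application(site: DrSite) -> None:
    if not site.mapped_hosts:
        raise RuntimeError("no host can see the LUNs")
    site.app_running = True  # e.g. start the Oracle database or power on VMs


FAILOVER_STEPS = [promote_secondary, map_luns, start_application]


def failover(site: DrSite) -> bool:
    """Run the steps in order, logging each one; stop at the first failure."""
    for step in FAILOVER_STEPS:
        try:
            step(site)
        except RuntimeError as err:
            site.log.append(f"{step.__name__}: FAILED ({err})")
            return False
        site.log.append(f"{step.__name__}: ok")
    return True


site = DrSite()
print(failover(site), site.app_running)  # True True
print(site.log)
# ['promote_secondary: ok', 'map_luns: ok', 'start_application: ok']
```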
Two-Site Three-Center Solution: Supporting Data Flows Between Products of Different Levels
Huawei storage devices of different levels are deployed in the production center, the intra-city DR center, and the remote DR center. Remote replication lets data flow between the production center and the DR centers, which achieves remote data protection and service continuity. When the production center encounters a disaster, a primary/secondary switchover is performed at the intra-city DR center so that it can take over services. Meanwhile, the production center and the intra-city DR center both maintain a DR relationship with the remote DR center. If both the production center and the intra-city DR center fail, a primary/secondary switchover can be performed at the remote DR center to enable the remote DR
center to take over services, ensuring service continuity. Users can create data replication policies based on actual service requirements and adjust the replication process to service characteristics, which makes the DR solution more flexible. Users can also deploy storage devices of different levels in the production center and the DR centers. This overcomes the barrier to interoperation between storage devices of different levels in traditional DR systems and dramatically cuts the cost of building a DR system, improving the overall value of the DR solution.
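The takeover rule described above is an ordered fallback: production first, then the intra-city DR center, then the remote DR center. The sketch below encodes that order. The replication_targets helper is an added assumption, not from the source. It treats the surviving centers other than the serving one as replication targets, which matches the normal case where the production center replicates to both DR centers.

```python
from typing import Optional

# Takeover order described for the two-site three-center solution.
TAKEOVER_ORDER = ("production", "intra-city DR", "remote DR")


def serving_center(failed: set[str]) -> Optional[str]:
    """Return the center that should run services, given the set of failed centers."""
    for center in TAKEOVER_ORDER:
        if center not in failed:
            return center
    return None


def replication_targets(failed: set[str]) -> list[str]:
    """Assumed rule: surviving centers other than the serving one receive replicated data."""
    active = serving_center(failed)
    return [c for c in TAKEOVER_ORDER if c not in failed and c != active]


print(serving_center(set()))                           # production
print(serving_center({"production"}))                  # intra-city DR
print(serving_center({"production", "intra-city DR"})) # remote DR
print(replication_targets({"production"}))             # ['remote DR']
```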
Up to 32:1 Centralized Disaster Backup
Traditional DR modes support only 4:1 replication, in which data from four storage devices is replicated to one storage device. Large enterprises and governments with numerous branches need centralized DR, and under this limit the cost of building DR systems and the difficulty of managing them rise significantly. HUAWEI 18000 series is designed for this scenario and supports a 32:1 replication mode. This mode reduces the number of storage devices needed in the DR center and cuts DR construction costs. Combined with Huawei's DR management platform, 32:1 centralized DR also simplifies the management and maintenance of large DR systems and opens a new era for intensive DR construction.
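The saving from a higher fan-in ratio is a ceiling division. Each DR-center array can receive replication from a fixed number of source arrays. The sketch counts only that fan-in limit. In practice, the capacity and performance of the DR-center arrays would also constrain the number needed, and the branch counts used here are illustrative.

```python
import math


def dr_arrays_needed(source_arrays: int, fan_in: int) -> int:
    """Minimum DR-center arrays if each can receive replication from at most `fan_in` sources."""
    return math.ceil(source_arrays / fan_in)


for branches in (32, 100, 300):
    print(f"{branches} branch arrays: {dr_arrays_needed(branches, 4)} at 4:1, "
          f"{dr_arrays_needed(branches, 32)} at 32:1")
# 32 branch arrays: 8 at 4:1, 1 at 32:1
# 100 branch arrays: 25 at 4:1, 4 at 32:1
# 300 branch arrays: 75 at 4:1, 10 at 32:1
```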
Supporting Third-Party Storage Systems
HUAWEI 18000 series supports heterogeneous integration of storage devices of various types from different vendors. Storage devices from other vendors are treated as resources of HUAWEI 18000 series and centrally managed, and they can continue to provide storage services for external applications. Working with the Smart and Hyper series software, HUAWEI 18000 series improves the resource utilization of these devices, protects customers' existing investment, simplifies management, and enhances system reliability. Heterogeneous integration also enables data migration between heterogeneous storage devices and HUAWEI 18000 series. This improves the storage efficiency of existing data and cuts total cost of ownership (TCO).
Challenges and Opportunities Faced by HUAWEI OceanStor 18000 Series Enterprise Storage System
IDC observes that users in the finance and telecom industries and in government value high-end storage products for their excellent performance, high stability, and solid reliability. These products serve critical services such as core billing systems (BS) and operation systems. High-end storage devices place high demands on a vendor's development capabilities, and for many years vendors outside China have dominated the high-end field. Now Huawei, a vendor from China, has delivered its self-developed OceanStor 18000 Series Enterprise Storage System. Over the past two years, popular technologies such as cloud computing and big data have driven the deployment and application of high-end storage devices, and Huawei has seized this opportunity to enter the high-end storage market. At the same time, IDC notes that Huawei faces some challenges in popularizing high-end storage solutions. A mature storage system must pass the test of the market. Developing the product is only the first step. The product must then be continuously updated and improved in real application environments to meet the requirements of complex IT environments.
In particular, HUAWEI 18000 series serves critical applications with demanding requirements, so it must be powerful enough to meet them. According to Huawei's statistics, over 50 HUAWEI 18000 series systems have been successfully deployed in government, finance, telecom, and energy since the series was first delivered on September 5, 2012. All of these systems are running properly to date. By the end of 2013, hundreds of HUAWEI 18000 series systems will have been delivered to customers.
Conclusion
IT budgets are expected to remain flat or decrease in the foreseeable future, while the flood of data will continue. IT organizations therefore need efficient, simplified solutions that improve capital utilization while ensuring the security of key data. In this report, IDC describes three secure and trusted aspects of high-end storage for IT organizations: system architecture, data storage, and storage service. When evaluating a storage solution, IT organizations should also consider the following:
Storage system performance, while ensuring a secure and trusted hardware architecture
Storage system efficiency, while ensuring secure and trusted data storage
Simplified management, while ensuring that DR functions are implemented
Copyright Notice
External Publication of IDC Information and Data: Any IDC information that is to be used in advertising, press releases, or promotional materials requires prior written approval from the appropriate IDC Vice President or Country Manager. A draft of the proposed document should accompany any such request. IDC reserves the right to deny approval of external usage for any reason. Copyright 2013 IDC. Reproduction without written permission is completely forbidden.