Technical Note
www.virtuozzo.com
VIRTUOZZO STORAGE VS. CEPH
I/O Performance Comparison
April 29, 2016
Executive Summary
Software-defined storage (SDS) is one of the key technologies IT organizations are turning to as they look for tools that make them faster, more flexible, more efficient, and more competitive. To benefit, they must evolve their data centers without increasing capital expenditures or costs to their customers. Two of the most prominent solutions for providing cost-effective SDS are Virtuozzo Storage and Ceph, a popular open source project.
This paper describes how Virtuozzo Storage is significantly faster than Ceph in almost every respect, with gains ranging from 50 percent to as much as 1000 percent. Specifically, for highly concurrent, randomized workloads, Virtuozzo Storage was up to 10 times faster than Ceph, and for more sequential workloads, Virtuozzo Storage was 50 percent faster on average.
In summary, Virtuozzo Storage is a clear leader in the software-defined storage industry in terms of both I/O performance and feature set. Virtuozzo Storage’s best-in-class vertical scalability, reliability, flexibility, and affordability make it an ideal solution for any service provider or enterprise looking for high-performance, fault-tolerant, and cost-effective storage.
© 2016 Virtuozzo. All rights reserved.
Introduction
Data is generated, stored, and catalogued with nearly everything we do, from personal computing and phone calls to the transportation we use and how our cities operate. Our “digital universe” is predicted to grow 50-fold from 2010 to 2020, and nearly half of all data will be handled by cloud computing providers.1
The future of data presents both challenges and opportunities for IT businesses that have an interest in data management. They must become faster, more efficient, and more flexible. They must innovate in their data centers in a manner that neither erodes their capital expense budget nor increases customer costs. Demand for high performance and high data availability is greater than it has ever been, but so is the expectation of affordability.
Virtuozzo Storage is designed to solve those challenges. By eliminating the need for expensive infrastructure, such as SAN or NAS, and by using SDS rather than hardware, Virtuozzo Storage recaptures lost capacity and delivers the highest levels of performance and reliability. With Virtuozzo Storage, IT businesses can keep up with data center demand, gain competitive advantage, and offer a low-cost solution to their customers.
To illustrate how well Virtuozzo Storage performs, we compared it with another industry-leading storage software product. This technical note compares the features and I/O performance results of Virtuozzo Storage and Ceph.
1 IDC IVIEW: “The Digital Universe in 2020,” http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in2020.pdf
Background
About Virtuozzo
Virtuozzo is a pioneer in the development of virtualization technologies, including virtualized storage. Virtuozzo Storage2 is an industry-leading storage solution that was designed from scratch to work with virtualized environments. Virtuozzo decouples computation from storage, enabling VMs and containers to migrate instantly to another physical server whenever the original server becomes unavailable. Rather than attaching containers to expensive SANs or being limited by a server’s disk, Virtuozzo Storage pools capacity, providing expandable storage at a much lower price point.
With Virtuozzo Storage, unused disk space on server nodes is turned into low-cost cloud storage. Its highly available, distributed storage system has built-in replication and disaster recovery. Most importantly, IT organizations can build fault-tolerant, multi-machine storage clusters using nothing more than their servers’ pre-existing, locally attached hard drives, with no additional hardware costs.
About Ceph
Ceph3 is also a software-defined storage solution. Similar to Virtuozzo Storage, its object store and file system store data on a single distributed cluster and are designed to be fault-tolerant, scalable, and highly available. Ceph’s foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides excellent data storage scalability. Its storage system serves as a flexible foundation for many data storage needs. As an open source software storage platform, Ceph has become widely used in the open source community.
2 About Virtuozzo Storage: https://virtuozzo.com/products/
3 About Ceph: http://ceph.com/
I/O Performance Tests
As IT businesses evaluate which products they will use to manage data, it is important to compare the performance levels and key features of different products. We investigated how the I/O performance of Virtuozzo Storage and Ceph compares. To do so, we deployed Ceph and Virtuozzo Storage on the same set of hardware and ran several tests.
Sequential Versus Random Reads and Writes
We have observed, over the course of many years, that in most cases multiple concurrent users effectively make storage usage patterns appear random; this is particularly true in horizontal use cases such as hosting service providers or large enterprises. Since the vast majority of I/O operations on virtualization platforms are random, these were the most critical measurements to take. In a relatively small number of cases, the workload emphasizes sequential reads and writes, for example, creating backups with a limited number of concurrent writers. We therefore included one sequential read and one sequential write measurement.
I/O Load Pattern Benchmarks
Our tests used the following I/O load pattern benchmarks:
1. A sequential read of a 16M block (measured in MB/sec). This I/O workload is equivalent to a sequential read of a large file; it may represent creating an off-storage copy of a large file (for example, creating a backup).
2. A sequential write of a 16M block (measured in MB/sec). This I/O workload is equivalent to a sequential write of a large file. A typical example would be writing a backup file, deploying an application with a few large files, or copying a large database to the storage in question.
3. A random read of a 4K block (measured in IOPS). This pattern represents any workload with random reads, such as one involving small files, script code, file system journals, or file searches.
4. A random write of a 4K block (measured in IOPS). This pattern represents any workload with random writes, such as file system journals or small files.
5. A random write of a 4K block + fdatasync (measured in IOPS). In this pattern, 32 random writes are followed by a sync (flush). This workload is the most important, because it simulates the work that a relational database must do to save simple database changes during a transaction.
During a single test, identical benchmarks ran on every server in the cluster. The reported results are an aggregation of the results from all servers.
The test configuration for both Ceph and Virtuozzo Storage was set up on a cluster of 14 servers, each containing 36 SATA hard drives with 4TB of raw capacity per disk. Our tests measured vertical scalability (the effect of adding more workload threads to a single server) up to a total of 16 threads.
In general, we believe this set of load patterns provides an in-depth evaluation of storage performance, allowing a fair comparison of different storage technologies for a wide range of applications. For a more complete description of the testing methodology, please refer to the appendix.
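To make the fifth load pattern concrete, here is a minimal Python sketch of "32 random 4K writes followed by a flush". It is not the custom load tool used for the tests in this note; the function name, the fill byte, and the seeded random generator are assumptions made purely for illustration, and the sketch assumes a Linux host where os.pwrite and os.fdatasync are available.

```python
import os
import random

BLOCK_SIZE = 4096      # 4K block, as in load patterns 3-5
WRITES_PER_SYNC = 32   # pattern 5: 32 random writes, then one flush


def random_write_with_fdatasync(path, file_size, batches, seed=0):
    """Run `batches` rounds of 32 random 4K writes, each round ending in fdatasync.

    Hypothetical helper for illustration only. Returns the number of
    write operations issued.
    """
    if file_size < BLOCK_SIZE:
        raise ValueError("file_size must hold at least one 4K block")
    rng = random.Random(seed)
    blocks_in_file = file_size // BLOCK_SIZE
    payload = b"\xa5" * BLOCK_SIZE  # arbitrary fill byte
    writes = 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # size the test file up front
        for _ in range(batches):
            for _ in range(WRITES_PER_SYNC):
                # Pick a random block-aligned offset inside the file.
                offset = rng.randrange(blocks_in_file) * BLOCK_SIZE
                os.pwrite(fd, payload, offset)
                writes += 1
            # Flush file data to stable storage before the next batch.
            os.fdatasync(fd)
    finally:
        os.close(fd)
    return writes
```

Dividing the returned write count by the elapsed wall-clock time would give an IOPS figure for one thread; a real benchmark would also need to bypass or account for client-side caching, which this sketch does not attempt.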
Test Results for Vertical Scalability
Vertical scalability is the ability to maintain the desired I/O performance as the number of workload threads on a server increases. For these tests, we ran 1, 4, and 16 threads of I/O workload on a single server in the cluster. The following figure presents the results of each benchmark.
Figure 1. Virtuozzo Storage vs. Ceph Performance Map
The five sections (sequential read, sequential write, random read, random write, and random write x32) correspond to the I/O load patterns described in the I/O Load Pattern Benchmarks section. Each section has three corners that show results for the 1-, 4-, and 16-thread workloads, respectively. A larger footprint indicates higher performance.
In random workloads, the most common usage pattern, Virtuozzo Storage is up to 10 times faster than Ceph. Virtuozzo Storage also outperforms Ceph in sequential workloads by about 1.5 times. In 1-thread sequential reads, however, Ceph demonstrates better performance. Overall, these tests illustrate that Virtuozzo Storage excels in handling random I/O and demonstrates better vertical scalability than Ceph.
Key Performance Features
It is hard to overestimate the importance of storage performance, which is a very common bottleneck for an entire system’s performance. Virtuozzo Storage offers several options for enhancing performance, including:
• Dividing data into chunks: Similar to striping in RAID, dividing data into chunks enables I/O parallelization between multiple hard drives. As each chunk is located on a separate drive, one I/O thread works with only one disk at any given moment. Real-life scenarios almost always produce parallel I/O workloads that simultaneously involve multiple files or file segments. Examples of parallel I/O workloads include databases and running multiple VMs on a single host. Virtuozzo Storage is architecturally tailored for parallel I/O.
• Chained writes: To optimize network bandwidth use, Virtuozzo Storage writes and updates data replicas in a chained manner: initially the client tries to store the first data replica locally, then the first node transfers the second replica to the second node, the second node transfers the third replica to the third node, and so on.
Figure 2. Chained Data Replication
Chained data replication allows the use of full network bandwidth for data transfer. For example, with a 1Gbps link, the speed of writing three replicas to three nodes in parallel would peak at about 33MB/s. The chained approach, however, would use the entire bandwidth for each replica, peaking at about 100MB/s.
• SSD cache for a chunk server journal: SSD drives can be attached to a node in the cluster and configured to store a write journal. The performance of random write operations in the cluster is boosted by a factor of two or more. The journal can also maintain chunk checksums to improve storage reliability.
• SSD read cache on the client: SSD drives can be attached to a client and configured to store a local cache of frequently accessed data. Overall cluster read performance increases by a factor of 10 or more.
• FUSE optimizations: Virtuozzo Storage uses an improved, much faster version of FUSE, developed as part of Virtuozzo's own Linux distribution.
• Locality-based data placement: Virtuozzo Storage stores data as close as possible to where it is used. For example, if a cluster host has a VM running on it, one replica of that VM's data will be stored on the same host to make I/O local. This approach reduces data access latency as well as storage network traffic.
• Automatic data balancing: To maximize the I/O performance of nodes in a cluster, Virtuozzo Storage automatically balances load by moving hot data chunks from hot nodes to colder ones. A node is considered hot if its request queue depth exceeds the cluster-average value by 40 percent or more. For data chunks, "hot" means "most requested."
• Thin replication: Virtuozzo Storage supports “thin” replication: only the changed parts of chunks are updated on recovery, rather than whole chunks. This approach significantly boosts overall performance, because it involves fewer reads, writes, and data transfers.
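The bandwidth arithmetic behind chained writes can be written out as a short sketch. It only restates the simplified model used in this note (one sender link, its bandwidth either split across all replicas or used in full for each hop); the function name is hypothetical, and the 100 MB/s figure is the usable rate this note quotes for a 1Gbps link.

```python
def per_replica_peak_mb_s(link_mb_s, replicas, chained):
    """Peak write rate per replica over a single sender link, in MB/s.

    Simplified model for illustration: in the parallel case the sender's
    link is shared by all replica streams; in the chained case each node
    forwards to the next one, so every hop can use the full link.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if chained:
        return float(link_mb_s)
    return link_mb_s / replicas


# Figures from this note: about 100 MB/s usable on a 1Gbps link, three replicas.
parallel = per_replica_peak_mb_s(100, 3, chained=False)  # about 33 MB/s
chain = per_replica_peak_mb_s(100, 3, chained=True)      # 100 MB/s
```

The model ignores protocol overhead and the fact that the first replica may be stored locally, so it should be read as an upper bound rather than a prediction.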
For a more complete feature comparison of Virtuozzo Storage and Ceph, please refer to the Appendix.
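The hot-node rule from the automatic data balancing feature (queue depth at least 40 percent above the cluster average) can be illustrated with a few lines of Python. This is a sketch of the stated rule only, not Virtuozzo Storage's balancing code; the function name, the node labels, and the use of a plain arithmetic mean are assumptions.

```python
from statistics import mean

HOT_THRESHOLD = 0.40  # "40 percent or more" above the cluster average


def hot_nodes(queue_depths):
    """Return the names of nodes whose request queue depth is hot.

    `queue_depths` maps a node name to its current request queue depth.
    Hypothetical helper: a node counts as hot when its depth is at least
    40 percent above the mean depth across all nodes.
    """
    if not queue_depths:
        return []
    limit = mean(queue_depths.values()) * (1 + HOT_THRESHOLD)
    return sorted(node for node, depth in queue_depths.items() if depth >= limit)
```

For example, with depths of 10, 10, 10, and 30 the cluster average is 15, the limit is 21, and only the node with depth 30 is reported as hot. Which chunks are then moved, and to which colder node, is not described in this note and is left out of the sketch.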
Conclusion
IT organizations ranging from hosting service providers to enterprises are looking for ways to turn their storage infrastructure into a competitive advantage. The challenge is to find storage solutions that offer best-in-class vertical scalability, reliability, flexibility, and affordability. Across all these characteristics, it is clear that Virtuozzo Storage is a leading provider of high-performance, cost-effective software-defined storage capabilities.
In comparing Virtuozzo Storage with Ceph, we have demonstrated that, in almost all cases, Virtuozzo Storage provides significantly better storage performance. On average, Virtuozzo Storage was 50% faster than Ceph for sequential workloads, and up to about 10 times faster for the highly concurrent, randomized workloads that are typical of most large enterprises and service providers.
Of all tested workloads, Ceph performed better than Virtuozzo Storage in only one scenario: sequential reads with low concurrency, i.e., a single reader. An example would be backing up a large file, since backing up requires a combination of reading and writing data. If only a single process is reading data from the shared node at that moment, then Ceph is somewhat faster. It should be noted, however, that in such a backup scenario, Virtuozzo Storage easily outpaced Ceph as soon as the number of simultaneous readers increased, which would of course be expected for a shared node. For all but the smallest organizations using clustered storage, the single-reader scenario is likely to be rare compared to more concurrent usage.
Virtuozzo Storage is tailored for the random I/O workloads common to IT organizations and is, therefore, an ideal option for IT businesses that deploy virtual servers or manage relational databases. Our storage solution offers best-in-class vertical scalability. It is reliable and flexible, and it can recapture lost or unused storage within existing hardware. In short, Virtuozzo Storage is an excellent solution for hosting companies, service providers, and other data management companies that want to dramatically improve their performance and profitability.
Appendices
I/O Performance Test Methodology
In brief, the methodology used for I/O performance testing was as follows:
1. We used a custom-developed I/O load tool to simulate various workloads (random and sequential reads and writes in multiple threads). The volume of test data was 16GB for each server.
2. The setup for the Virtuozzo Storage cluster consisted of 14 servers, with a client mount on each server. We tested performance in several configurations: with 1, 4, and 16 workload threads running on the 14 servers.
3. For a fair comparison between Virtuozzo Storage and Ceph, giving each the same level of redundancy, we used the same replication level: two replicas.
4. SSD drives were added for both CS journaling and client read caching. With Ceph, SSDs were added for write-back cache and metadata journaling.
For the complete description of the methodology, see the Virtuozzo Storage I/O Benchmarking Guide.4
Testbed Configuration
Software
• Virtuozzo 6.0.10-23 (3 MDS, 448 CS, replication=3:2)
• Ceph 0.94.3 (16 monitors, 448 OSD, 1 MDS, replicas=3; cluster mounted by ceph-fuse)
Hardware
• Physical servers: 14
• CPU: 2 x 10-core Intel Xeon 2.3 GHz
• RAM: 128 GB
4 Virtuozzo Storage I/O Benchmarking Guide: http://updates.virtuozzo.com/doc/pstorage/Virtuozzo_Storage_IO_Benchmarking_Guide.pdf
• HDD: 36 x 4TB SATA
• SSD: 2 x Intel P3700 400G
• Network: 10Gbit Ethernet
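As a rough sanity check on the scale of this testbed, the raw HDD capacity follows directly from the hardware list. The sketch below simply multiplies the listed figures and divides by a replica count; the function names are hypothetical, and the result ignores file system overhead, journals, and reserved space. The software list reports 448 chunk servers/OSDs, so the number of disks actually serving data may be lower than the 14 x 36 used here.

```python
SERVERS = 14           # physical servers in the testbed
DISKS_PER_SERVER = 36  # SATA HDDs per server
DISK_TB = 4            # raw capacity per disk, in TB


def raw_capacity_tb(servers=SERVERS, disks_per_server=DISKS_PER_SERVER,
                    disk_tb=DISK_TB):
    """Raw HDD capacity of the cluster in TB (simple product of the inputs)."""
    return servers * disks_per_server * disk_tb


def usable_capacity_tb(raw_tb, replicas):
    """Upper bound on usable capacity when every chunk is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


raw = raw_capacity_tb()               # 14 * 36 * 4 TB of raw disk
usable = usable_capacity_tb(raw, 2)   # bound with two replicas of each chunk
```

The replica count is passed in explicitly so the same bound can be recomputed for other replication levels.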
Feature Comparison

Scalability                           Virtuozzo Storage    Ceph
Maximum number of nodes               Unlimited            Unlimited
Maximum number of disks in cluster    Unlimited            Unlimited
Capacity                              Up to petabytes      Up to petabytes
Use of nodes for computing            Yes                  Yes

Redundancy                            Virtuozzo Storage    Ceph
RAID 1                                Yes                  Yes
Supported replica                     Any                  Any
RAID6 for archives                    Yes                  Yes
RAID6 for hot data                    Coming in Q2         No
Failure domain                        Any                  Yes (zones)
Automatic recovery                    Yes                  Yes

Disks                                 Virtuozzo Storage    Ceph
Special RAID controller support       Yes (LSI only)       No
Disk pools (tiers)                    Yes (4 tiers)        Any
Disk hotplugging                      Yes                  No
Cache                                 Virtuozzo Storage    Ceph
SSD cache: read                       Yes                  No
SSD cache: write-back                 Yes                  Yes
SSD cache: auto-tiering               Yes (autobalancing)  No

Management                            Virtuozzo Storage    Ceph
UI                                    Coming in Q2         No
Command line                          Yes                  Yes
UI multitenancy                       Coming in Q2         No

Miscellaneous                         Virtuozzo Storage    Ceph
Checksumming and scrubbing            Yes                  Yes
Deduplication                         No                   No
Encryption                            No                   No
Geo-replication                       No                   No
Cloudboot                             No                   No
Backup of S3                          Yes                  No
Parallels IP Holdings GmbH Vordergasse 59 8200 Schaffhausen Switzerland Tel: + 41 52 632 0411 Fax: + 41 52 672 2010 www.virtuozzo.com Copyright © 1999-2016 Parallels IP Holdings GmbH and its affiliates. All rights reserved. This product is protected by United States and international copyright laws. The product’s underlying technology, patents, and trademarks are listed at https://virtuozzo.com/wpcontent/uploads/2016/02/virtuozzo_legal_notices_20160215.pdf. Microsoft, Windows, Windows Server, Windows NT, Windows Vista, and MS-DOS are registered trademarks of Microsoft Corporation. Apple, Mac, the Mac logo, Mac OS, iPad, iPhone, iPod touch, FaceTime HD camera and iSight are trademarks of Apple Inc., registered in the US and other countries. Linux is a registered trademark of Linus Torvalds. All other marks and names mentioned herein may be trademarks of their respective owners.