White Paper
For MongoDB
PerfAccel(TM) Performance Benchmark: NoSQL Database MongoDB
EXECUTIVE SUMMARY
• NoSQL databases run better with low-latency SSD systems
• Running all-SSD systems is expensive and can limit scalability
• PerfAccel provides the best combination of price and performance
• SSD adoption is increasing due to its low latency and high throughput
• Longevity of SSDs is still a concern, especially for heavy write workloads
NoSQL databases are fast gaining popularity and are now used quite extensively. Both small and large organizations are rapidly adopting these databases for various purposes and seeing instant results. A large portion of these NoSQL database deployments use fast storage devices because of the high IOPS that NoSQL databases require. While SSDs are fast and provide extremely low latency and high I/O throughput, they are expensive, driving up deployment costs. This paper describes how PerfAccel can be used to deliver the high IOPS that NoSQL databases require with much smaller SSD storage devices, rather than placing the entire data set on the SSD. This approach provides the best of both worlds, that is, improved performance at much lower cost.
FAST STORAGE DEVICES (SSD/FLASH)
SSD/Flash storage is gaining prominence due to its low latency and high throughput. In the past few years it has become almost mainstream. A large number of high-end server configurations now come pre-installed with SSD drives. SSD devices are available over a large spectrum of price and performance. The low-end devices are not designed for data-center use, as they come with low endurance ratings and lower performance. The high-end PCIe-based drives have high endurance, greater capacity and extremely high performance. The mid-range SSDs are the most commonly used. To ensure longevity and continued performance from SSDs, it is important to throttle writes to the disk. To unlock the true I/O performance potential, specific I/O techniques need to be used. For example, reads and writes issued in blocks that are multiples of 1 MB suit SSDs, while small writes are invariably bad for the durability and performance of such devices. That is why write-intensive applications can overrun these drives, posing a threat to data durability. Hence, it is very important to use SSDs in a considered way that derives the best value out of these devices, keeping costs under control while delivering high performance.
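The point about block sizes can be illustrated with a minimal Python sketch. It assumes a hypothetical `BlockAlignedWriter` helper (not part of PerfAccel or MongoDB) that accumulates small application writes in memory and only passes them to the underlying device in multiples of 1 MB, with any remainder flushed on close. An in-memory `io.BytesIO` stands in for the SSD.

```python
import io

BLOCK_SIZE = 1024 * 1024  # 1 MB, the write granularity suggested in the text


class BlockAlignedWriter:
    """Hypothetical helper: batch small writes into 1 MB multiples."""

    def __init__(self, sink, block_size=BLOCK_SIZE):
        self.sink = sink              # any object with a write(bytes) method
        self.block_size = block_size
        self.buffer = bytearray()
        self.flush_sizes = []         # size of every write issued to the sink

    def write(self, data):
        self.buffer += data
        # Largest multiple of block_size currently held in the buffer.
        full = len(self.buffer) // self.block_size * self.block_size
        if full:
            self.sink.write(bytes(self.buffer[:full]))
            self.flush_sizes.append(full)
            del self.buffer[:full]

    def close(self):
        # Flush whatever is left, even if it is smaller than one block.
        if self.buffer:
            self.sink.write(bytes(self.buffer))
            self.flush_sizes.append(len(self.buffer))
            self.buffer.clear()


sink = io.BytesIO()                   # stand-in for the SSD
writer = BlockAlignedWriter(sink)
for _ in range(3000):
    writer.write(b"x" * 1024)         # 3000 small 1 KB application writes
writer.close()
print(writer.flush_sizes)             # sizes of the writes that reached the sink
```

In this sketch the 3000 small writes reach the device as three large ones. A real deployment would also have to consider durability of the buffered data, which this example ignores.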
NOSQL DATABASE I/O PERFORMANCE ISSUES
• NoSQL databases are designed to scale horizontally, presenting nontraditional I/O challenges
• NoSQL databases are affected by the performance cliff
• Writes and updates have to be followed up with compaction, which is an extremely I/O-intensive process
• In the absence of compaction, the disk usage grows, as stale copies of data continue to exist
NoSQL (or Not Only SQL, as it is referred to in some contexts) is a method of data management that differs from the more traditional relational model. It offers scaling and finer control over consistency, availability and partition tolerance, which are some of the key requirements of modern applications. Adoption of NoSQL databases has increased with the growing size of datasets. These “Big Data” applications store data in a way that makes application design and development simpler. It is a good alternative to a normalized relational data model, which causes impedance mismatch, slows application design and development, and makes feature expansion difficult.
Traditional databases typically scale vertically. NoSQL databases, however, scale out and on demand. The horizontal, or scale-out, model used by NoSQL databases adds a new dimension to I/O performance handling. Database I/O optimization techniques that work well with the vertical scaling model are not always a good approach for horizontal scaling. NoSQL databases are particularly susceptible to the performance cliff, which happens when the working set of the application exceeds the system RAM. Most data access patterns of NoSQL databases and the applications that use them are random, so I/O performance issues usually crop up even with considerate design and a good choice of primary key to drive a working data set.
It is fast becoming a standard practice to deploy NoSQL databases completely on SSDs. While this provides the best performance, it comes at a high cost. SSDs are not cheap, and per-node storage is typically kept low. As dataset sizes increase, more nodes have to be added to the cluster, further increasing the cost. Moreover, most NoSQL databases optimize insert/update transactions using a log-structured write mechanism. As a result, multiple copies of the data reside on disk, an effect associated with write amplification.
All NoSQL databases require some form of compaction to remove stale data and re-arrange the data files. Compaction is a very time-consuming and I/O-intensive operation which significantly impacts system performance. Hence, compaction is usually run at off-peak hours. In the absence of compaction, disk usage swells. When using small SSD storage on database nodes, this is not a desirable situation, as the data set size might exceed the SSD size.
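The interaction between log-structured writes and compaction can be sketched with a toy model. The `AppendOnlyStore` class below is a hypothetical illustration in Python, not the storage format of MongoDB or any other database: every update appends a new record, stale versions stay on "disk" until compaction rewrites the log, and compaction has to read every record to do so.

```python
class AppendOnlyStore:
    """Toy log-structured store: updates append, compaction drops stale copies."""

    def __init__(self):
        self.log = []                      # (key, value) records, newest last

    def put(self, key, value):
        self.log.append((key, value))      # never overwrite in place

    def get(self, key):
        for k, v in reversed(self.log):    # newest record wins
            if k == key:
                return v
        raise KeyError(key)

    def records_on_disk(self):
        return len(self.log)

    def compact(self):
        latest = {}
        for k, v in self.log:              # full scan: this is the I/O-heavy part
            latest[k] = v
        self.log = list(latest.items())    # keep only the live version of each key


store = AppendOnlyStore()
for version in range(5):                   # update each of 100 keys five times
    for key in range(100):
        store.put(key, version)

before = store.records_on_disk()
store.compact()
after = store.records_on_disk()
print(before, after)                       # records held before and after compaction
```

Here 100 live keys occupy five times their size until compaction runs, which is the growth in disk usage described above.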
MONGODB
PerfAccel provides:
• Storage visibility through deep file-level analytics
• Intelligent caching & deterministic placement of hot files
• High performance using fewer SSDs used optimally
• Increased scale by leveraging spinning disks
MongoDB is one of the leading NoSQL databases. It is preferred for its high performance and its ability to scale. MongoDB now supports two underlying storage engines: the default, MMAPv1, and the newer WiredTiger. WiredTiger supports document-level locking and on-disk compression. It performs better than MMAPv1, especially for writes. However, running an all-SSD system substantially increases the cost of deployment.
PERFACCEL SOLUTION
PerfAccel provides a unique solution that delivers deep analytics to observe I/O behavior, helping determine better data placement and improve the performance of NoSQL database deployments. In addition, using its intelligent caching capabilities, PerfAccel can deliver much higher performance. The result is a significant reduction in infrastructure costs while providing rich analytics and much higher performance. PerfAccel can take advantage of SSDs efficiently even while the data set resides on conventional devices. It still uses the low-latency SSD to gain an I/O advantage by ensuring that the hot data resides on the faster storage, providing much better performance from the same instance type.
PerfAccel uses the faster device available as a cache and ensures optimal placement of frequently used hot data. The application directly benefits, since all the reads served from this device are much faster. Moreover, most applications using NoSQL databases exhibit high temporal locality. Regardless of the size of the entire dataset, most applications have a working set, which is a subset of the entire dataset. If the working set resides on the fast cache device, the application performance will be the same as if the entire dataset were residing on a fast SSD device. PerfAccel is able to cache the hot data and hence serve the entire working set of the application from the fast SSD cache, even while the entire dataset resides on relatively slower but vastly cheaper storage.
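The working-set argument can be illustrated with a small simulation. The sketch below uses a plain LRU cache as a stand-in; this paper does not describe PerfAccel's actual caching algorithm, so the LRU policy, the `hit_ratio` function and all sizes here are assumptions chosen only for illustration. Accesses are drawn uniformly from a working set of 1,000 keys out of a much larger notional dataset.

```python
import random
from collections import OrderedDict


def hit_ratio(cache_capacity, working_set, accesses, seed=0):
    """Simulate an LRU cache (an assumed policy) and return its hit ratio."""
    rng = random.Random(seed)
    cache = OrderedDict()                    # keys ordered from oldest to newest

    def touch(key):
        hit = key in cache
        if hit:
            cache.move_to_end(key)           # mark as most recently used
        else:
            cache[key] = True                # "fetch from slow storage"
            if len(cache) > cache_capacity:
                cache.popitem(last=False)    # evict the least recently used key
        return hit

    for key in working_set:                  # warm-up phase, not counted
        touch(key)
    hits = sum(touch(rng.choice(working_set)) for _ in range(accesses))
    return hits / accesses


working_set = list(range(1_000))             # hot keys of a far larger dataset
large = hit_ratio(cache_capacity=1_200, working_set=working_set, accesses=20_000)
small = hit_ratio(cache_capacity=500, working_set=working_set, accesses=20_000)
print(f"cache >= working set: {large:.2f}, cache < working set: {small:.2f}")
```

When the cache holds the whole working set, every post-warm-up read is a hit regardless of how large the full dataset is; once the cache is smaller than the working set, a share of reads falls through to the slower device.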
TEST AND BENCHMARK CONFIGURATION
To demonstrate the capabilities of PerfAccel, we benchmarked MongoDB with both of the storage engines that it supports, WiredTiger and MMAPv1. The respective test configurations are presented below.
The following tests were performed with each configuration:
• Use the cbc-pillowfight workload generation tool that is part of the Couchbase tools.
• Run the workload with the entire dataset residing on the source device (3 x 1 TB SATA Disks (7200 RPM) in RAID0 config).
• Run the workload with the entire dataset residing on an SSD device.
• Run the workload with the entire dataset on the source device, and use the SSD device as a cache with PerfAccel.
All the tests were run for 20 Million ops with each config, with a warmup phase of 60 minutes for each configuration.
WiredTiger Configuration
Data Set Size: 150 Million Keys
Working Set Size (keys): 40 Million Keys
On-disk data set size (compacted): 200 GB
Working Set Size (on disk): 55 GB
System Memory: 24 GB
Commit Log: 20 GB Partition of a 240 GB Samsung SATA SSD
Source Device / Disks: 3 x 1 TB SATA Disks (7200 RPM) in RAID0
Cache Device / Disk: Samsung 840 PRO 240 GB SSD
CPUs: 24 (Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz)
Benchmark Tool: YCSB
Benchmark, Number of ops: 20 Million ops
MMAPv1 Configuration
Data Set Size: 100 Million Keys
Working Set Size (keys): 25 Million Keys
On-disk data set size (compacted): 200 GB
Working Set Size (on disk): 55 GB
System Memory: 8 GB
Commit Log: 20 GB Partition of a 240 GB Samsung SATA SSD
Source Device / Disks: 3 x 1 TB SATA Disks (7200 RPM) in RAID0
Cache Device / Disk: Samsung 840 PRO 240 GB SSD
CPUs: 24 (Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz)
Benchmark Tool: YCSB
Benchmark, Number of ops: 20 Million ops
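The sizes listed in the two configurations can be checked against the performance-cliff condition with a few lines of arithmetic. The Python sketch below takes the dataset, working set and memory sizes from the configuration tables and the 80 GB cache size from the "PerfAccel (80G Cache)" legend of the result charts. The `sizing` function is a hypothetical helper, and it simplifies by treating all system memory as available to the database.

```python
DATASET_GB = 200                  # on-disk data set size (compacted)
WORKING_SET_GB = 55               # working set size on disk
CACHE_GB = 80                     # PerfAccel cache size from the chart legend
RAM_GB = {"WiredTiger": 24, "MMAPv1": 8}   # system memory per configuration


def sizing(ram_gb):
    """Compare the working set with RAM and with the SSD cache."""
    return {
        "working_set_fits_in_ram": WORKING_SET_GB <= ram_gb,
        "working_set_fits_in_cache": WORKING_SET_GB <= CACHE_GB,
        "cache_fraction_of_dataset": CACHE_GB / DATASET_GB,
    }


report = {engine: sizing(ram) for engine, ram in RAM_GB.items()}
for engine, result in report.items():
    print(engine, result)
```

Under these assumptions the working set exceeds RAM in both configurations, so both sit past the performance cliff, while the cache is large enough to hold the working set even though it covers well under half of the dataset.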
TEST RESULTS
WiredTiger Storage Engine – Results
[Chart: Number of ops/second (axis 0 to 20000) against Read/Update ratio (1.00/0.00, 0.75/0.25, 0.50/0.50) for RAID0, SSD and PerfAccel (80G Cache).]
MMAPv1 Storage Engine – Results
[Chart: Number of ops/second (axis 0 to 6000) against Read/Update ratio (1.00/0.00, 0.75/0.25, 0.50/0.50) for RAID0, SSD and PerfAccel (80G Cache).]
OBSERVATIONS:
1. RAID0 devices perform very badly on their own. When paired with PerfAccel using an SSD cache that is small compared to the dataset, the performance improves dramatically.
2. The performance achieved with PerfAccel is nearly the same as if the entire dataset were residing on the SSD.
3. Performance improvements with PerfAccel are seen with both the storage engines.
4. Even as the ratio of writes increases, the improvements are consistent. This is because the writes are sent directly to the commit log and do not impact the source data much.
5. This is a good demonstration of the capabilities of PerfAccel as a performance enhancer: it can deliver the improved I/O performance of an SSD with a much smaller SSD, so the entire dataset need not reside on SSD.
Disclaimer: All of the documentation provided in this document is copyright Datagres Technologies Inc. Datagres PerfAccel is a patent pending technology from Datagres Technologies Inc. Information in this document is provided in connection with Datagres products. No license, express or implied, by estoppel or otherwise, to any Datagres intellectual property rights is granted by this document. Except as provided in Datagres's Terms and Conditions of Sale for such products.
Datagres and PerfAccel are trademarks or registered trademarks of Datagres Technologies Inc. or its subsidiaries in the United States and other countries. Copyright © 2015, Datagres Technologies Inc. All Rights Reserved. Datagres may make changes to specifications and product descriptions at any time, without notice.