Intro
A major focus for MongoDB 3.0 has been improving performance, especially write performance and hardware utilization. To help illustrate what we’ve done and how to take advantage of the changes we’ve made, we will be publishing a series of blog posts comparing MongoDB 2.6 and 3.0 performance.
As with any benchmarking, the applicability of the results to your application is not always clear. Use cases for MongoDB are diverse, and it is critical to use performance tests that reflect the needs of your application and the hardware you will use for your deployment. As such, there’s really no “standard” benchmark that will inform you about the best technology to use for your application. Only your requirements, your data, and your infrastructure can tell you what you need to know.
To help us measure performance, we use hundreds of different tests that we have developed working with the community. These tests reflect the diverse applications users build, and the ever-evolving environments in which they are deployed.
YCSB (the Yahoo! Cloud Serving Benchmark) is used by some organizations as part of their performance testing for several different database technologies. YCSB is fairly basic and probably does not tell you everything you need to know about the performance of your application. However, it is fairly popular and understood by users of MongoDB and other systems. In this post we’ll compare YCSB results for MongoDB 2.6 and 3.0.
Throughput
In YCSB tests, MongoDB 3.0 with the WiredTiger storage engine provides around 7x the throughput of MongoDB 2.6 for multi-threaded, batch inserts. We should expect to see the biggest improvement for this workload because it is 100% writes, and WiredTiger’s document-level concurrency control is most beneficial to multi-threaded write workloads on servers with many processor cores.
The second test compares the two systems for a workload that is 95% reads and 5% updates. Here we see approximately 4x better throughput with WiredTiger. This is a smaller improvement than we observed for the insert-only load because writes are only 5% of all operations. In MongoDB 2.6 concurrency control is managed at the database level, and writes can block reads, reducing overall throughput. In this test, the more fine-grained concurrency control of MongoDB 3.0 clearly improves overall throughput.
Finally, for the balanced workload we see over 6x better throughput with MongoDB 3.0. This is better than the 4x improvement we see with the 95% read workload because there are more writes.
Latency
Measuring throughput isn’t enough – it is also important to consider the latency of operations. Average latency measured across many operations is not the best metric. Developers who want to ensure a consistently great, low-latency experience worry about the worst performing queries in their deployment. High latency queries are measured at the 95th and 99th percentiles – where observed latency is worse than 95% or 99% of all other latencies. (One could argue these are insufficiently precise – most web sessions involve hundreds of requests, and so it is very likely that most users will experience latency at the 99th percentile during their session.)
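To make the percentile argument concrete, here is a minimal Python sketch. It is not part of YCSB or of our test harness; the function names are ours, and the session calculation assumes that request latencies are independent, which real traffic only approximates. Under that assumption, a session of 100 requests has roughly a 63% chance of hitting at least one request slower than the 99th percentile.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct% of all samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def chance_of_slow_request(requests_per_session, pct=99):
    """Probability that at least one request in a session is slower
    than the given percentile (assumes independent requests)."""
    return 1 - (pct / 100) ** requests_per_session


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "requests:", round(chance_of_slow_request(n), 3))
```

This is why tail percentiles matter more than the average: the more requests a session makes, the more likely it is that the user sees the slow tail at least once.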
We see very little difference between MongoDB 2.6 and MongoDB 3.0 in terms of read latency: reads are consistently 1 ms or less across workloads. For update latency, however, the results are more interesting.
Here we compare the update latency at the 95th and 99th percentiles using the read-intensive workload. Update latency is significantly improved in MongoDB 3.0: it has been reduced by almost 90% at both the 95th and 99th percentiles. This is important: improving throughput should not come at the cost of greater latency, as this will ultimately degrade the experience for users of the application.
In the balanced workload, update latency is lower still. At the 95th percentile, update latency for MongoDB 3.0 is almost 90% lower than for MongoDB 2.6, and over 80% lower at the 99th percentile. As a result of these improvements, users should experience better, more predictable performance.
We believe these tests for throughput and latency demonstrate a major improvement in the write performance for MongoDB.
Small Changes That Make A Big Impact
In future posts we will describe a number of small changes that can have a big impact on MongoDB performance. As a preview, let’s take a look at one of the factors we see people overlook frequently.
Providing Sufficient Client Capacity
The default configuration for YCSB uses one thread. With a single thread you will likely observe fairly poor throughput with any database. Don’t use a single-threaded benchmark unless your application runs single-threaded. Single-threaded tests really only measure latency, not throughput, and capacity planning should consider both factors.
Most databases work best with multiple client threads. Determine the optimal number by adding threads until the throughput stops increasing and/or the latency increases.
Consider running multiple client servers for YCSB. A single client may not be able to generate sufficient load to determine the capacity of the system. Unfortunately, YCSB does not make it easy to use more than one client – you have to coordinate starting and stopping the individual clients, and you have to manually aggregate their results. When sharding, start by allocating one mongos for every 1-2 shards, and one YCSB client per mongos. Too many clients can overwhelm the system, initially adding latency, but eventually starving the CPU. In some cases it may be necessary to throttle client requests.
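Manual aggregation across clients can be sketched as follows. This is an illustration rather than a YCSB feature: the `ClientSummary` type and `combine_clients` function are hypothetical names, and the sketch assumes every client ran over the same time window. Throughput can simply be summed. Percentiles cannot be averaged, so the sketch reports the worst per-client value, which is an upper bound on the true combined percentile.

```python
from dataclasses import dataclass


@dataclass
class ClientSummary:
    """Summary numbers copied by hand from one YCSB client's output."""
    throughput_ops_sec: float
    p95_ms: float
    p99_ms: float


def combine_clients(summaries):
    """Combine per-client summaries from clients run concurrently.

    Throughput adds up. For percentiles we keep the maximum: if 99% of
    each client's operations finished within its own p99, then at least
    99% of all operations finished within the largest of those values.
    """
    if not summaries:
        raise ValueError("need at least one client summary")
    return ClientSummary(
        throughput_ops_sec=sum(s.throughput_ops_sec for s in summaries),
        p95_ms=max(s.p95_ms for s in summaries),
        p99_ms=max(s.p99_ms for s in summaries),
    )
```

If you need exact combined percentiles, keep the raw per-operation latencies (or histograms) from every client and compute the percentile over the merged data instead.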
Finding the right balance of latency and throughput should be a part of any performance tuning exercise. By monitoring both and increasing the number of threads through a series of tests, you can determine a clear relationship between latency and throughput, and the optimal number of threads for a given workload.
We can make two observations based on these results:
- The 99th percentile for all operations is less than 1ms up to 16 threads. With more than 16 threads, latency begins to rise.
- Throughput rises from 1 to 64 threads. After 64 threads, increasing the thread count does not increase throughput, yet it does increase latency.
Based on these results, the optimal thread count for the application is somewhere between 16 and 64 threads, depending on whether we favor latency or throughput. At 64 threads, latency still looks quite good: the 99th percentile for reads is less than 1ms, and the 99th percentile for writes is less than 4ms. Meanwhile, throughput is over 130,000 ops/sec.
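The same reasoning can be applied mechanically to any thread sweep. The sketch below is one possible way to do it, not the procedure we used: the `Run` type, the function names, and the 5% gain threshold are all assumptions chosen for illustration. One function finds the highest thread count that stays within a latency budget, the other finds the point where adding threads stops paying off in throughput.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One benchmark run at a fixed client thread count."""
    threads: int
    ops_per_sec: float
    p99_ms: float


def latency_knee(runs, max_p99_ms):
    """Largest thread count reached before p99 first exceeds the budget.

    Returns None if even the smallest run is over budget.
    """
    best = None
    for run in sorted(runs, key=lambda r: r.threads):
        if run.p99_ms > max_p99_ms:
            break
        best = run.threads
    return best


def throughput_plateau(runs, min_gain=0.05):
    """Smallest thread count after which the next step up improves
    throughput by less than min_gain (relative)."""
    ordered = sorted(runs, key=lambda r: r.threads)
    for current, following in zip(ordered, ordered[1:]):
        gain = (following.ops_per_sec - current.ops_per_sec) / current.ops_per_sec
        if gain < min_gain:
            return current.threads
    return ordered[-1].threads
```

The two results bracket the useful range: choose a value near the latency knee if you favor latency, or near the throughput plateau if you favor throughput.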
YCSB Test Configurations
We tested many different configurations to determine the optimal balance of throughput and latency.
For these tests we used 30 million documents and 30 million operations. Documents included 1 field of 100 bytes (151 bytes total). Records were selected using the Zipfian distribution. Results reflect the optimal number of threads, which was determined by increasing the number of threads until the 95th and 99th percentile latency values began to rise and the throughput stopped increasing.
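As a rough guide to reproducing this setup, the sketch below renders the parameters above as YCSB core workload properties. The property names (`recordcount`, `operationcount`, `fieldcount`, `fieldlength`, `requestdistribution`, and the `*proportion` keys) are standard YCSB core workload settings, but the helper itself is hypothetical and this is not the exact file we ran. It also assumes you pass the output as an extra `-P` file after one of YCSB's stock workload files, so that settings such as the workload class come from the stock file.

```python
def ycsb_properties(read_proportion, update_proportion,
                    records=30_000_000, operations=30_000_000):
    """Render YCSB core workload properties for a read/update mix.

    Defaults mirror the test description: 30 million documents,
    30 million operations, one 100-byte field, Zipfian selection.
    """
    if abs(read_proportion + update_proportion - 1.0) > 1e-9:
        raise ValueError("read and update proportions must sum to 1")
    props = {
        "recordcount": records,
        "operationcount": operations,
        "fieldcount": 1,
        "fieldlength": 100,
        "requestdistribution": "zipfian",
        "readproportion": read_proportion,
        "updateproportion": update_proportion,
        "insertproportion": 0,
        "scanproportion": 0,
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # The read-intensive mix described in this post: 95% reads, 5% updates.
    print(ycsb_properties(0.95, 0.05), end="")
```

The client thread count is deliberately not part of this file; it is what you vary between runs to find the optimal value.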
All tests use a replica set with journaling enabled, and environments were configured following our best practices. Always use replica sets for production deployments.
The YCSB client ran on a dedicated server. Each replica set member also ran on a dedicated server. All servers were Softlayer bare metal machines with the following specifications:
- CPU: 2x Deca Core Xeon 2690 V2 - 3.00GHz (Ivy Bridge) - 2 x 25MB cache
- RAM: 128 GB Registered DDR3 1333
- Storage: 2x 960GB SSD drives, SATA Disk Controller
- Network: 10 Gbps
- OS: Ubuntu 14.10 (64 bit)
- MongoDB Versions: MongoDB 2.6.7; MongoDB 3.0.1
About the Author - Asya
Asya is Lead Product Manager at MongoDB. She joined MongoDB as one of the company's first Solutions Architects. Prior to MongoDB, Asya spent seven years in similar positions at Coverity, a leading development testing company. Before that she spent twelve years working with databases as a developer, DBA, data architect and data warehousing specialist.