MongoDB 4.0 adds the ability to read from secondaries while replication is simultaneously processing writes. To see why this is new and important let's look at secondary read behavior in versions prior to 4.0.
Background
From the outset MongoDB has been designed so that when you have sequences of writes on the primary, each of the secondary nodes must show the writes in the same order. If you change field "A" in a document and then change field "B", it is not possible to see that document with changed field "B" and not changed field "A". Eventually consistent systems allow you to see it, but MongoDB does not, and never has.
On secondary nodes, we apply writes in batches, because applying them sequentially would likely cause secondaries to fall behind the primary. When writes are applied in batches, we must block reads so that applications cannot see data applied in the "wrong" order. This is why when reading from secondaries, periodically the readers have to wait for replication batches to be applied. The heavier the write load, the more likely that your secondary reads will have these occasional "pauses", impacting your latency metrics. Given that applications frequently use secondary reads to reduce the latency of queries (for example when they use "nearest" readPreference) having to wait for replication batches to be applied defeats the goal of getting lowest latency on your reads.
In addition to readers having to wait for replication batch writes to finish, the writing of batches needs a lock that requires all reads to complete before it can be taken. That means that in the presence of high number of reads, the replication writes can start lagging – an issue that is compounded when chain replication is enabled.
What was our goal in MongoDB 4.0?
Our goal was to allow reads during oplog application to decrease read latency and secondary lag, and increase maximum throughput of the replica set. For replica sets with a high write load, not having to wait for readers between applying oplog batches allows for lower lag and quicker confirmation of majority writes, resulting in less cache pressure on the primary and better performance overall.
How did we do it?
Starting with MongoDB 4.0 we took advantage of the fact that we implemented support for timestamps in the storage engine, which allows transactions to get a consistent view of data at a specific "cluster time". For more details about this see the video: WiredTiger timestamps.
Secondary reads can now also take advantage of the snapshots, by reading from the latest consistent snapshot prior to the current replication batch that's being applied. Reading from that snapshot guarantees a consistent view of the data, and since applying current replication batch doesn't change these earlier records, we can now relax the replication lock and allow all these secondary reads at the same time the writes are happening.
How much difference does this make?
A lot! The range of performance improvements for throughput could range from none (if you were not impacted by the replication lock - that is your write load is relatively low) to 2X.
Most importantly, this improves latency for secondary reads – for those who use readPreference "nearest" because they want to reduce latency from the application to the database – this feature means their latency in the database will also be as low as possible. We saw significant improvement in 95 and 99th percentile latency in these tests.
Thread levels | 8 | 16 | 32 | 64 |
Feature off | 1 | 2 | 3 | 5 |
Feature on | 0 | 1 | 1 | 0 |
95th percentile read latency (ms)
Best part of this new feature? You don't need to do anything to enable it or opt-into it. All secondary reads in 4.0 will read from snapshot without waiting for replication writes.
This is just one of a number of great new features coming in MongoDB 4.0. Take a look at our blog on the 4.0 release candidate to learn more. And don’t forget, you’ve still got time to register for MongoDB World where you can meet with the engineers who are building all of these great new features.