Previously in this series
In part one of this series we introduced NOAA, the ISD data set, and the way we transformed the ISD data from a packed ASCII storage format into a MongoDB collection.
In part two we examined the Weather of the Century app, which uses the ISD to display weather information all over the globe, from any point in time, on the Google Earth browser plugin. We discussed the queries the app uses in detail to get a sense for how MongoDB works with this data.
Now, in part three, we're going to get into the ops side of things. We’ll use the ISD data set to explore two very different kinds of MongoDB deployments, starting with an overview and a comparison of bulk loading performance. In our next installment, we'll look at how they can handle queries in production.
Let's review the data we're dealing with.
Data Recap
Each record in the ISD data set is a single surface weather observation. It is composed of identifying information such as the observation station ID and the time and location of the observation; mandatory fields including air temperature and pressure; and potentially hundreds of optional fields such as "total opaque coverage", which according to the 132-page data format guide is "the fraction of the total celestial dome covered by opaque clouds or other obscuring phenomena".
In MongoDB, each weather observation is stored as a single document in a collection, like this:
{
    "st" : "u725053",
    "ts" : ISODate("2013-06-03T22:51:00Z"),
    "position" : {
        "type" : "Point",
        "coordinates" : [
            -96.4,
            39.117
        ]
    },
    "elevation" : 231,
    "airTemperature" : {
        "value" : 21.1,
        "quality" : "1"
    },
    "sky condition" : {
        "cavok" : "N",
        "ceilingHeight" : {
            "determination" : "9",
            "quality" : "1",
            "value" : 1433
        }
    },
    "atmosphericPressure" : {
        "value" : 1009.7,
        "quality" : "5"
    }
    [etc]
}
With 2.5 billion observations at roughly 1.6KB per document, plus indexes, the total storage needed for the data is about 4.5TB. While the bar for “Big Data” is set higher every year, at the present moment those numbers certainly rate at least "Moderately Big Data". Looking at the graph below, which shows the data set's growth rate over time, and considering that NOAA is adding more stations, integrating more data sets from partners, and will eventually add non-surface weather data, it's easy to imagine this data growing by an order of magnitude in the next decade.
A Big-Data Playground, and a Big Data-Playground
Besides being a large data set, 115 years of weather data invites some pretty fun questions. What's the biggest one-day temperature swing in recorded history? Has it ever snowed in Miami? How much of the globe is covered with clouds at any given time, on average? What's the deviation there? These properties of the ISD give us a great opportunity to explore the performance implications of deploying MongoDB in different ways. André Spiegel, Consulting Engineer at MongoDB, did exactly that, and presented his findings in a talk at MongoDB World 2014.
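To make the first of those questions concrete, here is a minimal sketch, in Python with pymongo, of a one-day temperature swing query for a single station. The connection string and the weather.data namespace are hypothetical; the st, ts, and airTemperature.value field names come from the sample document above.

from datetime import datetime
from pymongo import MongoClient

# Hypothetical connection details and namespace; adjust for your deployment.
client = MongoClient("mongodb://localhost:27017")
observations = client.weather.data

# Temperature swing at one (placeholder) station on one day, using the
# st / ts / airTemperature.value fields shown in the sample document.
pipeline = [
    {"$match": {
        "st": "u725053",
        "ts": {"$gte": datetime(2013, 6, 3), "$lt": datetime(2013, 6, 4)},
    }},
    {"$group": {
        "_id": "$st",
        "maxTemp": {"$max": "$airTemperature.value"},
        "minTemp": {"$min": "$airTemperature.value"},
    }},
    {"$project": {"swing": {"$subtract": ["$maxTemp", "$minTemp"]}}},
]
for doc in observations.aggregate(pipeline):
    print(doc)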
The methodology behind his experiment was different from typical capacity planning, where you target a specific throughput and latency. It was purely an experiment in what kind of performance could be squeezed out of two very different deployments of MongoDB, and the results are quite illuminating.
The Deployments
André wanted to demonstrate how MongoDB performs at tasks involving the ISD when scaled either vertically or horizontally. To provide some structure, he confined his deployments to AWS and chose from the options they offer. To make referring to them easier, we'll give them both fun names.
The Death Star: One Single Server with a Really Big Disk
It used to be that vertical scaling was the only option, and today it remains viable in many cases. For our vertically scaled deployment, André used an i2.8xlarge instance, the largest of the instance types optimized for storage and I/O. That instance type has 32 vCPUs, 251GB of RAM, and 8 x 800GB SSD instance storage devices, which André assembled into one giant 6TB volume.
This is pretty much the best you can do on a single server from AWS. When we look at this setup, we see that the ratio of storage to RAM is large (23x), and the ratio of total data to RAM is also large (about 16x). Although we can't fit even a tenth of the data into RAM, SSDs have very low latency, so fetching single pages that aren't currently in RAM should have only a slight impact on total query time.
Cost: $60,000/year
The Force: A Massive Cluster with EVERYTHING in RAM
André's second deployment was at the far other end of the spectrum, clustered to the point that absolutely everything was in RAM. He went with a cluster of 100 shards, each of them an r3.2xlarge instance, a middle-of-the-road instance from the memory-optimized family of AWS instances. They had 61GB of RAM apiece, 8 vCPUs, and a single 160GB SSD (not that the disk matters much here, because all data will be in RAM). Altogether that gives us 6TB of RAM, plenty of room to fit the whole data set. He initially ran both the application itself and a mongos for the cluster on a c3.8xlarge machine, the fastest compute-optimized machine available on AWS. As we'll see later in the performance section, he added more mongos instances to achieve even greater throughput via parallelism when the situation called for it.
Cost: $700,000/year
Bulk Loading Performance
Before we can use the data, we need to load it up. This isn't an operation you do one time and then forget about; it's an integral part of disaster recovery planning. When running a business on top of data like this, you need to know how long it would take to rebuild your infrastructure to a workable point from scratch. So let's see how fast each of these deployments can ingest all the data. First, though, we should review how to optimize bulk loading.
Keep in mind that all of these tests were performed on MongoDB 2.6. We believe that the results could be significantly different for MongoDB 3.0 with the WiredTiger storage engine.
Bulk Loading Optimizations
We'll start with optimizations on the client side. The first one is batch size. By using bulk inserts, you reduce network overhead that would be incurred if you did each insert serially. Depending on your network setup, there are diminishing returns for batch size increases; you should experiment to find the optimal batch size for your infrastructure and application.
The next client-side optimization is parallelizing inserts. A single thread will not saturate a single mongod or mongos, but as with batch size, the optimal number of threads will depend on your environment. In our comparison of deployments in this article, you'll see the results of our trials for the ISD.
Lastly, use the unordered bulk writes made available in the 2.6 release. Bulk writes, both ordered and unordered, allow MongoDB to execute inserts in batches, greatly speeding write operations by eliminating the application overhead of individual inserts issued serially. Using unordered writes with sharded collections allows inserts to multiple shards to run in parallel, potentially multiplying your insert throughput by the number of shards in the cluster.
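Putting those three client-side optimizations together, here is a minimal sketch in Python with pymongo. It is not André's actual loader (that lives in the repository linked below), and the connection string, namespace, batch size, and thread count are placeholders you would tune for your own environment.

from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient

BATCH_SIZE = 100   # tune for your network and document size
NUM_THREADS = 8    # tune for your client hardware and deployment

client = MongoClient("mongodb://localhost:27017")  # or a mongos for a cluster
collection = client.weather.data                   # hypothetical namespace

def insert_batches(batches):
    # ordered=False gives the unordered bulk writes described above; behind a
    # mongos, inserts destined for different shards can proceed in parallel.
    for batch in batches:
        collection.insert_many(batch, ordered=False)

def load(documents):
    # Group the documents into fixed-size batches...
    batches = [documents[i:i + BATCH_SIZE]
               for i in range(0, len(documents), BATCH_SIZE)]
    # ...then spread the batches across several insert threads.
    chunks = [batches[i::NUM_THREADS] for i in range(NUM_THREADS)]
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        list(pool.map(insert_batches, chunks))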
Moving on to server-side optimizations, load your data while MongoDB is running without journaling, and only add indexes after you have loaded all the data. Both of these speed loading dramatically, but ARE NOT FOR PRODUCTION SYSTEMS! They are only for initial loading of data into systems that are not handling live CRUD operations. Once your data is loaded and your indexes are built, restart MongoDB with journaling enabled.
Finally, for sharded clusters, you should pre-split your collections and turn off the balancer during the load. This way you avoid inserting documents into shards they would only migrate off of during the load, and you avoid the cost of those migrations themselves. This optimization can also be applied to a cluster in production.
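Turning the balancer off from a driver looks roughly like this. It's a sketch assuming a connection to a mongos and the 2.6-era mechanism, where the sh.stopBalancer() and sh.setBalancerState(false) shell helpers flip a flag in config.settings; the pre-split itself is sketched in the sharding discussion further down.

from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # hypothetical mongos address

# Disable the balancer for the duration of the load; this sets the same
# flag that the mongo shell helpers set.
client.config.settings.update_one(
    {"_id": "balancer"}, {"$set": {"stopped": True}}, upsert=True)

# ... run the bulk load and index builds ...

# Re-enable the balancer once loading is finished.
client.config.settings.update_one(
    {"_id": "balancer"}, {"$set": {"stopped": False}})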
If you want to look through the exact code André wrote to do the loading, you can check out the 'load' folder in the Weather of the Century App GitHub repository.
And now, without further ado, the performance results.
The Death Star: Bulk Loading
To determine the best that could be done with this server, a matrix of load trials was performed, varying thread count and batch size. It turned out that the sweet spot was 8 threads and a batch size of 100 documents, which achieved a peak insert rate of 85,000 documents per second.
At this optimal point, the entire ISD could be loaded into The Death Star in 10.33 hours. The effective insert rate was actually 70,000 documents per second, because of the index that is automatically built on _id for every insert and because of variation in throughput. Still, not bad!
However, building a secondary index on this collection, which we will need to answer queries efficiently, takes a staggering 7.66 hours! That's almost as long as it took to insert the whole collection. That index, by the way, is ts_1_st_1, a compound index on the timestamp in ascending order and the station ID in ascending order. Here we see the effect of the limited RAM available: although the latency of a read from SSD is low, we have to funnel all 4.5TB of the ISD data through a mere(!) 251GB of RAM.
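For reference, building that index from a driver looks roughly like this; a sketch in Python with pymongo, with a hypothetical connection string and namespace.

from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")

# Build the compound ts_1_st_1 index after the bulk load has finished:
# timestamp ascending, then station ID ascending.
client.weather.data.create_index([("ts", ASCENDING), ("st", ASCENDING)])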
The Force: Bulk Loading
Before the trials, André needed to select a good shard key, and for these purposes he chose a hashed index on st, the station ID. This distributes stations randomly across the cluster while keeping all the data for any one station on a single shard, which suits the use case very well.
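Setting that up looks roughly like this, a sketch assuming a connection to a mongos, a hypothetical weather.data namespace, and an empty collection. The numInitialChunks option, which applies only to hashed shard keys, is one way to get the pre-splitting recommended earlier.

from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # hypothetical mongos address

# Enable sharding on the database, then shard the collection on the hashed
# station ID. numInitialChunks pre-splits the hashed key space so chunks are
# spread across the shards before the load begins.
client.admin.command("enableSharding", "weather")
client.admin.command(
    "shardCollection", "weather.data",
    key={"st": "hashed"},
    numInitialChunks=10000)  # placeholder; scale with the number of shards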
As with The Death Star, a matrix of trials was performed to find the thread count and batch size sweet spot, which in this case turned out to be 144 threads and a batch size of 200, for a peak throughput of 220,000 document inserts per second. That's faster than the single-server Death Star, but not even close to 100 times faster. It turns out that with that many threads, the single mongos is a bottleneck. To achieve higher throughput, The Force would need additional mongos instances, so André added 9 more c3.8xlarge instances running mongos, for a total of 10.
Unfortunately, at that point he ran into the network bandwidth limitations of AWS. Amazon provides 10 Gb/sec, which turned out to be the real bottleneck. Even though the theoretical ceiling under that constraint is between 500,000 and 600,000 document inserts per second, the sustained throughput was only 228,000, yielding a total time of 3.15 hours for the initial load. In a private datacenter with dedicated networking gear, MongoDB could certainly insert over a million documents per second, but André's experiment was limited to AWS. Not to worry, though. While we may not be able to saturate The Force during the initial bulk load, we see its true potential emerge during indexing. With 100 shards working in parallel, and every document in RAM, the ts_1_st_1 index was built in a shocking 5 minutes.
Next Time
So far we've described the two deployments and looked at their cost and their performance in bulk loading the ISD. Coming up next, we'll finish the face-off with an analysis of query performance, on both indexed and unindexed queries.
If you’re interested in learning more about the performance best practices of MongoDB, download our guide:
About the Author - Avery
Avery is an infrastructure engineer, designer, and strategist with 20 years' experience in every facet of internet technology and software development. As principal of Bringing Fire Consulting, he offers clients his expertise at the intersection of technology, business strategy, and product formulation. He earned a B.A. in Computer Science from Brown University, where he specialized in systems and network programming, while also studying anthropology, fiction, cognitive science, and semiotics. Avery got his start in internet technology in 1993, configuring Apache and automating systems at Panix, the third-oldest ISP in the world. He has an obsession with getting to the heart of a problem, a flair for communication, and a devotion to providing delight to end users.