Welcome to our multi-part series on operations best practices. This post -- the first of three -- focuses on the essentials you need to know to get started. We will provide an overview here, but you can get more detail by downloading the MongoDB Operations Best Practices guide.
You should familiarize yourself with MongoDB’s architecture even if you are not involved in designing the application. The decisions your team makes during the planning phase have direct implications for meeting the SLA requirements of your MongoDB application.
Consider each of the following aspects and its impact on running MongoDB when getting started. As with any technology, the more you know and plan ahead the better the deployment outcome.
Get started on the right foot with your schema design
Based on our experience with thousands of customer deployments, we know that nothing impacts performance more than a well-designed schema. This point holds true for any database. Developers and data architects should collaborate early on in the project to nail down the right data model to align with the application requirements.
As a best practice, the application’s data access patterns should govern schema design with specific understanding of the read/write ratio of database operations, the types of queries and updates performed by the database, and the life-cycle of the data and growth rate of documents.
You should document the operations performed on the application’s data, comparing how these are currently implemented in the existing database and how MongoDB could implement them. This analysis is particularly useful to identify the ideal document schema for the application data and workload, based on the queries and operations to be performed against it. You can find more detail on this and an example in our RDBMS to MongoDB Migration Guide.
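To make the link between access patterns and schema concrete, here is a minimal sketch that uses plain Python dictionaries to stand in for documents. The blog application and all of its field names (title, comments, post_id) are hypothetical and are not taken from the migration guide. It contrasts embedding related data with referencing it.

```python
# Hypothetical blog application; all field names here are illustrative.

# Option 1: embed comments inside the post document.
# One read returns the post and its comments, which suits read-heavy
# pages, but the document grows with every new comment.
post_embedded = {
    "_id": 1,
    "title": "Operations best practices",
    "comments": [
        {"author": "ann", "text": "Useful overview"},
        {"author": "bob", "text": "Looking forward to part two"},
    ],
}

# Option 2: keep comments in their own collection and reference the post.
# The post stays a fixed size and comments can be written independently,
# but showing a post with its comments needs a second lookup.
post_referenced = {"_id": 1, "title": "Operations best practices"}
comments = [
    {"_id": 101, "post_id": 1, "author": "ann", "text": "Useful overview"},
    {"_id": 102, "post_id": 1, "author": "bob", "text": "Looking forward to part two"},
    {"_id": 103, "post_id": 2, "author": "cy", "text": "Unrelated post"},
]


def comments_for(post_id, all_comments):
    """Simulate the second lookup the referenced model requires."""
    return [c for c in all_comments if c["post_id"] == post_id]


embedded_authors = [c["author"] for c in post_embedded["comments"]]
referenced_authors = [c["author"] for c in comments_for(1, comments)]
print(embedded_authors == referenced_authors)
```

Both models hold the same information. Which one fits better depends on the factors listed above: the read/write ratio, the queries the application runs, and how quickly the embedded array would grow.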
And it’s not just about getting the schema right early on. MongoDB’s dynamic schema means that you can continue to iterate on the data model throughout the project lifecycle to optimize performance and storage efficiency, as well as support the addition of new application features. So pay close attention to the schema throughout and make sure it stays optimized to keep your deployment running smoothly.
Indexing for optimal performance
Proper indexing is perhaps the next most important factor in application performance. Too few, too many, or the wrong indexes can severely degrade performance. You should always create indexes to support queries, but you should not maintain indexes that queries do not use. This consideration is particularly important for deployments with insert-heavy workloads, because every index on a collection must be updated on each write.
To check for proper index coverage, take advantage of the MongoDB explain() method, a new feature available in MongoDB 3.0. This lets you calculate and review query plans without running the query first. The query plan can be applied to a broader set of query types, and error handling is improved as a result.
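As an illustration of what to look for in a query plan, the sketch below inspects explain output that has already been loaded into a Python dictionary. The helper names (plan_stages, uses_index) and the two sample plans are made up for this example and are trimmed to the few fields the helper reads. It assumes the queryPlanner.winningPlan layout, with nested inputStage entries and stage names such as IXSCAN and COLLSCAN.

```python
def plan_stages(plan):
    """Yield every stage name in a plan tree, outermost stage first."""
    yield plan["stage"]
    if "inputStage" in plan:
        yield from plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from plan_stages(child)


def uses_index(explain_output):
    """True if the winning plan scans an index and never scans the whole collection."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    stages = list(plan_stages(winning))
    return "IXSCAN" in stages and "COLLSCAN" not in stages


# Hand-written samples, trimmed to the fields the helper reads.
indexed_query = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
        }
    }
}
unindexed_query = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}
}

print(uses_index(indexed_query), uses_index(unindexed_query))
```

A COLLSCAN stage on a frequently run query is usually the first sign that a supporting index is missing.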
One of the key advantages of MongoDB is that we offer the flexibility and scale of non-relational databases while having the rich query functionality of relational databases. For example, we offer many different types of secondary indexes including compound, geospatial, text search, unique, array, TTL, sparse, and hashed indexes. Refer to the MongoDB Architecture Guide for more detail on each of these secondary indexes.
Take advantage of pluggable storage engines
MongoDB 3.0 introduces a new storage engine API that currently allows for two supported storage engines: the default MMAPv1 engine and the new WiredTiger storage engine. More engines are expected to be added in the future.
With WiredTiger, document-level concurrency control and native compression will result in lower storage costs, greater hardware utilization, and better and more predictable performance on most workloads. You also benefit from being able to use multiple storage engines within a single MongoDB replica set, making it easy to evaluate and migrate engines to best suit your workload. Learn more about upgrading to the WiredTiger storage engine in the documentation.
Scaling your MongoDB application
With MongoDB, you can scale out your deployment horizontally with a technique known as sharding. MongoDB distributes data across multiple physical partitions called shards. There are multiple sharding options, including range-based, hash-based and location-aware sharding. Data is automatically balanced across shards, and shards can be added and removed without taking the database offline. Sharding is transparent to the application.
But not all deployments require sharding, and it will not solve your problem if you have a poor schema or incorrect indexes. You should shard when a specific resource becomes a bottleneck on a single machine or replica set, and you can't add more of that resource at a reasonable cost. You may need more disk I/O throughput, or more RAM, or occasionally more storage or more concurrency. In these instances, sharding makes sense. If you have location-aware requirements where the data needs to be assigned to a specific data center for compliance or to support low latency local reads and writes, then you can use sharding for data distribution as well.
If you do find the need to shard, selecting a proper shard key is important. Dive into the documentation for best practices on choosing shard keys. But know that there are at least three key criteria to consider when doing so: A shard key should exhibit high cardinality, writes should be evenly distributed across all shards based on the shard key, and queries should target a specific shard to maximize scalability.
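The write-distribution criterion can be sketched with a small simulation. This is a simplified model, not MongoDB's actual chunk placement or hash function: the four shards, the range boundaries, and the use of SHA-256 are all assumptions made for illustration. It compares where new writes land when the shard key increases monotonically, as a timestamp or counter would.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
# Exclusive upper bounds for shards 0..2; shard 3 holds everything from 750 up.
RANGE_BOUNDARIES = [250, 500, 750]


def shard_for_ranged(key):
    """Range-based placement: pick the first shard whose upper bound exceeds the key."""
    for shard, upper in enumerate(RANGE_BOUNDARIES):
        if key < upper:
            return shard
    return len(RANGE_BOUNDARIES)


def shard_for_hashed(key):
    """Hash-based placement: hash the key, then map the hash onto a shard."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# New inserts with an ever-increasing key, such as a counter or timestamp.
new_keys = range(1000, 2000)
ranged_writes = Counter(shard_for_ranged(k) for k in new_keys)
hashed_writes = Counter(shard_for_hashed(k) for k in new_keys)

print("ranged:", dict(ranged_writes))
print("hashed:", dict(hashed_writes))
```

In this model every new write under range-based placement lands on the last shard, while hashing spreads the same keys across all four. The trade-off is that a query over a range of hashed key values can no longer target a single shard, which is why the three criteria have to be weighed together.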
Proper capacity planning
The working set is the set of data and indexes accessed during normal operations. Proper capacity planning is important for a highly performant application. So as a best practice, your working set should fit in RAM. If it doesn’t, consider increasing the RAM or adding additional servers to the cluster and sharding your database.
You should take advantage of the workingSet document, an output included with the serverStatus command document, which provides an estimated size of your working set and lets you know when the working set is approaching current RAM limits. This is useful for proactively taking action to ensure the system is properly scaled.
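The comparison itself is simple arithmetic. The sketch below assumes you already have estimates for the frequently accessed data and index sizes. The function names, the 80% warning threshold, and the sample sizes are hypothetical choices for this example, not values recommended by MongoDB.

```python
GIB = 1024 ** 3


def working_set_ratio(data_bytes, index_bytes, ram_bytes):
    """Fraction of RAM the working set (hot data plus indexes) would occupy."""
    return (data_bytes + index_bytes) / ram_bytes


def capacity_advice(ratio, warn_at=0.8):
    """Classify a ratio; warn_at is an arbitrary threshold chosen for this sketch."""
    if ratio >= 1.0:
        return "exceeds RAM: add RAM or shard"
    if ratio >= warn_at:
        return "approaching RAM limit: plan capacity now"
    return "fits in RAM"


# Hypothetical server: 40 GiB of hot data, 12 GiB of indexes, 64 GiB of RAM.
ratio = working_set_ratio(40 * GIB, 12 * GIB, 64 * GIB)
print(ratio, capacity_advice(ratio))
```

Here the working set takes up 52 of 64 GiB, so the ratio is 0.8125 and the server is flagged as approaching its limit before performance actually degrades.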
Setup and configuration
MongoDB provides repositories for .deb and .rpm packages for consistent setup, upgrade, system integration, and configuration. MongoDB’s configuration file allows you to store configuration options and implement consistent configurations across entire clusters. Take advantage of automation in MongoDB Management Service (MMS) and Ops Manager if you need to provision and upgrade complex deployments involving replica sets and sharded clusters. You should also upgrade software as often as possible to take advantage of the latest features.
There are many resources to help you get started. In addition to the , there is a plethora of resources on our website that you can plumb: from the webinar archives to conference presentations. You can also take the free, M102-level course on MongoDB University to get in-depth instruction on operating MongoDB. And if you are looking for specific guidance on your particular use case, consider engaging us for a consulting service.
Next week, we’ll continue the series with a look at how to best manage MongoDB.
About the Author - Pam
Pam is a product marketer with over a decade of experience leading marketing programs for B2B technology products. Before MongoDB, she worked at DoubleClick, the ad serving platform company, and then at Google where she worked on marketing display advertising products for over 5 years. Pam earned her BA in History from Barnard College and an MBA from Duke University.