As a top 10 global retail brand with 170+ million active buyers and 1 billion live listings across 190 markets around the world, eBay cannot afford system downtime. This is why the company relies on MongoDB as one of its core enterprise data platform standards, powering multiple customer-facing applications that run ebay.com.
At this year’s MongoDB World conference, Feng Qu, eBay’s lead NoSQL DBA, presented Practical Design Patterns for Resilient Applications – a set of architectural blueprints his team has developed to support enterprise-class MongoDB deployments.
Mr. Qu began his session by discussing how the concept of availability has changed over the years. In the past, it was acceptable for sites to take scheduled downtime for weekly maintenance events. With the global nature of today’s services, neither users nor the business are quite so accepting! In addition, most organizations now build out their services on commodity hardware platforms, rather than the exotic Sun Solaris / SPARC servers of yesteryear. While commodity hardware is much less costly, it also fails much more frequently. Both of these factors radically alter how engineering teams consider availability, and have led eBay to create its “Resiliency Design Patterns” to institute database best practices that maximize Mean Time To Failure (MTTF) and minimize Mean Time To Recovery (MTTR).
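To make the link between these two metrics and availability concrete, the following minimal sketch applies the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The function names and the figures are illustrative assumptions, not numbers from eBay's presentation.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a system is up."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year implied by a given availability."""
    return (1.0 - avail) * 365 * 24 * 60


# Illustrative assumption: a component fails on average every 1,000 hours.
manual_recovery = availability(1000, 1.0)      # one hour to recover by hand
automated_failover = availability(1000, 0.01)  # well under a minute to fail over

print(f"manual:    {downtime_minutes_per_year(manual_recovery):.1f} min/year")
print(f"automated: {downtime_minutes_per_year(automated_failover):.1f} min/year")
```

With MTTF held constant, shrinking MTTR is what moves the result, which is why fast, automated recovery matters so much on hardware that is expected to fail.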
To build their apps, eBay developers can choose from five corporate-approved database standards. Alongside MongoDB, teams also have the option to use Oracle or MySQL relational databases, and two other NoSQL options. Mr. Qu’s DBA team provides guidance on the appropriate database choice, qualifying the selection against the application’s data access patterns, user load, data types, and more.
eBay currently runs over 3,000 non-relational database instances powering a range of applications, managing multiple petabytes of data between them. In the past Oracle was the System of Record, while the non-relational databases handled transient data used in “systems of engagement”. However, the non-relational database landscape has matured. With consistent, point-in-time backup and recovery, MongoDB now also serves System of Record use cases at eBay.
While all of eBay’s non-relational database choices offer built-in resilience to failure, they make different design tradeoffs that can impact application behavior. The DBA team assesses these differences across six dimensions: availability, consistency, durability, recoverability, scalability, and performance. For example, those NoSQL databases using peer-to-peer, masterless designs have expensive data repair and rebalancing processes that must be initiated following a node failure. This rebalancing process impacts both application throughput and latency, and can cause connection stacking as clients wait for recovery, which can lead to application downtime. To mitigate these effects, eBay has had to layer an application-level sharding solution, originally developed for its Oracle estate, on top of those masterless databases. This approach enables the DBA team to divide larger clusters into a series of sub-clusters, which isolates rebalancing overhead to a smaller set of nodes, impacting just a subset of queries. It is against these different types of database behaviors that the eBay DBA team builds its Resiliency Design Patterns.
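The idea of isolating rebalancing overhead to one sub-cluster can be sketched as a simple key router. This is a hypothetical illustration of application-level sharding in general, not eBay's implementation: the function names, the sub-cluster names, and the hash-modulo routing scheme are all assumptions.

```python
import hashlib
from typing import Iterable, Sequence


def sub_cluster_for(key: str, sub_clusters: Sequence[str]) -> str:
    """Route a key to one sub-cluster using a stable hash of the key."""
    if not sub_clusters:
        raise ValueError("at least one sub-cluster is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(sub_clusters)
    return sub_clusters[index]


def affected_fraction(keys: Iterable[str], sub_clusters: Sequence[str],
                      rebalancing: str) -> float:
    """Fraction of keys whose queries land on the rebalancing sub-cluster."""
    keys = list(keys)
    if not keys:
        return 0.0
    hits = sum(1 for k in keys if sub_cluster_for(k, sub_clusters) == rebalancing)
    return hits / len(keys)


# Hypothetical sub-cluster names and keys, for illustration only.
clusters = ["sub-a", "sub-b", "sub-c", "sub-d"]
sample_keys = [f"listing:{i}" for i in range(1000)]
print(affected_fraction(sample_keys, clusters, rebalancing="sub-a"))
```

With four sub-clusters, only the keys routed to the sub-cluster under repair see degraded latency, while the rest are unaffected. Plain modulo routing remaps most keys when the number of sub-clusters changes, so a production router would more likely use a lookup table or consistent hashing.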
Mr. Qu presented eBay’s standard “MongoDB Resilience Design Pattern”, as shown in Figure 1 below.
Figure 1: eBay design pattern for its MongoDB Resilience Architecture. (Image courtesy of eBay’s MongoDB World presentation).
In this design pattern, a 7-node MongoDB replica set is distributed across eBay’s three US data centers. This pattern ensures that in the event of the primary data center failing, the database cluster can maintain availability by establishing a quorum between remaining data centers. MongoDB’s replica set members can be assigned election priorities that control which secondary members are considered as candidates for promotion in the event of a primary failure. For example, the nodes local to DC1 are prioritized for election if the primary replica set member fails. Only if the entire DC1 suffers an outage are the replica set members in DC2 considered for election, with the new primary member selected on the basis of which node has committed the most recent write operations. This design pattern can be extended by using MongoDB’s majority write concern to enable writes that are durable across data centers.
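As a rough sketch of this pattern, the snippet below builds a replica set configuration document with per-data-center election priorities and checks that a voting majority survives the loss of any one data center. The 3 + 2 + 2 split of the seven members, the host names, the priority values, and the helper names are assumptions made for illustration; the text above does not state the exact distribution. The `_id`, `members`, `host`, `priority`, and `tags` fields are standard replica set configuration fields.

```python
from typing import Dict

# Assumed layout: seven members spread 3 + 2 + 2 across three data centers.
LAYOUT = {"dc1": 3, "dc2": 2, "dc3": 2}
# Higher priority means the member is preferred in elections (assumed values).
PRIORITY = {"dc1": 3, "dc2": 2, "dc3": 1}


def build_config(name: str, layout: Dict[str, int],
                 priority: Dict[str, int]) -> dict:
    """Build a replica set configuration document for the given layout."""
    members = []
    for dc, count in layout.items():
        for i in range(count):
            members.append({
                "_id": len(members),
                "host": f"mongo-{dc}-{i}.example.net:27017",  # hypothetical hosts
                "priority": priority[dc],
                "tags": {"dc": dc},
            })
    return {"_id": name, "members": members}


def survives_loss_of(config: dict, dc: str) -> bool:
    """True if the members outside `dc` still form a strict majority."""
    total = len(config["members"])
    remaining = sum(1 for m in config["members"] if m["tags"]["dc"] != dc)
    return remaining > total // 2


config = build_config("rs0", LAYOUT, PRIORITY)
for dc in LAYOUT:
    print(dc, "lost -> primary can still be elected:", survives_loss_of(config, dc))
```

Under this assumed layout, losing DC1 leaves four of seven members, which is still a majority, so a member in DC2 can be elected primary. A two-data-center layout such as 4 + 3 would not survive the loss of the larger site, which is the reason for the third data center.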
The standard MongoDB design pattern is used as the basis for eBay’s “Read Intensive / Highly Available Read Pattern” discussed in the presentation, which is used to power the eBay product catalog. For the catalog workload, the MongoDB replica set is scaled out to 50 members, providing massive data distribution for both read scalability and resilience.
For more write-intensive workloads, eBay has developed its “Extreme High Read / Write Pattern”, which distributes a sharded MongoDB cluster across its US data centers.
Figure 2: eBay design pattern for the MongoDB Extreme High Read / Write Pattern. (Image courtesy of eBay’s MongoDB World presentation).
Again, eBay developers can configure this design pattern with specific MongoDB write and read concerns to tune the levels of durability and consistency that best meet the needs of different applications.
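One way to picture this tuning is as a set of named profiles that map to connection string options. The sketch below is an assumption-laden illustration: the profile names, host names, and the `mongo_uri` helper are hypothetical, while `w`, `journal`, `readConcernLevel`, and `readPreference` are standard MongoDB connection string options.

```python
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical profiles trading latency against durability and consistency.
PROFILES = {
    # Writes acknowledged by a majority of members; reads see majority-committed data.
    "durable": {"w": "majority", "journal": "true", "readConcernLevel": "majority"},
    # Writes acknowledged by the primary only; reads may be served by secondaries.
    "low_latency": {"w": "1", "readConcernLevel": "local",
                    "readPreference": "secondaryPreferred"},
}


def mongo_uri(hosts: Sequence[str], database: str, profile: str) -> str:
    """Build a MongoDB connection string for the chosen profile."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile}")
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(PROFILES[profile])}"


# Hypothetical mongos router addresses for a sharded cluster.
routers = ["mongos-dc1.example.net:27017", "mongos-dc2.example.net:27017"]
print(mongo_uri(routers, "catalog", "durable"))
print(mongo_uri(routers, "sessions", "low_latency"))
```

Drivers also allow these settings to be overridden per collection or per operation, so a connection-level profile like this is only a default.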
Mr. Qu noted that with recent product enhancements, MongoDB is being deployed to serve a greater range of application needs:
- The addition of zone sharding to MongoDB 3.4 now enables eBay to serve applications that demand distributed, always-on write availability across multiple data centers.
- Retryable writes, targeted for the forthcoming MongoDB 3.6 release, will allow eBay to reduce application-side exception handling code.
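To show the kind of application-side exception handling that retryable writes are meant to reduce, here is a minimal, hand-rolled retry wrapper. Everything in it is hypothetical: `TransientWriteError` stands in for whatever network or failover error a driver would raise, and `write_with_retry` is not part of any MongoDB driver.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientWriteError(Exception):
    """Hypothetical stand-in for a network error or primary failover."""


def write_with_retry(write_op: Callable[[], T], attempts: int = 3,
                     backoff_seconds: float = 0.0) -> T:
    """Run write_op, retrying on transient errors up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return write_op()
        except TransientWriteError as exc:
            last_error = exc
            # Linear backoff before the next attempt.
            time.sleep(backoff_seconds * (attempt + 1))
    raise last_error


# Simulated write that fails twice during a failover, then succeeds.
calls = {"n": 0}

def flaky_insert() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientWriteError("primary stepped down")
    return "ok"

print(write_with_retry(flaky_insert))
```

A wrapper like this is only safe when the write is idempotent, because the application cannot tell whether a failed attempt was applied before the error occurred. Moving the retry into the driver and server removes that burden from application code.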
Review the recording of Feng Qu’s presentation at MongoDB World to learn more about eBay’s Design Patterns.
Download the MongoDB Multi-Data Center Deployments guide to get deeper insight into enabling active/active data center deployments and global data distribution with MongoDB.