I had the chance to sit down with Will Button, a Site Reliability Engineer at myList, a tool for brands to leverage influencer relationships across their social platforms.
Tell us a little bit about your company. What are you trying to accomplish? How do you see yourself growing in the next few years?
myList is a SaaS platform that helps brands own their fan relationships, letting them socially enable all of their digital marketing assets in ways that create persistent, viral connections. Brands use myList to present their products (and ecommerce opportunities for those products) in engaging ways that are natively integrated into the social experience, naturally building long-term, direct relationships between brands and their fans, and dramatically expanding reach by helping fans carry brand content to their friends.
myList is in rapid expansion mode, both in terms of clients and product offerings. Already, many Fortune 500 brands like Campbell's Soup, Philips Electronics, and SCJohnson are using myList, both in the US and internationally. Many more potential clients are in active trials. myList functionality and content are also expanding. We currently have product visibility and ecommerce opportunities for over 16 million products, and growing. myList is regularly adding new tools across all social networks, and evolving the robust dashboard/backend that clients use to watch connections and reach grow, and to gain valuable new insights into their audience.
How are you using MongoDB? What problem were you trying to solve and how does MongoDB help you solve that problem?
myList’s app is designed to be a social app that revolves around products. Given that social and ecommerce are both very dynamic markets, we needed a database that would support and enable us to move fast and scale. We’re also a startup, meaning that our product evolves as our customer base grows. Helmuth von Moltke stated “no plan survives contact with the enemy”, and I believe similarly that no MVP survives contact with the customer. Your database, and your entire infrastructure, really, has to be able to adapt as new information becomes available if you’re going to be successful. We’ve been on MongoDB from the start and it has met our needs very well.
If your past experience is with RDBMS, describe your experience learning MongoDB.
I’ve had experience with most of the major RDBMS environments: Microsoft SQL Server, MySQL, Postgres, Oracle and Microsoft Access (that counts, right? I’m pretty sure someone endorsed me for that on LinkedIn!). MongoDB has a ton of great resources that make it easier to learn. MongoDB shares many of the same concepts, operations, policies, and procedures that you’re already familiar with from the relational world. Processes and best practices for monitoring, indexing, tuning, backups, etc. are applicable to MongoDB. And if you want to jumpstart your training, free online classes for developers and DBAs are available through MongoDB University.
The user community is extremely helpful, too. From being able to access MongoDB staff to getting solid answers from the IRC chats to local Meetup.com groups, there is a wealth of support out there.
How did you hear about MongoDB?
I first heard about MongoDB several years ago, when the company I was working for at the time was evaluating it.
Did you consider other alternatives, like a relational database or non-relational database? Why did you choose to use MongoDB instead of these other databases?
For myList, using a NoSQL database was a necessity. Not only has our document schema evolved over time, but even in a document-by-document comparison the schema can be different due to the social nature of our environment. In an RDBMS world, that equates to either a lot of work to keep a normalized database, or a lot of empty space trying to account for all possible scenarios. Scalability is another big win for MongoDB (and for us).
Tell us a little bit more about your MongoDB Deployment environment.
We have 3 different production database environments, separated based on role and SLA protection. Each environment is sharded across three replica sets, creating a total MongoDB cluster of 27 nodes. Mongos runs as an app on each application server, and is configured for its respective environment. One environment resides in our physical datacenter while the other two reside in AWS. We have a dev environment as well in AWS, but it’s much smaller than our production environment. All MongoDB servers in the datacenter run on standalone hardware on CentOS 6.5 and MongoDB 2.4.8. Out in AWS, we use the Amazon Linux AMIs with MongoDB 2.4.8. All instances are located in the same availability zone in order to control costs by eliminating data transfer charges between availability zones. When we first sharded, we had about 8 million products in the database. We now have over 16 million.
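As a quick sanity check on the numbers in this answer, the node count can be worked backwards. The sketch below assumes the 27 nodes are all data-bearing replica set members and that every replica set is the same size; neither detail is stated in the interview, and config servers and mongos processes are left out of the count.

```python
# Figures taken from the answer above.
ENVIRONMENTS = 3               # production database environments
SHARDS_PER_ENVIRONMENT = 3     # each environment is sharded across three replica sets
TOTAL_NODES = 27               # total cluster size quoted in the interview

# Assumption: every shard is one replica set and all replica sets are equally sized.
replica_sets = ENVIRONMENTS * SHARDS_PER_ENVIRONMENT
members_per_replica_set, remainder = divmod(TOTAL_NODES, replica_sets)

print(f"{replica_sets} replica sets, {members_per_replica_set} members each "
      f"(remainder {remainder})")
```

Under those assumptions the figures are consistent with nine replica sets of three members each.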
Can you share any best practices on scaling your MongoDB infrastructure?
Tons. One of the great things about working for a startup is you get continually challenged to succeed. We found ourselves in a unique set of circumstances that resulted in an unexpected need to shard.
My advice for scaling is this: shard early, shard often, shard before you think you need to. Proper consideration needs to be given to the shard key that is selected, and there is a lot of documentation and best practices out there to help guide you.
As part of any sharding exercise, on any database, you need to ensure that the operations and dev teams and the business are properly aligned so implementation can be efficient.
Finally: use MMS and learn the key metrics available to you there. It’s free, and it’s designed by the same guys who gave you MongoDB, so odds are: the metrics displayed on MMS aren’t there by accident!
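To illustrate why the shard key advice in this answer matters, here is a minimal, self-contained simulation. It is not myList's setup and not MongoDB's actual chunk splitting or balancing logic: the three-shard layout, the range boundaries, the integer keys, and the helper names `range_shard` and `hashed_shard` are all hypothetical, and MD5 is only a stand-in hash. It compares where new inserts land when a steadily increasing key is partitioned by range versus by hash.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size for the simulation


def range_shard(key, boundaries):
    """Range partitioning: pick the first range whose upper bound exceeds the key."""
    for shard, upper_bound in enumerate(boundaries):
        if key < upper_bound:
            return shard
    return len(boundaries)  # keys past the last boundary go to the final shard


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Hash partitioning: hash the key, then map the hash onto a shard."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Assume keys 0..8999 already exist and were split evenly by range:
# shard 0 holds keys < 3000, shard 1 holds keys < 6000, shard 2 holds the rest.
boundaries = [3000, 6000]

# New documents arrive with ever-increasing keys (like a counter or timestamp).
new_inserts = range(9000, 10000)

range_counts = Counter(range_shard(k, boundaries) for k in new_inserts)
hash_counts = Counter(hashed_shard(k) for k in new_inserts)

print("range-partitioned inserts per shard:", dict(range_counts))
print("hash-partitioned inserts per shard: ", dict(hash_counts))
```

In this simplified model every new insert under range partitioning lands on the last shard, while hashing spreads the same inserts across all three. The trade-off is that hashing scatters neighbouring keys, so range queries on the key have to touch more shards.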
How is MongoDB performing for you?
It’s performing well. We’ve scaled quite a bit on two of our databases. We saw two dramatic improvements from scaling. Page faults dropped from 40+ per second to just a few here and there (usually during one of the more thorough task processes). Lock rate, which was consistently over 50% prior to sharding, now rarely breaks 1%.
Do you think that upgrading your memory would have helped here rather than relying simply on sharding?
Yup, I think so. At the time I felt adding memory was short-sighted. Increasing the memory would have solved the problem for the moment, and a generous dose of RAM would have bought a little extra time, but eventually we were going to reach limits where the RAM requirements for the servers would be pretty significant and the penalty for unexpected events seemed pretty steep.
You mentioned using MMS. Can you describe the tools you use to monitor, manage and back up your MongoDB deployment?
We use MMS for monitoring performance. It’s free and super easy to get going with. I love the new custom dashboards: I have dashboards built for each database environment so at a glance, I can quickly see what’s going on specific to that environment. The dashboards are global too, so I know that everyone is talking about the same thing when we refer to a dashboard.
We’re using MMS to monitor and back up our core databases as well. MMS is phenomenal. After setting up the backups, I got an email from Steve Briskin at MongoDB: “Hey- we noticed that your backups aren’t working correctly, so we turned off the agent so you’re not getting charged for it. When you get a chance, can you send us rs.conf, rs.status, etc...” I sent over the info he requested, he found and corrected the problem and also made some recommendations for our environment to improve a few things. That was an amazing experience and an awesome level of support.
We also have a Zabbix installation. While MMS does the monitoring, I use Zabbix for alerting. Zabbix monitors and alerts for all of the servers in our infrastructure, not just MongoDB, so it’s convenient and eliminates multiple alerts from different tools.
Are you integrating MongoDB with other data analytics, BI or visualization tools like Hadoop? If so can you share any details.
Nothing cool or fancy. We ship our Apache httpd logs into Hadoop for processing and there’s some querying of MongoDB by some of the Hadoop jobs. Metrics are an integral part of what we do, so I would expect to see this area of operations expand in the near future.
Do you have plans to use MongoDB for other applications? If so, which ones?
If there were a MongoDB fanboy shirt, I’d wear it. Seriously, I like MongoDB a lot. It’s got so much going for it in terms of support, sustainability, scalability, performance and potential. I’m sure there are projects where goals and expectations weren’t achieved, but from my experience, MongoDB is very transparent on all fronts. RDBMS still has its place, and probably will for a long time, but for many applications: MongoDB is a great choice.
What advice would you give someone who is considering using MongoDB for their next project?
First and foremost: as soon as you click on “Download”, before the download even completes (you might have to be quick, depending on your download speed), head on over to http://university.mongodb.com and enroll in a course. There’s some great stuff in those courses. If you’re serious about MongoDB, the University is a core requirement, and you can’t go wrong with the price--it’s free!
Second: find and meet people who are using MongoDB and ask questions face to face. You can get a lot of knowledge transfer in just a few minutes of face-to-face communication that is near impossible via IM, email, chat, etc. Plus everyone I’ve ever met who is working with MongoDB is super cool, approachable and was in your shoes just a short time ago.