As data volumes grow, and business agility becomes the differentiator between success and failure, enterprises are increasingly re-evaluating legacy technology choices. To meet customer demands for higher performance, and to deliver the new, data-rich services the business requires, IHS Markit made the decision to migrate to MongoDB for its Data Delivery Service.
I had the opportunity to meet with Sander Van Loo, Director of Data Delivery & Index Administration Services at IHS Markit, to discuss why his team selected MongoDB, and the results of the company’s migration.
Can you start by telling us a little bit about your company?
IHS Markit, headquartered in London, is a world leader in delivering next-generation information, analytics, and solutions to customers in business, finance, and government. Through our information products and services, we provide our customers with 360-degree views of risk, opportunity, and financial impact. IHS Markit has more than 50,000 key business and government customers, including 85% of the Fortune Global 500 and the world’s leading financial institutions.
What is the role of the Data Delivery group?
We generate market and pricing data used by our financial services customers for the valuation and risk management of derivatives. We aggregate and analyze data from both our own internal data products and from third party data sources, which we then present to customers across multiple delivery channels and APIs.
All market and pricing data was stored in a legacy relational database, but we are now migrating that across to MongoDB.
What factors are driving your migration to MongoDB?
Application agility, performance, and scalability.
Can you tell me more about your challenges around application agility, and how MongoDB has helped?
Our data sources don’t have a fixed schema. Each source uses a different structure, and those structures can rapidly change as new attributes are added. We had to develop a schema generator to transform the data dictionaries that defined the data structures into a relational database schema. However, as the number of data sources increased, and as structures themselves changed, maintaining the schema generator consumed too much development resource. Rather than focusing on building application and analytics functionality, my team were increasingly occupied with schema management tasks.
MongoDB’s flexible document data model means we don’t need to predefine schemas. In addition, all of the data we ingest is formatted as JSON objects, so we can store it much more quickly, without transforming it or flattening it into rows and columns.
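For illustration, here is a minimal PyMongo sketch of this ingest pattern; the collection and field names are hypothetical, not IHS Markit’s actual schema:

```python
# Minimal sketch: storing variable-shape JSON documents with PyMongo.
# Collection and field names are illustrative only.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
prices = client.marketdata.prices

# Two records from different sources carry different attributes;
# no schema migration is needed to store them side by side.
prices.insert_many([
    {"source": "vendorA", "instrument": "XS0104446949",
     "bid": 101.25, "ask": 101.40, "asOf": datetime(2017, 3, 1)},
    {"source": "vendorB", "instrument": "XS0104446949",
     "mid": 101.32, "curvePoints": [0.5, 1, 2, 5, 10],
     "asOf": datetime(2017, 3, 1)},
])
```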
As a result of the migration, my developers are much more productive, working on projects that differentiate our products and improve the customer experience.
How about performance and scalability?
The performance of our legacy relational database wasn’t predictable enough for us to offer performance SLAs on our deliveries, and its scaling model was not well suited to maintaining performance against continuously growing data sets. We worked around these challenges with eager in-memory caching frameworks. That approach works well for current market data, but it will not scale to years of historical data across millions of time-series data points.
The trigger to evaluate non-relational databases came when we started to develop enriched application functionality. We wanted to give our customers not just current market data, but also access to historical time-series data, with support for point-in-time snapshots on that data, totalling over 30TB. We constantly strive to reduce latency so that our data reaches customers sooner, and to enable on-demand delivery of historical data. Any delay in presenting data makes our services less competitive.
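A point-in-time request over a historical series might look something like the following PyMongo sketch (the collection, index, and field names are assumptions for illustration):

```python
# Illustrative query for one year of end-of-day history on an instrument.
from datetime import datetime
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
prices = client.marketdata.prices

# A compound index keeps range scans over an instrument's history fast.
prices.create_index([("instrument", ASCENDING), ("asOf", ASCENDING)])

history = prices.find({
    "instrument": "XS0104446949",
    "asOf": {"$gte": datetime(2016, 1, 1), "$lt": datetime(2017, 1, 1)},
}).sort("asOf", ASCENDING)
```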
Why did you choose MongoDB?
For our use case, MongoDB delivered higher scalability and better price/performance. We also found the document data model to provide richer schema flexibility than alternatives. It was more natural for our developers to work with, which increased their productivity.
How would you rate MongoDB’s performance?
From our proof of concept with 7 billion records, we’ve found that:
- Reading end-of-day market data collected over a week takes 7 milliseconds on MongoDB compared to 758 milliseconds previously.
- Reading end-of-day market data collected over a year takes 10 milliseconds on MongoDB compared to 2,523 milliseconds previously.
Our testing concluded that with MongoDB, reads were up to 250x faster, writes were 10x faster, and the database required only 35% of the storage space.
Now we can ingest a GB of data in less than 4 seconds, process it, and serve it out to our customers in less than 2.5 seconds. Performance improvements of this magnitude help us deliver market-leading SLAs and increase our customers’ opportunities to act on the data ahead of their competitors.
What does your deployment look like?
We have deployed a 4-shard MongoDB cluster across three sites located in London, Amsterdam, and New York. This configuration provides resilience to data center failures and allows us to co-locate data closer to users. As we onboard new data sets, we expect to scale the cluster by a factor of 3x to 4x over the next couple of years. MongoDB enables us to support our expected business growth with a scalable business model, and an architecture that is ready for the cloud.
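The interview doesn’t detail how data is pinned to each site, but zone sharding (introduced as “zones” in MongoDB 3.4) is one way to co-locate data with its users. A hedged sketch, with shard names, zone names, and shard key chosen purely for illustration:

```python
# Hypothetical zone-sharding setup: route EU documents to a London shard.
from bson.max_key import MaxKey
from bson.min_key import MinKey
from bson.son import SON
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.net:27017")

# Associate each shard with a zone for its data centre.
client.admin.command("addShardToZone", "shard-lon", zone="EU")
client.admin.command("addShardToZone", "shard-nyc", zone="US")

# Documents whose shard key starts with region "EU" land in the EU zone
# (assumed shard key: {region: 1, instrument: 1}).
client.admin.command(
    "updateZoneKeyRange", "marketdata.prices",
    min=SON([("region", "EU"), ("instrument", MinKey())]),
    max=SON([("region", "EU"), ("instrument", MaxKey())]),
    zone="EU",
)
```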
How do you manage MongoDB?
Ops Manager is used for provisioning, configuring, and backing up MongoDB. We can drive these operations from the GUI, and so we don’t need to invest in developing complex scripts. Continuous backups and cross-shard snapshots across multiple regions provide the data protection demanded by the business. The Ops Manager API pushes all monitoring telemetry into our enterprise management platform so we can gain complete visibility of the application stack. We also use MongoDB Compass for schema and query optimization.
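Pushing telemetry into an external monitoring platform typically means polling the Ops Manager public REST API. A sketch under assumptions; the base URL, group ID, host ID, and credentials are placeholders:

```python
# Hedged sketch: fetch host measurements from the Ops Manager public API
# (HTTP digest auth) for forwarding to an enterprise monitoring platform.
import requests
from requests.auth import HTTPDigestAuth

BASE = "https://opsmanager.example.net/api/public/v1.0"
AUTH = HTTPDigestAuth("monitoring-user", "api-key")

resp = requests.get(
    f"{BASE}/groups/GROUP-ID/hosts/HOST-ID/measurements",
    params={"granularity": "PT1M", "period": "PT1H"},
    auth=AUTH,
)
resp.raise_for_status()
for m in resp.json().get("measurements", []):
    print(m["name"], len(m["dataPoints"]))
```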
We have taken advantage of MongoDB's consulting services – our Dedicated Consulting Engineer acts as a trusted advisor in ongoing development, and helps build out our team’s operational capability.
You recently upgraded to MongoDB 3.4. What most excites you about this release?
The features we’re most interested in include:
- Intra-cluster network compression. Our data is time-sensitive. We get lots of traffic bursts as batches of data are ingested into our platform, and then need to be processed across the cluster and released to customers around the world. Reducing network traffic by up to 70% is helping to deliver further latency improvements.
- The new Decimal data type is helping to simplify our complex financial data.
- Multi-faceted aggregations provide multiple filters and categorizations to guide data browsing and analysis (a sketch of both features follows this list).
- Ops Manager server pools will allow more seamless provisioning of new MongoDB instances into our database services.
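To make the Decimal and faceted-aggregation items concrete, a short PyMongo sketch with hypothetical collection and field names:

```python
# Decimal128 (new in MongoDB 3.4) stores exact decimal values, avoiding
# binary floating-point rounding on prices; $facet runs several
# categorizations over the same data in a single aggregation pass.
from bson.decimal128 import Decimal128
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
prices = client.marketdata.prices

prices.insert_one({"instrument": "XS0104446949", "source": "vendorA",
                   "mid": Decimal128("101.3275")})

facets = list(prices.aggregate([{"$facet": {
    "bySource": [{"$sortByCount": "$source"}],
    "priceBuckets": [{"$bucket": {"groupBy": "$mid",
                                  "boundaries": [0, 100, 200],
                                  "default": "other"}}],
}}]))
```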
Are you considering MongoDB for other applications?
We are always looking for opportunities to use MongoDB for new applications. One area of interest is advanced risk factor computations. We plan to use the Apache Spark R and Scala APIs to process market data stored in MongoDB, via the MongoDB Connector for Apache Spark.
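The connector also exposes a Python API; a hedged PySpark sketch of reading MongoDB market data into a DataFrame, where the URI, database, collection, and connector version are all placeholders:

```python
# Run with: spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("risk-factors")
         .config("spark.mongodb.input.uri",
                 "mongodb://mongos.example.net/marketdata.prices")
         .getOrCreate())

# Load the collection as a DataFrame via the MongoDB Spark Connector.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

# Toy risk-factor style computation: average mid price per instrument.
df.groupBy("instrument").avg("mid").show()
```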
How are you measuring the impact of MongoDB on your business?
- Higher revenue. In our business, seconds matter. Faster data ingest, processing, and serving translates directly to more revenue for our customers, and improved customer acquisition and retention for us.
- New applications. MongoDB allows our developers to innovate faster and deliver new functionality. We are better able to upsell value-added products to our customers.
Sander, thanks for sharing your experiences with the community.
Want to break free from the constraints of relational databases? Download our Relational Database to MongoDB migration guide.