This will be the first post in an ongoing series based on our popular webinar about the differences in building an application using SQL versus building the same application using MongoDB.
First off - before we get into anything - what is it that we’re all trying to achieve with our data? Sure, there are necessary evils such as reading and writing data from databases, putting data on message buses, and working through open source integrations, but at the end of the day what we really want to do is take information, compute it, and get it in front of the right people to help them make better decisions.
Business demands often seem pretty simple - “I just want to save my trades” or “Can we make a product catalog that handles ¥, £, and $ seamlessly?” - but the way that the data is expressed in code or in the application is different from the way it’s framed in a business use case. When you get to how the use case is implemented in the database, the differences are even more pronounced.
And why is that? One reason is that innovation in business and in the code/application layer has far outpaced innovation in database technologies. RDBMS came onto the scene in 1974. Since then, business goals have changed, the pace of business has increased (time to market actually matters a lot now), and we’re using technologies we could not possibly have imagined would exist 40 years ago. Back then we had fairly simple languages that were well-mated to the ‘rectangular’ and ‘flat’ RDBMS world. Today, we have extremely powerful languages of all types with never-ending streams of updates coming from the open source ecosystems we’ve built. The only thing that’s constant is change.
In 1974... | In 2014... | |
---|---|---|
Business Data Goals | Capture my company's transactions daily at 5:30PM EST, add them up on a nightly basis, and print a big stack of paper | Capture my company's global transactions in real-time plus everything that is happening in the world (customers, competitors, business/regulatory/weather), produce any number of computed results, and pass this all in real-time to predictive analytics with model feedback. Then delivery results to 10s of thousands of mobile devices, multiple GUIs and b2b/b2c channels. |
Release Schedule | Semi-Annually | Yesterday |
Application/Code | COBOL, Fortran, Algol, PL/1, assembler, proprietary tools | C, C++, VB, C#, Java, Javascript, Groovy, Ruby, Perl, Python, Obj-C, SmallTalk, Clojure, ActionScript, Flex, DSLs, spring, AOP, CORBA, ORM, third party software ecosystem, the entire open source movement ... and COBOL and Fortran |
Database | I/VSAM, early RDBMS | Mature RDBMS, legacy I/VSAMColumn & key/value stores, and ... MongoDB |
That’s where NoSQL comes in, in particular MongoDB. What makes MongoDB special is that it stores data in rich structures (maps of maps of lists that eventually drill down to integers, float point numbers, dates, and strings). MongoDB was designed to not only fluidly store these objects, but also to present them in APIs and with a query language that knows how to understand all the types of data you’re storing. This is in stark contrast to the legacy technologies designed and built in the programming environments of the 1970s.
In MongoDB, your data is the schema and there is symmetry between the way data goes into the database and the way it comes out. With traditional technologies, the differences between what it means to put data in and take data out increase as applications get more complex. The examples we cover in this series will demonstrate these concepts.
And finally, no MongoDB primer would be complete without the following diagram:
The image on the left illustrates storing customer data in ‘rectangular’ tables and the schema diagram needed to make sense of it all. The image on the right illustrates how data is stored in MongoDB. Suffice it to say that the diagram on the left is more complicated than the one on the right. Now, to be fair, the diagram on the left does contain more entities than just a customer and his phones, and it in all likelihood, it didn’t start out looking like that. It was probably relatively simple in the beginning. Then someone needed something that wasn’t a scalar, and someone else needed a few other things, and before they knew what happened, what was once manageable exploded into what you currently see.
Now let’s get into the actual differences between SQL and MongoDB and how we transition our thinking using code.
Some ground rules:
- We’ll be using Java
- Assume we have a data access layer in between our application and MongoDB
- In terms of the date counts as we go through the examples, just treat them as progress indicators and not the actual time needed to complete the task.
- We won’t get into exception or error-handling. We won’t muddy the code with boilerplate or persistor logic that does not change from day to day. We won’t get into the database connection or other setup resources. The primary focus will be the core data-handling code.
SQL vs MongoDB: Day 1
We’re going to start with a map, which will let us move data in and out of our data access layer. Maps are rich shapes so we can stick a lot of things into them, and perhaps more importantly, there is no compile-time dependency.
The Task: Saving and Fetching Contact Data
We’ll start with this simple, flat shape in the data access layer:
Map m = new HashMap();
m.put(“name”, “buzz”);
m.put(“id”, “K1”);
We’ll save it in this way:
save(Map m)
And assume we can fetch it by primary key in this way:
Map m = fetch(String id)
In our initial case, we just have two very simple things, “save” and “fetch”. Rich queries will come later on. This is what this might look like in the code for both SQL and MongoDB:
In MongoDB, “data is the schema” and we are not required to create a table. If you look closer at the fetch functions, you’ll notice that they are largely the same. An important takeaway from this is that in MongoDB, your basic way of addressing the database is actually very similar to how you would do it in an RDBMS. You construct the query, you pass it in, you get back the cursor, and you iterate over the cursor. The fidelity of the data moving out will change as we progress through these examples, but for now lets assume that we have parity.
SQL vs MongoDB: Day 2
Let’s add two fields: a title and a date.
The Task: Adding simple fields
m.put(“name”, “buzz”);
m.put(“id”, “K1”);
m.put(“title”, “Mr.”);
m.put(“hireDate”, new Date(2011, 11, 1));
Notice that we’re putting an actual Date object into the “hireDate” slot, not a string representation like “2011-11-01”. In a well-designed data access layer, we always want to use the highest fidelity objects we can. Dates in particular need to be treated this way to avoid YYYYMMDD vs. YYYYDDMM confusion.
This is what the access layer implementation might look like to use SQL:
The first thing to notice is the alter table problem. Before we can touch any code that’s going to interact with the new database, we have to first change the table definition; otherwise, the select statement that’s going after the new fields is simply not going to work. This is nothing new. It’s something that everyone has become numb to over the past 40 years using RDBMS. There are a few other things that developers might also need to consider like case sensitivity, etc.
Now let’s look at what this looks like in MongoDB:
What did we have to change in MongoDB? Nothing.
We put title and hire date into the map, and we simply inserted the whole map into MongoDB. The previously inserted items remain unchanged, without title and hire date fields. If backfilling older items that do not have those fields is important, we can easily write a 4-line javascript program that iterates over the collection and sets default values for title and hire date.
Next week, we’ll cover adding lists - where the real fun begins.
In the meantime, if you’re looking for migration advice, take advantage of our offer to chat with a MongoDB expert for free.
About Buzz Moschetti
Buzz Moschetti is an Enterprise Architect at MongoDB. He is the former Global Head of Architecture for the Investment Bank division of JPMorganChase. His responsibilities there included information and software architecture vision and strategy, business applications and process development, and solutions standards and design leadership for a $2.2b budget funding 9000 professionals worldwide for all lines of business including equity & debt capital markets, listed and OTC derivatives, commodities, institutional, proprietary, and customer trading, settlement and clearance, and research. Prior to that position, he was the Chief Architecture Officer of Bear Stearns from 2000 to 2008 with similar responsibilities. Earlier in his career at Bear Stearns, he was the Director of Technology for the Financial Analytics and Structured Transactions Group (F.A.S.T.) in the Fixed Income dept. and led the design and development of that unit’s HYDRA and BondStudio portfolio valuation applications suite. His areas of expertise include enterprise data design, systems integration, and multi-language tiered software leverage with C/C++, Java, Perl, Python, and Ruby. He holds a bachelor of science degree from the Massachusetts Institute of Technology.