This is a guest post by John A. De Goes, CTO of SlamData Inc.
Thanks to high usability, robust scalability, and powerful query capabilities, MongoDB is by far the most popular NoSQL database in the world. In fact, MongoDB is the only NoSQL database within reach of eclipsing the leading relational databases (including PostgreSQL).
Now there’s another critical reason to choose MongoDB over the competition: SlamData, a new open source project, brings native SQL analytics to MongoDB. There’s no data relocation, no replication, and no ETL: just point and query.
SlamData makes it easier to introduce MongoDB into your company or project, either for building new apps or migrating legacy ones.
The Bane of NoSQL: ETL and Query Migration
MongoDB has a powerful, developer-friendly API for querying data.
But for companies evaluating MongoDB, the fact that the API is the only way to query data can be a serious obstacle to adoption. APIs need developers to use them, and developers should be building cool features, not writing code to do one-off queries for the business.
In fact, for large companies with teams of data-hungry analysts, the ability to query MongoDB data with a familiar SQL interface is a business requirement.
Beyond just data accessibility, many companies that are thinking about migrating an app to MongoDB have a truckload of important reporting and analytical SQL queries they run daily. The thought of translating all these queries to code (or setting up a large ETL machine) can be another reason to stick with an RDBMS, even when MongoDB would be a better choice!
Until now, most companies that adopt MongoDB have solved these problems by setting up additional infrastructure and using ETL to relocate, normalize, and homogenize the data so it can be queried with traditional relational technology (MoSQL is a popular choice).
SlamData removes all these barriers to MongoDB adoption, and cements MongoDB’s position as the leading choice for Enterprise NoSQL.
Introducing SlamData
SlamData is an open source project that lets you execute ordinary SQL on MongoDB, with a high-level GUI for non-developers that adds collaboration and reporting features.
SQL queries are compiled into optimized workflows that run against the native MongoDB API (including generating map/reduce, as necessary).
SlamData’s dialect of SQL currently supports SELECT, FROM, JOIN (all types), WHERE, GROUP BY, HAVING, and a large number of functions and operators. In addition, SlamData adds a few syntactical extensions that allow accessing, aggregating, and manipulating nested data (including arrays), such as the syntax foo.bar.baz[*], which “unwinds” a nested array.
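To make the idea of unwinding concrete, here is a minimal Python sketch of what an expression like foo.bar.baz[*] does conceptually: it produces one copy of the document per array element, much like MongoDB's $unwind stage. The unwind helper and the sample document are hypothetical illustrations, not SlamData code.

```python
from copy import deepcopy


def unwind(doc, path):
    """Yield one copy of doc per element of the array at the dotted path."""
    keys = path.split(".")
    parent = doc
    for key in keys[:-1]:
        parent = parent[key]
    for element in parent[keys[-1]]:
        row = deepcopy(doc)          # leave the original document untouched
        target = row
        for key in keys[:-1]:
            target = target[key]
        target[keys[-1]] = element   # replace the array with a single element
        yield row


# Hypothetical document with an array nested three levels deep.
doc = {"id": "a", "foo": {"bar": {"baz": [1, 2, 3]}}}
rows = list(unwind(doc, "foo.bar.baz"))
```

Each of the three resulting rows keeps the other fields of the document (here, id) and carries a single element of the array where the array used to be.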
Many users of the SlamData project are either migrating to MongoDB from RDBMS, or considering migrating. They have a large number of reporting and analytical queries, and SlamData helps them reuse those queries.
SlamData is the first native analytics tool for MongoDB that lets companies have their cake and eat it, too: developers can leverage the hottest NoSQL database in the world, and the business can use (or reuse!) ordinary SQL to drive data exploration and ad hoc analytics.
Application Intelligence
The sweet spot for SlamData is helping companies understand large amounts of MongoDB data generated or captured by web and mobile applications.
The process of understanding this data, which we call Application Intelligence, is vital to multiple stakeholders in the business:
- Product. How can we use this data to improve the product or better understand users? How can we allow users to learn from the data themselves?
- Marketing. What does this data tell us about how users are using the application? Can this application data help us better understand marketing ROI?
- Support. How can this data be used to help identify and resolve issues that users are having?
- Managers. What does this data tell us about resource allocation? How can we tie this data to sales and other data sets?
- IT. What type of data is being generated by the application, and how might we tune the database for this kind of data?
In the relational world, to answer these types of questions you’d use a data discovery and ad hoc analytics tool. But for MongoDB, these tools don’t work because of the different data model. SlamData provides a first-class solution to these problems.
The SlamData Application
SlamData’s dialect of SQL (called SlamSQL) extends ANSI SQL to support nested data, heterogeneous data, and aggregation over nested dimensions (for example, summing elements in an array stored inside a document).
If your data is flat and normalized, you can stick with straight-up ANSI SQL. But more than likely, you’ll eventually start nesting data and taking advantage of MongoDB features like arrays, and that’s when the extensions really shine.
An example SlamSQL query is shown below:
SELECT DISTINCT user_name, SUM(music[*].likes[*].strength) AS strength
FROM collection
WHERE music[*].likes[*].name = 'davidbowie'
GROUP BY user_name
ORDER BY strength DESC
LIMIT 10
In this query, documents which are doubly-nested in arrays are being used to filter and sum values in the overall result. This query would be impossible in an RDBMS, and the equivalent code for the MongoDB API would be very difficult to write, troubleshoot, and understand (and, of course, accessible only to developers!).
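For readers who find it easier to think in code, the sketch below spells out one plausible reading of the query in plain Python over in-memory documents. It is only meant to illustrate the semantics; it is not how SlamData executes the query (SlamData compiles to the native MongoDB API). The function name top_fans and the sample data are hypothetical, and the field names are taken from the query above.

```python
from collections import defaultdict


def top_fans(collection, artist="davidbowie", limit=10):
    """Sum the strength of matching likes per user, strongest first."""
    totals = defaultdict(int)
    for doc in collection:
        for item in doc.get("music", []):        # music[*]
            for like in item.get("likes", []):   # likes[*]
                if like.get("name") == artist:   # WHERE ... name = 'davidbowie'
                    # SUM(...strength) ... GROUP BY user_name
                    totals[doc["user_name"]] += like.get("strength", 0)
    # ORDER BY strength DESC LIMIT 10
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [{"user_name": u, "strength": s} for u, s in ranked[:limit]]


# Hypothetical sample documents with doubly nested arrays.
collection = [
    {"user_name": "ann", "music": [
        {"likes": [{"name": "davidbowie", "strength": 5},
                   {"name": "queen", "strength": 9}]},
        {"likes": [{"name": "davidbowie", "strength": 2}]},
    ]},
    {"user_name": "bob", "music": [
        {"likes": [{"name": "davidbowie", "strength": 10}]},
    ]},
    {"user_name": "cat", "music": [
        {"likes": [{"name": "queen", "strength": 3}]},
    ]},
]
result = top_fans(collection)
```

Users with no matching likes (cat in the sample) do not appear in the result, because only likes that pass the filter contribute to a group.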
By leveraging industry standard SQL, SlamData makes it possible for a wide range of users and tools to interface with MongoDB, and helps teams quickly and easily understand the data generated or collected by their MongoDB applications.
Opening up the Box
At a technical level, the SlamData project innovates in several key ways:
- Structural type inference. SlamData does not scan the database to learn the structure of the data. Instead, SlamData uses a structural type system, complete with bidirectional type inference, which allows SlamData to parse the intent of a query and generate an execution plan consistent with that intent. For example, if your query uses a field as if it were a string, then SlamData will look for documents in which the field is a string. SlamData will also warn you when you attempt to do nonsensical things, like adding 4 to a string, because even though SlamData doesn’t know what’s in the database, it does know what operations make sense on what data types.
- Multidimensional relational algebra. SlamSQL is built on a formal extension of relational algebra called multidimensional relational algebra (MRA). This more powerful (but backward-compatible) foundation allows slicing, dicing, and aggregating nested, nonuniform data. As a pleasant side effect, it also gives sensible semantics to many SQL queries that are not allowed in ANSI SQL (for example, SELECT price / SUM(price) AS percent FROM ORDERS).
- Advanced multistage compilation. MongoDB has three distinct mechanisms for executing a query (one of them being full-fledged map/reduce), and each has different strengths and weaknesses. In general, efficiently executing a complex query might require a combination of all three. SlamData has an advanced, multistage optimization planner that attempts to find the optimal combination of all three mechanisms (it is strongly biased toward the aggregation pipeline whenever possible).
- In-database execution. SlamData is extremely aggressive about pushing execution of queries into the database. In fact, 100% of every query will be executed directly in the database, with no streaming back to the client for post-processing. Other attempts at solving this problem rely on client-side processing for most queries, because executing every part of every query inside the database is very difficult to do in a performant way (hence the need for the advanced, multi-staged compilation).
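The structural type inference idea in the first bullet can be illustrated with a toy Python sketch. It infers which type a field must have from how a query expression uses it, and rejects nonsensical operations without ever looking at the data. Everything here (the infer function, the tuple-based expression format, and the two operator signatures) is a hypothetical simplification, not SlamData's actual type system.

```python
STRING, NUMBER = "string", "number"

# Hypothetical operator signatures: (argument types, result type).
SIGNATURES = {
    "concat": ((STRING, STRING), STRING),
    "add": ((NUMBER, NUMBER), NUMBER),
}


def infer(expr, constraints):
    """Return the type of expr, recording the type each field must have.

    expr is a literal (str, int, float), ("field", name), or (op, left, right).
    """
    if isinstance(expr, str):
        return STRING
    if isinstance(expr, (int, float)):
        return NUMBER
    if expr[0] == "field":
        return constraints.get(expr[1])  # None means "not yet known"
    op, *args = expr
    arg_types, result = SIGNATURES[op]
    for arg, expected in zip(args, arg_types):
        if isinstance(arg, tuple) and arg[0] == "field":
            # Push the expected type down onto the field (first use wins).
            actual = constraints.setdefault(arg[1], expected)
        else:
            actual = infer(arg, constraints)
        if actual != expected:
            raise TypeError(f"{op} expects {expected}, got {actual}")
    return result


# Using a field as a string constrains it to be a string.
constraints = {}
result_type = infer(("concat", ("field", "name"), "!"), constraints)

# Adding 4 to a string is rejected without touching any data.
try:
    infer(("add", 4, "abc"), {})
    warning = None
except TypeError as err:
    warning = str(err)
```

After the first call, constraints records that name must be a string, which is the kind of information a planner could use to select only documents where the field has that type.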
The combination of these features makes SlamData “point and query”: point SlamData at your MongoDB database, and do whatever you want on any kind of data. SlamData will generate the optimal query plan and execute it 100% in the database.
Of course, if you plan on handing off SlamData to data analysts or business users, you’d probably want to make sure that queries hit read-only replicas, or possibly even use MMS to fire up a replica cluster just for data discovery and ad hoc analytics.
Learning More
If you are using MongoDB and would like to try SlamData, you can find installers on the official website, or you can compile the project from source code on GitHub.
SlamData is a 100% open source project, so if you like what you see, please consider supporting the project in various ways:
- Watching, forking, and starring the repositories.
- Submitting pull requests, bug reports, and feature requests.
- Spreading the word about SlamData (Twitter, Reddit, etc.).