Earlier this week, we had the pleasure of co-presenting a webinar with our partner, Tableau. Buzz Moschetti (Enterprise Architect at MongoDB) and Vaidy Krishnan (Product Marketing at Tableau) rolled out the updated MongoDB Connector for BI. In addition to explaining how the connector works, Buzz created on-the-fly visualizations of a sample data set in Tableau.
When you pair Tableau’s ease of use, MongoDB’s flexibility, and the connector’s agility, your “time to analytics” gets a whole lot shorter.
Here are the highlights from the session.
What is the Connector for BI?
To answer that question, let's look at the ways MongoDB natively manipulates data.
Our highly expressive MongoDB Query Language (MQL) and the many operators in our Aggregation Framework are powerful tools to process and transform data within MongoDB. We have made many improvements to MQL over the years and with each release, we introduce new operators and different ways to manipulate the contents of your collections. While MQL has slowly incorporated much of the functionality of SQL, the Aggregation Framework will always use the pipeline/stage approach rather than the more grammatical style of SQL.
> db.foo.insert({_id:1, "poly": [ [0,0], [2,12], [4,0], [2,5], [0,0] ] });
> db.foo.insert({_id:2, "poly": [ [2,2], [5,8], [6,0], [3,1], [2,2] ] });
> db.foo.aggregate([
{$project: {"conv": {$map: { input: "$poly", as: "z", in: {
x: {$arrayElemAt: ["$$z”,0]}, y: {$arrayElemAt: ["$$z”,1]}
,len: {$literal: 0} }}}}}
,{$addFields: {first: {$arrayElemAt: [ "$conv", 0 ]} }}
,{$project: {"qqq":
{$reduce: { input: "$conv", initialValue: "$first", in: {
x: "$$this.x”, y: "$$this.y"
,len: {$add: ["$$value.len", // len = oldlen + newLen
{$sqrt: {$add: [
{$pow:[ {$subtract:["$$value.x","$$this.x"]}, 2]}
,{$pow:[ {$subtract:["$$value.y","$$this.y"]}, 2]}
] }} ] } }}
,{$project: {"len": "$qqq.len"}}
{ "_id" : 1, “len" : 35.10137973546188 }
{ "_id" : 2, "len" : 19.346952903339393 }
An example of an MQL aggregation pipeline to calculate the perimeter of simple polygons. Note that the polygons themselves are well-modeled as an array of points – each point itself being a two item array.
The native functions of MongoDB are an excellent match for the document data model and processing nested arrays within documents is uniquely suited for the pipeline methodology.
However, the fact remains that MongoDB does not speak SQL.
We were motivated to create the Connector for BI because of the robust ecosystem of SQL-based tools that empower everyone within an organization to get to data-driven insights faster.
Enter the Connector for BI 2.0.
The connector is a separate process that takes a MongoDB database and maps the document schema into a relational structure that is then held in MySQL.
One of the most powerful characteristics of the connector is that it is not bulk ETL processing. The Connector for BI provides a read-on-demand bridge between your MongoDB collections and your SQL-based tools.
How does the Connector for BI work?
As the Connector for BI is a tool built for the enterprise, we designed it with security and access control in mind. The Connector for BI accesses data stored in your MongoDB database using the same authentication and entitlements you created to secure your data. Fundamentally, that means you cannot process data through the connector that would be otherwise inaccessible from MongoDB directly.
Not only does this keep your data secure, it reduces the need for a separate set of credentials for your InfoSec team to manage.
Along with the connector, MongoDB provides a utility called 'mongodrdl' which examines a source MongoDB database and quickly constructs a default set of mappings between the structures it finds in MongoDB and the tables and columns appropriate to project in a relational schema. This utility is governed by the same security and access protocols as the connector itself.
Using Tableau with MongoDB
At MongoDB, we’re committed to helping developers focus on building next-generation apps and not on database operations. Likewise, Tableau's mission is to help people understand the insights behind their data regardless of skill set or functional role.
Part of this mission encompasses the notion that data will be coming from a wide variety of sources. This requires Tableau to work seamlessly with a broad range of data platforms. To accomplish this ever-growing task, the team at Tableau has engineered a range of data connectors in order to expose information to Tableau’s end user, regardless of where the source data sits. This is essential for Tableau to deliver on their promise of “code-free analytics.”
Tableau is also heavily invested in ensuring that queries run in their platform are returned at optimal speeds, regardless of platform.
As Vaidy put it, “Speed to insight is a function not only of query performance but of the entire process of analytics being more agile.”
That’s why MongoDB and Tableau are excited not only to optimize the speed at which data stored in MongoDB can be processed, but also to make the entire user experience more intuitive and seamless. The ability to capture data without ETL or to painstakingly reformat documents into a relational schema results in a significant reduction of cost and complexity.
How are teams using MongoDB and Tableau today?
Big Data today is not just limited to exploratory data science use cases. It's even being used for operational reporting on day-to-day workloads – the kind traditionally handled by data warehouses. Modern organizations are responding to these hybrid needs by pursuing use case-specific architecture design. This design strategy involves tiering data based on a host of factors including volume, frequency of access, speed of data, and level of aggregation. Broadly, these tiers are:
- “Cold” - Data in its rawest form, useful for exploration on large volumes
- “Warm” - Aggregated data for ad hoc diagnostic analyses
- “Hot” - Fast data for repeatable use cases (KPI dashboards etc.)
In most cases, organizations will use different stores for each tier. With that said,If a deployment is well-tuned and well-indexed, MongoDB can serve as a datastore for “cold” data (ex: data late), “warm” data (ex: a semi-structured data warehouse), or “hot” data (ex: computational data stored in-memory).
This means that there is a large spectrum of use cases for how MongoDB and Tableau can be deployed in parallel.
See the connector in action
To demonstrate how the connector works, we will be using a MongoDB dataset with information about 25,000 different New York City restaurants. Here’s what the documents look like:
> db.restaurants.findOne();
{
"_id" : ObjectId("5877d52bbf3a4cfc41ef8a03"),
"address" : {
"building" : "1007",
"coord" : [-73.856077, 40.848447],
"street" : "Morris Park Ave",
"zipcode" : "10462"},
"borough" : "Bronx",
"cuisine" : "Bakery",
"grades" : [
{"date" : ISODate("2014-03-03T00:00:00Z"),
"grade" : "A",
"score" : 2,
"inspectorID" : "Z149"},
{"date" : ISODate("2013-09-11T00:00:00Z"),
"grade" : "A",
"score" : 6,
"inspectorID" : "Z126"},
{"date" : ISODate("2013-01-24T00:00:00Z"),
"grade" : "A",
"score" : 10,
"inspectorID" : "Z39"},
{"date" : ISODate("2011-11-23T00:00:00Z"),
"grade" : "A",
"score" : 9,
"inspectorID" : "Z204"},
{"date" : ISODate("2011-03-10T00:00:00Z"),
"grade" : "B",
"score" : 14,
"inspectorID" : "Z189"}],
"name" : "Morris Park Bake Shop",
"restaurant_id" : "30075445",
"avgprc" : NumberDecimal("12.2500000000000")
}
As you can see, this collection contains data points you’d expect (address, cuisine, etc.), but it also contains time-based sanitation grade ratings as a nested array. In a relational schema, you might expect to see this data stored in a different table whereas in MongoDB, it can be retained within the restaurant object.
To transform this database into a form that a SQL-based tool can parse, we use the mongodrdl
utility to create the mapping file.
Inspecting the output file will reveal that the nested arrays have been transformed into relational tables. Indeed, connecting to the file from the MySQL shell reveals the new schema:
Notice how the geospatial data in the source document ("address.coord") was transformed from an array to 2 doubles corresponding to longitude and latitude.
In MongoDB:
"coord" : [-73.856077,40.848447],
Output from the connector:
_id | address.coord_longitude | address.coord_latitude |
5877d52bbf3a4cfc41ef8a03 | -73.856077 | 40.848447 |
What’s more, if you manipulate data in your original MongoDB collection, the changes will map in real time to the output file.
Now that our data is in a form that a SQL-based tool can understand, let’s move into Tableau.
When connecting to the server through Tableau, we select “MySQL” as that is how Tableau is reading our mapped data set.
You will then see that all the data has been pulled into Tableau with their correct types. For example, if we drill down on our longitude and latitude columns, Tableau knows to toggle into geo mode:
This allows us to create interesting visualizations with our MongoDB data. Say we want to zoom into New York City and filter by Asian and Chinese cuisine...
...you’ll notice a big cluster on the southeast side of Manhattan. We've found Chinatown!
Be sure to watch the full demo to see Buzz explore all of the various ways the connector can be used to pre-aggregate data, hide particular fields, and even do field-level dynamic redaction of data.
Best practices for using MongoDB with Tableau
When preparing a dataset stored in MongoDB for analysis in Tableau, be sure you are following MongoDB best practices. Do you have indexes on frequently-queried fields? Have you pre-joined tables as nested arrays (like the sanitation grades example above)?
As we saw with the translation of geospatial arrays into longitude and latitude doubles, there is great value in letting your data types flow into Tableau. Avoid blurring rich types like datetimes and decimal by down-converting them to strings.
Avoid casting. Some of these operations will be processed in the connector itself, not in MongoDB. For example, complex date arithmetic is not yet pushed down into MongoDB and can greatly impact latency.
Frequently Asked Questions
Should I use the Connector for BI or Tableau Data Extract?
Remember that Tableau will not be able to run queries faster than MongoDB allows. If your data is under-optimized, you may want to consider using Tableau Data Extract instead. Extracts can also be a helpful tool to augment query speed, however they work better for smaller datasets (fewer than 100,000,000 records, 100 columns, etc.). Extracts can reduce load on MongoDB cluster if your cluster is being accessed by many users
Is the Connector for BI available for MongoDB Community?
At this time, the Connector for BI is available as part of MongoDB Enterprise Advanced.
What kind of overhead do the connector and Tableau add to MongoDB response times?
Unless you're running into edge cases where processing is happening in the connector rather than in the database, you will not notice additional latency.
With the previous version of the BI Connector we ran into issues with joins between collections.
The recent release of the Connector for BI (v2.0) introduces significant performance enhancements for these use cases over v1.0.
Be sure to watch the full demo here, and download an evaluation version of the Connector for BI 2.0 for yourself!