The MongoDB engineering team has recently made a series of significant updates to the MongoDB Connector for Hadoop. This makes it easier for Hadoop users to integrate real-time data from MongoDB – the most popular database for big data systems – with Hadoop for deep, offline analytics. The Connector exposes the analytical power of Hadoop's MapReduce to live application data from MongoDB, driving value from big data faster and more efficiently.
The Connector presents MongoDB as a Hadoop-compatible file system, allowing a MapReduce job to read from MongoDB directly without first copying the data to HDFS and eliminating the need to move terabytes of data across the network. MapReduce jobs can pass queries as filters, avoiding the need to scan entire collections, and can also take advantage of MongoDB’s rich indexing capabilities, including geospatial, text-search, array, compound and sparse indexes.
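As a rough illustration of this query pushdown, here is a minimal sketch of a MapReduce job that reads directly from a live collection, filters it with a MongoDB query, and writes aggregated counts back to MongoDB. It assumes the 1.x Connector's MongoInputFormat/MongoOutputFormat classes, the mongo.input.uri, mongo.output.uri and mongo.input.query properties, and Hadoop's org.apache.hadoop.mapreduce API; the demo.events collection and its status and type fields are purely illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class EventCountJob {

    // Maps each MongoDB document to (type, 1).
    public static class EventMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(Object key, BSONObject doc, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(String.valueOf(doc.get("type"))), ONE);
        }
    }

    // Sums the counts for each event type.
    public static class EventReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Read live data straight from MongoDB instead of copying it to HDFS first.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.events");
        // Write the aggregated results back out to MongoDB.
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.event_counts");
        // Push a query filter down to MongoDB so only matching documents reach the
        // mappers; the filter can take advantage of any existing index.
        conf.set("mongo.input.query", "{\"status\": \"active\"}");

        Job job = Job.getInstance(conf, "event-count");
        job.setJarByClass(EventCountJob.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        job.setMapperClass(EventMapper.class);
        job.setReducerClass(EventReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the filter in place, only documents matching {status: "active"} ever leave MongoDB, and the lookup can be satisfied by an index on that field.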
As well as reading from MongoDB, the Connector can also write the results of Hadoop jobs back out to MongoDB, to support real-time operational processes and ad-hoc querying.
Version 1.1 of the Connector adds support for MongoDB’s native BSON (Binary JSON) backup files. These files can be processed directly by Hadoop, whether they are stored in HDFS (co-located with TaskTrackers) or on local or cloud-based file systems such as Amazon S3.
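To process a mongodump backup rather than a live collection, only the input side of a job needs to change. Here is a sketch under the same assumptions as the job above, swapping in the Connector's BSONFileInputFormat; the class reuse and the input path are illustrative, and the path could equally point at a local directory or a cloud store (e.g. an s3n:// URI).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import com.mongodb.hadoop.BSONFileInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class BsonBackupJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Results still go back to a live MongoDB collection.
        conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.event_counts");

        Job job = Job.getInstance(conf, "event-count-from-bson");
        job.setJarByClass(BsonBackupJob.class);

        // Read mongodump output (.bson files) instead of a live collection.
        job.setInputFormatClass(BSONFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("hdfs:///backups/demo/events.bson"));

        job.setOutputFormatClass(MongoOutputFormat.class);
        job.setMapperClass(EventCountJob.EventMapper.class);   // reuse the mapper above
        job.setReducerClass(EventCountJob.EventReducer.class); // reuse the reducer above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```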
In addition to the existing MapReduce, Pig, Hadoop Streaming (with Node.js, Python or Ruby) and Flume support, the new version of the Connector enables SQL-like queries from Apache Hive to be run across MongoDB data sets. The latest version allows Hive to access BSON files, with full support for MongoDB collections scheduled for the next release of the Connector later this year.
MongoUpdateWritable is another new feature of the Connector. It allows Hadoop to modify an existing output collection in MongoDB, rather than only writing to new collections. As a result, users can run incremental MapReduce jobs, for example to aggregate trends or match patterns on a daily basis, with the results then efficiently queried from a single collection in MongoDB.
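For illustration, here is a sketch of a reducer that emits MongoUpdateWritable update operations instead of plain documents, so each daily run increments counters in an existing collection. The constructor arguments shown (query, update document, upsert flag, multi-update flag) follow the Connector's published examples and may differ slightly between versions; the field names are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.bson.BasicBSONObject;

import com.mongodb.hadoop.io.MongoUpdateWritable;

// Emits update operations rather than new documents, so each daily run folds its
// results into an existing MongoDB collection instead of rewriting it.
public class IncrementalCountReducer
        extends Reducer<Text, IntWritable, NullWritable, MongoUpdateWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Match the existing document for this event type...
        BasicBSONObject query = new BasicBSONObject("_id", key.toString());
        // ...and increment its running total with this run's count.
        BasicBSONObject update =
                new BasicBSONObject("$inc", new BasicBSONObject("count", sum));
        // upsert = true creates the document on the first run; multi = false
        // updates a single matching document.
        context.write(NullWritable.get(), new MongoUpdateWritable(query, update, true, false));
    }
}
```

The driver for such a job would set MongoUpdateWritable as the output value class (with NullWritable keys) and keep MongoOutputFormat as the output format.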
The MongoDB Connector for Hadoop works as follows:
- It examines the MongoDB collection and calculates a set of splits from the data
- Each split is assigned to a node in the Hadoop cluster
- In parallel, Hadoop nodes pull the data for their splits from MongoDB (or BSON files) and process it locally
- Hadoop merges the results and streams the output back to MongoDB or BSON files
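How finely the collection is carved into splits is tunable. A small sketch, assuming the 1.x Connector's mongo.input.split_size property (names can vary between versions); the values shown are illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class SplitTuning {
    /**
     * Builds a Configuration that controls how the Connector carves the input
     * collection into splits: mongo.input.split_size is the approximate size of
     * each split in MB, and one Hadoop map task processes one split.
     */
    public static Configuration tunedConf() {
        Configuration conf = new Configuration();
        conf.set("mongo.input.uri", "mongodb://localhost:27017/demo.events");
        conf.set("mongo.input.split_size", "16");
        return conf;
    }
}
```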
Mike O’Brien, MongoDB software engineer and maintainer of the MongoDB Connector for Hadoop, demonstrated its new features in a recent webinar, which is now available for on-demand viewing.
Following on from Mike’s webinar, we will also host a new session on Wednesday 21st August exploring the big data use cases of MongoDB and Hadoop, and the value of integrating them to create a big data pipeline.
In summary, the MongoDB Connector for Hadoop adds to the broadest set of query and data analysis capabilities of any NoSQL database, including:
- The MongoDB API, which was recently adopted by IBM as the new standard for building mobile applications;
- The MongoDB aggregation framework, which provides functionality similar to SQL GROUP BY operations (see the sketch after this list);
- Multiple integrations with leading BI tool vendors, including QlikTech, Actuate, Informatica, JasperSoft, Pentaho and Talend, to perform BI on live data;
- Native MapReduce within MongoDB when integration with Hadoop isn’t needed;
- The MongoDB Connector for Hadoop itself, enabling integration with Hadoop MapReduce jobs, such as aggregating data from multiple input sources, or as part of Hadoop-based data warehousing or ETL workflows.
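To make the GROUP BY comparison above concrete, here is a minimal sketch using the 2.x Java driver's aggregate() helper, again assuming an illustrative demo.events collection with a type field; it is roughly equivalent to SELECT type, COUNT(*) FROM events GROUP BY type.

```java
import com.mongodb.AggregationOutput;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class GroupByExample {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DBCollection events = client.getDB("demo").getCollection("events");

        // $group stage: count the documents for each distinct value of "type".
        DBObject group = new BasicDBObject("$group",
                new BasicDBObject("_id", "$type")
                        .append("count", new BasicDBObject("$sum", 1)));

        AggregationOutput out = events.aggregate(group);
        for (DBObject doc : out.results()) {
            System.out.println(doc);
        }
        client.close();
    }
}
```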
You can download the MongoDB Connector for Hadoop from GitHub.
Review the documentation, including details on how to get started and sample code.
If you have any questions, email the mongodb-user mailing list.
We’d also love to hear how you use the Connector to bring together MongoDB and Hadoop – feel free to comment below.