MongoDB cozies up to Apache Spark with new Connector
At MongoDB World last week, MongoDB Inc. announced a new Connector for Apache Spark, enabling data scientists and developers using its database to run real-time analytics on fast-moving data.
The MongoDB Connector for Apache Spark enables users to glean insights from “live, operational and streaming data”, MongoDB officials said. The company said it collaborated closely with Databricks Inc., which offers a commercial version of Apache Spark, and the new Connector has been given Databricks’ Certified Application status, guaranteeing integration and API compatibility between Spark processes and MongoDB.
MongoDB’s users have expressed great interest in using Spark with the database, Kelly Stirman, the company’s vice president of strategy and product marketing, told eWEEK. In response, MongoDB initially took its existing Connector for Apache Hadoop and enhanced it to work with Spark as well.
“We learned a lot and decided there’s enough interest there to make an engineering investment to make a dedicated connector for Spark,” Stirman told eWEEK.
With the new connector, Spark jobs can now be executed directly against operational data residing in MongoDB, doing away with the need to carry out extract, transform and load (ETL) operations, explained Eliot Horowitz, co-founder and CTO of MongoDB. He added that the new connector enables MongoDB to load and serve analytics results back into live, operational processes, “making them smarter, more contextual and responsive to events as they happen”.
The MongoDB Connector provides a familiar development experience for Spark users because it’s written in Scala, Spark’s native programming language. Additionally, the connector exposes all of Spark’s libraries, so data sets held in MongoDB can be analyzed with Spark’s machine learning, graph, streaming and SQL APIs.
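For readers who want a feel for what querying MongoDB from Spark looks like, here is a minimal sketch in Python rather than Scala. It rests on several assumptions not stated in the announcement: the `mongodb` format name and the `connection.uri`, `database` and `collection` option keys follow later 10.x releases of the connector and may not match the version announced here, and the helper names, URI, database and collection are hypothetical.

```python
from typing import Any, Dict


def mongo_read_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build reader options for one MongoDB collection.

    Assumption: these option keys follow the 10.x connector releases.
    """
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def load_collection(spark: Any, options: Dict[str, str]) -> Any:
    """Return a Spark DataFrame backed by the live collection, with no export step.

    `spark` is expected to be a SparkSession that has the connector available.
    """
    return spark.read.format("mongodb").options(**options).load()


# Hypothetical deployment details, for illustration only.
opts = mongo_read_options("mongodb://localhost:27017", "shop", "orders")
```

Given a Spark session with the connector installed, `load_collection(spark, opts)` would return a DataFrame that could be registered with `createOrReplaceTempView("orders")` and queried through `spark.sql(...)`, without a separate ETL job.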
“Users are already combining Apache Spark and MongoDB to build sophisticated analytics applications,” said Reynold Xin, co-founder and chief architect of Databricks, in a statement. “The new native MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Apache Spark functionality than any MongoDB connector available today.”
The connector is already helping customers build more reliable AI systems faster. Jeff Smith, data engineering team lead at x.ai, which has developed an AI-powered digital assistant, said his company uses a combination of Apache Spark and MongoDB to process and analyze the vast amounts of data needed to power its AI-based app.
“With the new native MongoDB Connector for Apache Spark, we have an even better way of connecting up these two key pieces of our infrastructure,” Smith said. “We believe the new connector will help us move faster and build reliable machine learning systems that can operate at massive scale.”
The MongoDB Connector for Apache Spark was announced alongside MongoDB’s new Atlas database-as-a-service offering.
Image credit: ClkerFreeVectorImages via pixabay.com