UPDATED 10:25 EDT / OCTOBER 13 2014

Apache Spark heats up with support from 0xdata’s machine-learning platform

Hot on the heels of Hadoop distributor Hortonworks Inc. throwing its weight behind Apache Spark, the ultra-fast analytics engine has found another backer in 0xdata Inc., an emerging provider of machine learning software founded by industry veteran SriSatish Ambati. The Silicon Valley startup is launching a new addition to its flagship platform specifically optimized to tap into the vast processing capacity of Spark.

“In-memory is becoming a hot area to create more real-time value in the big data space,” said SiliconANGLE founder John Furrier. “Big data is becoming about getting low-latency data into the hands of apps and users for innovations that combine speed and machine learning.”

Born out of the celebrated AMPLab at UC Berkeley, Spark is an in-memory execution framework for Hadoop that runs up to 100 times faster than MapReduce, the default engine included in the batch analytics platform. The open source platform is also much better equipped to handle iterative operations that loop over the same data in quick succession, which underpin machine learning.

Google created MapReduce in 2004 to simplify the deployment of parallel applications on distributed clusters, a momentous feat that requires not only spreading the load effectively across individual servers but also enabling rapid inter-node communication and fault tolerance. The software is well suited to that task, hiding most of the complexity and freeing users to focus on their applications.

As a result of that narrow focus, however, there is no straightforward way to implement an iterative algorithm in MapReduce. Data scientists are left to split each loop into a chain of separate jobs that not only take extra effort to set up but also run independently of one another, so intermediate results must be written to disk and reloaded on every iteration.

Spark does away with that hassle by keeping everything in memory as a continuous workflow and thereby killing two birds with one stone: It simplifies life for users while eliminating the massive delays associated with shuffling data around.
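To make the iteration problem concrete, here is a toy sketch in plain Python, not actual Hadoop or Spark code. It fits a straight line with gradient descent two ways: `mapreduce_style` treats every iteration as a separate job that reloads the data and the model from disk and writes the updated model back, while `in_memory_style` keeps both in memory for the whole loop, loosely mirroring Spark's approach. The data set, learning rate, iteration count and function names are hypothetical choices for illustration, and a real cluster adds distribution, shuffling and fault tolerance that this single-machine sketch ignores.

```python
import json
import os
import tempfile

# Hypothetical toy data: points lying exactly on the line y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(10)]
LEARNING_RATE = 0.02  # illustrative value, small enough to converge on this data
ITERATIONS = 1000     # illustrative value


def gradient_step(w, b, data):
    """One pass of batch gradient descent on mean squared error for y ~ w*x + b."""
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return w - LEARNING_RATE * grad_w, b - LEARNING_RATE * grad_b


def mapreduce_style(data, iterations, workdir):
    """Each iteration acts like a separate job: read inputs from disk, write outputs back."""
    data_path = os.path.join(workdir, "data.json")
    model_path = os.path.join(workdir, "model.json")
    with open(data_path, "w") as f:
        json.dump(data, f)
    with open(model_path, "w") as f:
        json.dump([0.0, 0.0], f)
    for _ in range(iterations):
        # Reload the full data set and the current model on every pass.
        with open(data_path) as f:
            points = [tuple(p) for p in json.load(f)]
        with open(model_path) as f:
            w, b = json.load(f)
        w, b = gradient_step(w, b, points)
        # Persist the updated model so the "next job" can pick it up.
        with open(model_path, "w") as f:
            json.dump([w, b], f)
    with open(model_path) as f:
        return tuple(json.load(f))


def in_memory_style(data, iterations):
    """Data and model stay in memory across the loop, with no disk round trips."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        w, b = gradient_step(w, b, data)
    return w, b


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk_result = mapreduce_style(DATA, ITERATIONS, tmp)
    memory_result = in_memory_style(DATA, ITERATIONS)
    print("disk-backed:", disk_result)
    print("in-memory:  ", memory_result)
```

Both versions arrive at the same parameters, close to w = 2 and b = 1; the only difference is the disk round trip on every pass, which is the overhead the article describes. On a real cluster with large data sets, that per-iteration I/O typically dominates the running time.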

That makes it a perfect fit with 0xdata’s H2O, which is built from the ground up for performing machine learning calculations in memory. 0xdata says its open source platform provides an environment for data scientists to implement a wide range of machine learning use cases, from pricing optimization to predictive analytics, using tools they’re already familiar with.

Sparkling Water is the culmination of a four-month effort to integrate H2O with the analytics engine. The technology makes it possible to seamlessly move information back and forth between the two platforms, making the combination much more accessible. Users can now query Spark for a particular dataset, feed it into H2O to create a machine learning model and push the results back to Spark for rapid execution, which significantly increases the usefulness of both projects.