UPDATED 22:30 EST / JULY 19 2017

BIG DATA

Spark doubles down on streaming, data warehousing and deep learning

The Apache Spark community has been wrestling with a wide range of big data challenges, and Databricks Inc., which was founded by Spark’s creators, is taking steps to address enterprise demand for machine learning and faster data processing.

“Rather than giving people the fish, you give them the tools to fish,” said Reynold Xin (pictured), chief architect and co-founder at Databricks.

Xin stopped by theCUBE, SiliconANGLE’s mobile livestreaming studio, and answered questions from hosts David Goad (@davidgoad) and George Gilbert (@ggilbert41) during Spark Summit 2017 in San Francisco, California. They discussed changes to the Spark platform, the role of storage systems in analytics and the next big challenge for the Spark community. (* Disclosure below.)

Deep learning is a priority

One of the tools Databricks announced at Spark Summit was Deep Learning Pipelines, an open-source library designed to let users build neural networks for data processing. “We’re hoping to democratize deep learning,” Xin said.

Seeking to speed up data processing dramatically, Databricks has also added Structured Streaming to its enterprise portfolio. According to Xin, Databricks customers processed 3 trillion records last month using Structured Streaming and brought latency down to the three-millisecond range.

Databricks is also working to improve the visibility and what Xin termed the “debug-ability” of big data jobs. Improving the performance and capability of Spark’s data warehousing features will also speed up job processing, he added.

While storage systems have “matured” and Spark can work effectively with many of them, Xin does not yet see storage taking on an analytical role. “It doesn’t make sense to build storage systems for analytics at this point,” said the Databricks co-founder.

Despite these enhancements, the challenge for the Spark community will remain finding ways to make data management and deep learning tools easier to use. “The bar to entry is very high for these tools. It’s what we focus on a lot at Databricks,” Xin concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
