UPDATED 22:30 EST / JULY 19 2017

BIG DATA

Spark doubles down on streaming, data warehousing and deep learning

The Apache Spark community has been wrestling with a wide range of big data challenges in information technology, and Databricks Inc., which was founded by Spark’s creators, is taking steps to address the enterprise need for machine learning and speedier data processing.

“Rather than giving people the fish, you give them the tools to fish,” said Reynold Xin (pictured), chief architect and co-founder at Databricks.

Xin stopped by theCUBE, SiliconANGLE’s mobile livestreaming studio, and answered questions from hosts David Goad (@davidgoad) and George Gilbert (@ggilbert41) during Spark Summit 2017 in San Francisco, California. They discussed changes for the Spark platform, the role of storage systems in analytics and the next big challenge for the Spark community. (* Disclosure below.)

Deep learning is a priority

One of the tools announced by Databricks during the Spark Summit was Deep Learning Pipelines, an open-source library designed to give users the ability to create neural networks for data processing. “We’re hoping to democratize deep learning,” Xin said.

Seeking to dramatically speed up data processing, Databricks has also blended a Structured Streaming tool into its enterprise portfolio. Databricks customers processed 3 trillion records last month using Structured Streaming and brought latency down to the three-millisecond range, according to Xin.

Databricks is also working to improve the visibility and what Xin termed “debug-ability” of big data jobs. Improving the performance and capability of data warehousing features in Spark will also increase job processing speed, he added.

While storage systems have “matured” and Spark can work effectively with a variety of them, Xin is not prepared to include storage in an analytical role just yet. “It doesn’t make sense to build storage systems for analytics at this point,” said the Databricks co-founder.

Despite the release of new enhancements, the challenge for the Spark community will continue to be finding ways to make data management and deep learning tools easier to use. “The bar to entry is very high for these tools. It’s what we focus on a lot at Databricks,” Xin concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
