UPDATED 16:11 EDT / JUNE 08 2016


Beyond batch with Spark 2.0: The new continuous data application | #SparkSummit

Building the perfect data application is tricky business. Long hours are spent figuring out what data to use, wrangling and aggregating, writing code — and then new, perhaps contradictory, data arrives upsetting the model at its foundation. The fluctuating nature of data requires applications that are similarly changeable.

Michael Armbrust, software engineer and lead developer of the Spark SQL project at Databricks, Inc., said this very problem led to the development of Spark 2.0. During Spark Summit 2016, he told John Walls and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, about a common problem he’d run into with customers.

“As soon as they get it working in batch mode, you immediately have the question, ‘Wait, but new data arrived. What’s the answer now?’ And typically, this was starting from scratch,” he said.

Armbrust said that batch should be looked at as a “sandbox” where you experiment and figure out what type of application you need. Then, using the exact same code, you can make that application streaming and continuous with Spark’s new tools. “The Spark optimizer, this thing we call Catalyst, should be able to figure out how to do that incrementalization,” he said.
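The idea behind incrementalization can be illustrated without Spark at all. What follows is a minimal plain-Python sketch, not Spark's API and not a description of how Catalyst works internally. The names `sum_by_key` and `IncrementalSum`, and the sample events, are hypothetical and chosen only for illustration. It shows one piece of aggregation logic written once in batch form and then reused to keep a running answer as new data arrives, with a check that both routes agree.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (key, value); hypothetical record shape


def sum_by_key(events: Iterable[Event]) -> Counter:
    """Batch version: recompute the answer from scratch over all events."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return totals


class IncrementalSum:
    """Continuous version: keep running totals and fold in only new events."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def update(self, new_events: Iterable[Event]) -> Counter:
        # Reuse the batch logic on the new slice, then merge it into the state.
        self.totals.update(sum_by_key(new_events))
        return self.totals


# Hypothetical data arriving in three small batches.
batches: List[List[Event]] = [
    [("clicks", 3), ("views", 10)],
    [("clicks", 1)],
    [("views", 5), ("signups", 2)],
]

running = IncrementalSum()
seen: List[Event] = []
for batch in batches:
    seen.extend(batch)
    incremental_answer = running.update(batch)
    # The incremental answer matches a full batch recomputation.
    assert incremental_answer == sum_by_key(seen)

final_totals = dict(running.totals)
```

The sketch stays this short only because sums can be merged by hand. The point Armbrust makes is that the optimizer, rather than the developer, is expected to work out this kind of rewrite.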

The open-source win-win

Armbrust spoke enthusiastically about Databricks’ Community Edition, a new free, cloud-based, open-source big data platform. “Anybody can use this for free. You sign up. You get six-gigabyte clusters. All you need is an email address,” he said.

He stated that open source has always been a core value for Spark and Databricks. Opening their software to the community, he said, allows users to give back by saying, “Hey, you’re missing this optimization,” and adding it. “That is the power of open source. I think that alone is going to give us a velocity that’s hard to match in closed-source software,” he said.

