

Building the perfect data application is tricky business. Long hours are spent figuring out what data to use, wrangling and aggregating it, and writing code, and then new, perhaps contradictory, data arrives, upsetting the model at its foundation. The fluctuating nature of data requires applications that are similarly changeable.
Michael Armbrust, software engineer and lead developer of the Spark SQL project at Databricks, Inc., said this very problem led to the development of Spark 2.0. During Spark Summit 2016, he told John Walls and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, about a common problem he’d run into with customers.
“As soon as they get it working in batch mode, you immediately have the question, ‘Wait, but new data arrived. What’s the answer now?’ And typically, this was starting from scratch,” he said.
Armbrust said that batch should be looked at as a “sandbox” where you experiment and figure out what type of application you need. Then, using the exact same code, make that application streaming and continuous using Spark’s new tools. “The Spark optimizer — this thing we call Catalyst — should be able to figure out how to do that incrementalization,” he said.
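To make "incrementalization" concrete, here is a minimal, hypothetical Python sketch. It is not Spark's API and not how Catalyst works internally; it is only a toy showing the property Armbrust describes: folding just the newly arrived records into a running per-key count gives the same answer as re-running the batch computation over all the data from scratch. The names `batch_count` and `IncrementalCount` are invented for illustration.

```python
from collections import Counter


def batch_count(records):
    """Batch mode: recompute the per-key count from scratch over every record."""
    return Counter(r["key"] for r in records)


class IncrementalCount:
    """Hypothetical incremental version: keep running state, fold in only new records."""

    def __init__(self):
        self.state = Counter()

    def update(self, new_records):
        # Only the records that just arrived are touched; old data is not re-read.
        for r in new_records:
            self.state[r["key"]] += 1
        return Counter(self.state)  # return a copy of the current answer


# Data arriving in two installments.
day1 = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
day2 = [{"key": "b"}, {"key": "c"}]

inc = IncrementalCount()
inc.update(day1)
answer_now = inc.update(day2)  # "New data arrived. What's the answer now?"

# The incremental answer matches recomputing from scratch.
assert answer_now == batch_count(day1 + day2)
print(dict(answer_now))
```

The point of the sketch is the equivalence check at the end: the user writes the query once, and the system, rather than the user, is responsible for turning it into an update over new data.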
Armbrust spoke enthusiastically about Databricks’ Community Edition, a new, free, cloud-based big data platform built on open-source Apache Spark. “Anybody can use this for free. You sign up. You get six-gigabyte clusters. All you need is an email address,” he said.
He said open source has always been a core value for Spark and Databricks, and that opening the software to the community lets users give back by pointing out, “Hey, you’re missing this optimization,” and then adding it themselves. “That is the power of open source. I think that alone is going to give us a velocity that’s hard to match in closed-source software,” he said.
Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit 2016.