Get the inside scoop on Apache Spark 2.0: Will it break code? | #SparkSummit
Apache Spark is moving from the 1.x line to 2.0, and who better to talk about it than Matei Zaharia, creator of Spark? He did so in a keynote session at Spark Summit East 2016, which also featured presentations from Ali Ghodsi, CEO of Databricks, Inc.; Shaun Connolly, VP of corporate strategy at Hortonworks, Inc.; and Anjul Bhambhri, VP of engineering for Big Data and analytics at IBM.
Successes of 2015
“2015 was a really, really great year for Spark,” said Zaharia. Attendance at the Spark Summit quadrupled to 3,500. Meetup groups grew to 60,000 and are reaching new continents and places each week. The total number of contributors has also grown to 1,000. New components were added, such as DataFrames (a distributed collection of data organized into named columns) and SparkR (an R package that provides a lightweight frontend for using Apache Spark from R).
In 2016, bigger things are coming. There is a major new version release called Spark 2.0 coming in April or May. “We really hate breaking APIs,” noted Zaharia, assuring the audience that they will not be changing the majority of APIs. For most users, the update to 2.0 will not break their code in any way.
So what are the new features? Zaharia highlighted three in 2.0: Tungsten Phase 2 (speedups of 5-10x), Structured Streaming (a real-time engine on SQL/DataFrames), and the unification of Datasets and DataFrames.
Democratizing Big Data
Ghodsi was up next to talk about democratizing access to Big Data. “We created Databricks to simplify data,” he said. Databricks wanted a cloud platform to ensure everything worked end-to-end, to release new software rapidly, and to provide dynamic use cases for customers. Databricks built a platform with a lot of integrations, and it used Spark.
“Companies still struggle with Big Data projects,” said Ghodsi. There is a big learning curve for developers. Acquiring machines, setting up and configuring infrastructure, and building systems are huge problems. In 2014, the company set out to train Spark users. In 2015, it launched Massive Open Online Courses (MOOCs), and more than 20,000 people finished the courses.
Ghodsi was excited to announce that Databricks is launching a free Databricks Community Edition. People will get mini Spark clusters, course and MOOC material, and Spark how-to documentation. A demonstration of this Community Edition helped give audience members a visual of what they could do with this material.
Accelerating enterprise Spark
Connolly took the stage next to talk about how enterprises are using Spark. As an example, he shared the story of Webtrends, which took data streams from a vast array of desktop and mobile devices, analyzed them on Spark, and launched a new product (Webtrends Explore).
“It’s been truly a journey,” Connolly said. “It isn’t just about doing really cool analytics, but they were able to identify and tap into new revenue streams.”
A celebration of Apache Spark
The keynote concluded with a celebration of Spark from IBM. “Today, we’re all really here to celebrate Apache Spark,” opened Bhambhri. “We recognize a good thing when we see one, and we get behind it.” IBM has bet on many ground-changing technologies over the last century, and the newest of them is Apache Spark.
“This technology is so fundamental that we think of this as the analytics operating system,” said Bhambhri. Why? According to Bhambhri, never before has such a rich set of analytical capabilities come together in one stack. In the past, a company would’ve needed at least a dozen products to pull off analytics, but now it needs just one foundational platform: Apache Spark.
IBM is enhancing Spark, offering it as part of its products, and leveraging it. “We already have about 15 IBM products that we shipped last year which are leveraging Spark. Over a dozen are at work in the labs,” said Bhambhri.
Watch the full video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit East 2016. And join in on the conversation by CrowdChatting with theCUBE hosts.
Photo by Spark Summit East