UPDATED 23:33 EDT / JULY 27 2016

NEWS

Databricks ships out “easier, faster, smarter” Apache Spark 2.0

The immensely popular open-source cluster computing framework Apache Spark has just reached version 2.0, according to an announcement by the Apache Software Foundation (ASF) yesterday.

Spark’s popularity has made it one of the most active open-source Big Data projects, approaching the level of Apache Hadoop, one of the oldest and most established Big Data technologies around. Much of Spark’s acclaim stems from the superior functionality it offers over MapReduce, the original Hadoop component that it’s now rapidly replacing. Spark supports numerous modern features not seen in MapReduce, such as real-time analytics of streaming data, in-memory processing, machine learning, interactive queries and more.

Now, with Spark 2.0, that functionality has improved even further.

“Apache Spark 2.0.0 is the first release on the 2.x line,” noted the ASF on the Apache Spark website. “The major updates are API usability, SQL 2003 support, performance improvements, structured streaming, R UDF support, as well as operational improvements. In addition, this release includes over 2,500 patches from over 300 contributors.”

But Databricks Inc., the company founded by Spark’s creators to commercialize the platform, framed the improvements in terms of the platform’s “three core attributes”: easier, faster, smarter. It made the announcement in a blog post, saying Databricks is the first commercial vendor to support Apache Spark 2.0.

In a separate blog post, Databricks explained some of the most notable new features in the release, which focus on two specific areas: standard SQL support and a unified DataFrame/Dataset API.

First up, Databricks has streamlined Spark’s APIs in the new release, unifying the DataFrame and Dataset APIs in Java and Scala. As part of that change, DataFrame is now a type alias for Dataset of Row in Spark 2.0. In addition, the new release comes with expanded SQL support, including a new ANSI SQL parser and support for subqueries, which are queries nested inside another query.
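For readers unfamiliar with the term, a subquery can be sketched in a few lines. The example below is only an illustration of the SQL concept: it uses Python’s built-in sqlite3 module rather than Spark, and the orders table, its columns and the sample rows are all invented for the purpose. It has not been run against Spark SQL.

```python
import sqlite3

# Hypothetical sample data, invented for illustration only.
ORDERS = [("alice", 120.0), ("bob", 40.0), ("carol", 95.0), ("dave", 15.0)]


def customers_above_average(rows):
    """Return customers whose order amount exceeds the average amount."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # The inner SELECT is the subquery: it is nested inside the outer
        # query and supplies the threshold used by the WHERE clause.
        cursor = conn.execute(
            "SELECT customer FROM orders "
            "WHERE amount > (SELECT AVG(amount) FROM orders) "
            "ORDER BY customer"
        )
        return [name for (name,) in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    print(customers_above_average(ORDERS))
```

Without subquery support, the average would have to be computed in a separate query and passed into the second one by the application.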

The other main focus in Spark 2.0 was speed. Databricks points to its 2015 Spark Survey, which showed that 91 percent of users rated performance as one of the most important aspects of the software. Responding to this feedback, Databricks took a long, hard look at Spark’s physical execution layer before redesigning it and introducing a second-generation Tungsten engine. The new and improved engine “builds upon ideas from modern compilers and MPP databases and applies them to Spark workloads,” the company said.

Spark 2.0 also comes with a brand new API called Structured Streaming that’s designed to allow applications to make decisions in real time. Structured Streaming offers three main improvements: an API integrated with batch jobs, transactional interaction with storage systems and rich integration with Spark’s other components. Spark 2.0 ships with the initial alpha release of Structured Streaming as an extension of the DataFrame and Dataset APIs.

Databricks reckons that with the new improvements, developers will no longer need to keep their apps in sync with batch jobs or manage failures manually, as the streaming job will now always give the same answer as a batch job on the same dataset. In addition, developers can now build complete applications rather than just streaming pipelines.
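The claim that a streaming job gives the same answer as a batch job on the same dataset can be illustrated with a toy model. The sketch below is plain Python, not the Structured Streaming API: the class and function names and the sample events are hypothetical, and it leaves out everything that makes the real problem hard, such as failures, late data and distributed state.

```python
from collections import Counter

# Hypothetical event stream, invented for illustration only.
EVENTS = ["click", "view", "click", "buy", "view", "click", "view"]


def batch_count(records):
    """Batch job: count every record in one pass over the full dataset."""
    return Counter(records)


class IncrementalCount:
    """Toy streaming job: keeps running state, updated one micro-batch at a time."""

    def __init__(self):
        self.state = Counter()

    def process(self, micro_batch):
        # Fold the new records into the existing counts.
        self.state.update(micro_batch)
        return self.state


def chunks(records, size):
    """Split the records into consecutive micro-batches of at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def streaming_count(records, batch_size):
    """Feed the records through the toy streaming job in micro-batches."""
    job = IncrementalCount()
    for micro_batch in chunks(records, batch_size):
        job.process(micro_batch)
    return job.state


if __name__ == "__main__":
    # The incremental result matches the batch result for any batch size.
    print(streaming_count(EVENTS, 2) == batch_count(EVENTS))
```

In this toy version the equivalence holds because counting is order-independent and every record is processed exactly once; the point Databricks makes is that the engine, rather than the developer, is meant to maintain that guarantee.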

“One of the things that’s really exciting for me as a developer of Apache Spark is seeing how quickly users start to use new features and APIs we introduce, and in turn, offer almost instantaneous feedback, so that we can continue to improve them,” said Matei Zaharia, CTO and co-founder of Databricks and creator of Apache Spark, in a press release.

On its Spark site, the ASF took pains to point out some essential resources for developers wishing to learn more about Spark, including Scala resources such as “First Steps to Scala,” “Scala tutorial for Java programmers” and “Programming in Scala.” There’s also a general “Spark Programming Guide” with code examples in all three of Spark’s main languages: Scala, Java and Python.

Image credit: Mikegi via pixabay
