UPDATED 07:30 EDT / SEPTEMBER 24 2015

NEWS

Study: Spark is outgrowing, and increasingly displacing, Hadoop

A rift is forming in the open-source ecosystem that may drastically alter the trajectory of modern analytics. Apache Spark, the speedy in-memory data crunching engine developed to take Hadoop beyond batch processing, is increasingly drifting away from the project as new use cases drive early adopters to reconsider their implementation choices.

That’s the key conclusion of the first official annual user survey from Databricks Inc., the startup co-founded by Spark creator Matei Zaharia and several peers from UC Berkeley to commercialize the framework. Some 48 percent of the 1,417 data scientists and other participants in the poll said that their organizations have deployed the engine as a standalone cluster.

That compares to the 40 percent whose companies are running Spark on Hadoop, which is not particularly encouraging for Cloudera Inc. and the other distributors that have spent the last few years trying to monetize the latter project. Compounding the threat is the growth that Databricks has recorded in the uptake of the in-memory engine’s value-added extensions.

That includes first and foremost Spark SQL, the structured query component, which the study found to have seen adoption nearly quadruple over the past year from four percent of the overall Spark user base to almost a quarter. The technology replaces the functionality of Cloudera’s Impala and many other alternatives in the Hadoop ecosystem.

Trailing behind the query layer in second place is Spark Streaming, which jumped a more modest 56 percent and is now used by some 14 percent of the entire user base. That growth will likely expand much further as more and more organizations find themselves needing to process data in real time due to the proliferation of connected devices in the corporate network.

For the time being, however, the main reason why CIOs are refocusing their analytics efforts from Hadoop to Spark is its raw speed. An overwhelming 91 percent of the respondents to the survey cited performance as a key advantage of the engine, an edge that will only increase as Databricks continues to optimize the underlying architecture.

Not all the credit goes to the startup, however. Much of the work is done by the surrounding ecosystem of outside contributors, which saw its ranks swell by 600 members in the last 12 months, more than twice as many as the previous year, according to the study. Among Spark’s newest backers is IBM Corp., which recently committed a billion dollars and 3,500 engineers to accelerating its development.

That makes it abundantly clear where Big Blue thinks the open-source analytics movement is headed, a sentiment that is shared by even the staunchest Hadoop supporters. Cloudera has launched an initiative to make Spark the new default processing engine of the platform in an effort to capitalize on its popularity, citing many of the same reasons as the respondents to Databricks’ survey.

Photo via sethink

