UPDATED 07:30 EDT / SEPTEMBER 24 2015

NEWS

Study: Spark is outgrowing, and increasingly displacing, Hadoop

A rift is forming in the open-source ecosystem that may drastically alter the trajectory of modern analytics. Apache Spark, the speedy in-memory data crunching engine developed to take Hadoop beyond batch processing, is increasingly drifting away from the project as new use cases drive early adopters to reconsider their implementation choices.

That’s the key conclusion of the first official annual user survey from Databricks Inc., the startup co-founded by Spark creator Matei Zaharia and several peers from UC Berkeley to commercialize the framework. Some 48 percent of the 1,417 data scientists and other participants who took part in the poll said that their organizations have deployed the engine as a standalone cluster.

That compares to the 40 percent whose companies are running Spark on Hadoop, which is not particularly encouraging for Cloudera Inc. and the other distributors that have spent the last few years trying to monetize the latter project. Compounding the threat is the growth that Databricks has recorded in the uptake of the in-memory engine’s value-added extensions.

That includes first and foremost Spark SQL, the structured query component, which the study found to have seen adoption nearly quadruple over the past year from four percent of the overall Spark user base to almost a quarter. The technology substitutes for the functionality of Cloudera’s Impala and many other alternatives in the Hadoop ecosystem.

Trailing behind the query layer in second place is Spark Streaming, which jumped a more modest 56 percent and is now seeing use with some 14 percent of the entire user base. That growth will likely expand much further as more and more organizations find themselves needing to process data in real time due to the proliferation of connected devices in the corporate network.

For the time being, however, the main reason why CIOs are refocusing their analytics efforts from Hadoop to Spark is its raw speed. An overwhelming 91 percent of the respondents to the survey cited performance as a key advantage of the engine, an edge that will only increase as Databricks continues to optimize the underlying architecture.

Not all the credit goes to the startup, however. Much of the work is done by the surrounding ecosystem of outside contributors, which saw its ranks swell by 600 members in the last 12 months, more than twice as many as the previous year according to the study. Among Spark’s newest backers is IBM Corp., which recently committed a billion dollars and 3,500 engineers to accelerating its development.

That makes it abundantly clear where Big Blue thinks the open-source analytics movement is headed, a sentiment that is shared by even the staunchest Hadoop supporters. Cloudera has announced an initiative to make Spark the new default processing engine of the platform in an effort to capitalize on its popularity, citing many of the same reasons as the respondents to Databricks’ survey.

Photo via sethink

 

