

As Hadoop Summit prepares to kick off tomorrow in San Jose, CA, the Big Data platform that is shaking up the database industry is also struggling with an adolescent crisis and questions about whether it can support a full commercial ecosystem in the long term.
Forecasts for the overall Big Data market continue to be rosy, with Wikibon expecting a 17% compound annual growth rate through 2026, when the total market will exceed $84 billion. Leading vendors continue to post promising results. Cloudera, Inc., which is widely considered to be the market leader, said its revenues topped $100 million in 2014. Analysts are valuing the privately held company at nearly $5 billion.
Hortonworks, Inc., which is the only publicly held pure-play Hadoop vendor, cheered investors last month with a quarterly earnings report that beat estimates and an upward revision to its outlook, but the company also reported that losses nearly doubled from the previous year.
Voices of doubt have been emerging, however, in particular a Gartner report released last month that found that only about one-quarter of 284 technology and business leaders surveyed are using Hadoop, and even they are implementing it only on a small scale. A much smaller survey of 106 executives released by Actian Corp. this week found that only five percent are beating down the doors for the batch Big Data platform.
The rub is Hadoop’s complexity, which is often cited as one of the platform’s greatest weaknesses. “Hadoop isn’t a product, it’s an ecosystem, and users are gagging on its complexity,” said George Gilbert, Wikibon’s Big Data analyst.
Vendors are tackling the problem, but they’re racing against emerging alternatives that could steal some of Hadoop’s momentum. Chief among those is Apache Spark, a data processing and analytics engine that is noted for its speed. Spark is quickly supplanting Hadoop’s native MapReduce processing engine, and some people say it could challenge Hadoop directly.
“Spark is an emerging ecosystem with open-source excitement and innovation,” said Wikibon co-founder David Floyer.
Spark is not a direct replacement for Hadoop. Its most popular use is as a high-speed analysis and reporting engine, where it has earned plaudits as an alternative to the complex MapReduce while also performing up to 100 times faster. However, Spark consumes more memory and machine resources than the Hadoop batch framework. It also doesn’t have a native file system, but rather rides on top of multiple data sources, including Hadoop’s HDFS.
“Hadoop is a Big Data batch process that parallelizes as much as possible for efficiency,” Floyer explained. “Spark is a different model. It’s an integrated, parallel set of micro-batch processes that use much more memory and are designed to deliver results much quicker. It allows the intersection of big data, streaming data and near real-time analytics, and is an important open-source technology on the road to systems of intelligence.”
That could be where the sweet spot of the market is moving, though. While Hadoop has greatly reduced the cost of managing large amounts of data, its batch orientation doesn’t lend itself well to the sexier world of high-speed analytics. “The Hadoop ecosystem needs to deliver real-time agile applications to support more interactivity and engagement data,” said SiliconANGLE founder John Furrier.
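The difference Floyer describes between batch and micro-batch processing can be illustrated with a toy sketch. The code below is plain Python, not Spark or Hadoop code; the function names, the word-count task and the sample records are all hypothetical and chosen only for illustration. The batch version produces a single answer after reading everything, while the micro-batch version holds running totals in memory and emits an updated answer after each small group of records.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Batch style: read the whole data set, then return one final result."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def micro_batch_word_count(records: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Micro-batch style: keep running totals in memory and yield an
    updated snapshot after every small batch of records."""
    counts: Counter = Counter()
    batch: List[str] = []
    for line in records:
        batch.append(line)
        if len(batch) == batch_size:
            for item in batch:
                counts.update(item.split())
            batch.clear()
            yield Counter(counts)  # copy, so earlier snapshots stay unchanged
    if batch:  # flush any records left over after the last full batch
        for item in batch:
            counts.update(item.split())
        yield Counter(counts)


# Hypothetical sample records standing in for an incoming data stream.
events = ["spark hadoop", "hadoop hdfs", "spark spark", "hdfs"]

final = batch_word_count(events)
snapshots = list(micro_batch_word_count(events, batch_size=2))
```

Both approaches end with the same totals, but the micro-batch version makes intermediate results available while data is still arriving, at the cost of holding state in memory throughout.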
Spark is also relatively untested in the market, having been designated an Apache Top-Level Project only a little more than a year ago. “Spark is still going through the process of being hardened that any large-scale engine requires before mainstream adoption,” said Wikibon’s Gilbert.
No one is expecting Hadoop to go away, but with valuations in the stratosphere and a bundle of venture capital awaiting a payoff, expectations are high. The market could be ripe for consolidation, Floyer said. Drawing an analogy to the storage market of five years ago, he pointed out that buyouts quickly reduced the number of contenders in that arena from 10 to two or three. “The same thing will happen with Hadoop,” he predicted. “After consolidation is when the real money will be made.”
And there’s a possibility that the companies that make that money won’t be the ones that currently dominate the landscape. Gilbert suggested that cloud providers like Amazon Web Services and Microsoft may see Hadoop’s complexity as a new-business opportunity. “They can potentially build a tightly integrated platform that doesn’t have that complexity,” and snatch leadership away from the pure-play vendors, he said.
If that happens, then the ultimate winners in the Hadoop market may be companies that aren’t even playing today.