UPDATED 07:51 EST / JUNE 08 2015


Wikibon sizes up the skeptics as #HadoopSummit love fest begins

As Hadoop Summit prepares to kick off tomorrow in San Jose, CA, the Big Data platform that is shaking up the database industry is also struggling with an adolescent crisis and questions about whether it can support a full commercial ecosystem in the long term.

Forecasts for the overall Big Data market continue to be rosy, with Wikibon expecting a 17% compound annual growth rate through 2026, when the total market will exceed $84 billion. Leading vendors continue to post promising results. Cloudera, Inc., which is widely considered to be the market leader, said its revenues topped $100 million in 2014. Analysts are valuing the privately held company at nearly $5 billion.

Hortonworks, Inc., which is the only publicly held pure-play Hadoop vendor, cheered investors last month with a quarterly earnings report that beat estimates and an upward revision to its outlook, but the company also reported that losses nearly doubled from the previous year.

Voices of doubt have been emerging, however, in particular a Gartner report released last month that found that only about one-quarter of 284 technology and business leaders surveyed are using Hadoop, and even they are implementing it only on a small scale. A much smaller survey of 106 executives released by Actian Corp. this week found that only five percent are beating down the doors for the batch Big Data platform.

The rub is Hadoop’s complexity, which is often cited as one of the platform’s greatest weaknesses. “Hadoop isn’t a product, it’s an ecosystem, and users are gagging on its complexity,” said George Gilbert, Wikibon’s Big Data analyst.

The vendor ecosystem is tackling the problem, but it is racing against emerging alternatives that could steal some of Hadoop’s momentum. Chief among those is Apache Spark, a data processing and analytics engine that is noted for its speed. Spark is quickly supplanting Hadoop’s native MapReduce processing engine, and some people say it could challenge Hadoop directly.

“Spark is an emerging ecosystem with open-source excitement and innovation,” said Wikibon co-founder David Floyer.

Spark is not a direct replacement for Hadoop. Its most popular use is as a high-speed analysis and reporting engine, where it has earned plaudits as an alternative to the complex MapReduce while also performing up to 100 times faster. However, Spark consumes more memory and machine resources than the Hadoop batch framework. It also doesn’t have a native file system, but rather rides on top of multiple data sources, including Hadoop’s HDFS.

“Hadoop is a Big Data batch process that parallelizes as much as possible for efficiency,” Floyer explained. “Spark is a different model. It’s an integrated, parallel set of micro-batch processes that use much more memory and are designed to deliver results much quicker. It allows the intersection of big data, streaming data and near real-time analytics, and is an important open source technology on the road to systems of intelligence.”
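The difference Floyer describes can be illustrated with a toy word count in plain Python. This is only a sketch of the two processing styles, not Hadoop or Spark code: the function names, the sample data and the batch size are all hypothetical, and everything runs on a single machine. The batch version returns one answer after reading all of the data, while the micro-batch version keeps running totals in memory and emits an updated answer after each small group of records.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Batch style: scan the whole data set, then return one final result."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def micro_batch_word_count(stream: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Micro-batch style: hold running totals in memory and yield an
    updated snapshot after every small group of records."""
    running: Counter = Counter()
    batch: List[str] = []
    for line in stream:
        batch.append(line)
        if len(batch) == batch_size:
            for item in batch:
                running.update(item.split())
            batch = []
            yield Counter(running)  # copy, so earlier snapshots stay unchanged
    if batch:  # flush any leftover records that did not fill a batch
        for item in batch:
            running.update(item.split())
        yield Counter(running)


# Hypothetical sample data, chosen only for illustration.
lines = ["spark hadoop", "hadoop hdfs", "spark spark", "hdfs"]
final = batch_word_count(lines)
snapshots = list(micro_batch_word_count(lines, batch_size=2))
```

In this sketch the last micro-batch snapshot matches the batch result, but intermediate answers are available earlier, at the cost of holding the running state in memory.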

Real-time advantage

Near real-time analytics could be where the sweet spot of the market is moving, though. While Hadoop has greatly reduced the cost of managing large amounts of data, its batch orientation doesn’t lend itself well to the sexier world of high-speed analytics. “The Hadoop ecosystem needs to deliver real-time agile applications to support more interactivity and engagement data,” said SiliconANGLE founder John Furrier.

Spark is also relatively untested in the market, having been designated an Apache Top-Level Project only a little more than a year ago. “Spark is still going through the process of being hardened that any large scale engine requires before mainstream adoption,” said Wikibon’s Gilbert.

No one is expecting Hadoop to go away, but with valuations in the stratosphere and a bundle of venture capital awaiting a payoff, expectations are high. The market could be ripe for consolidation, Floyer said. Drawing an analogy to the storage market of five years ago, he pointed out that buyouts quickly reduced the number of contenders in that arena from 10 to two or three. “The same thing will happen with Hadoop,” he predicted. “After consolidation is when the real money will be made.”

And there’s a possibility that the companies that make that money won’t be the ones that currently dominate the landscape. Gilbert suggested that cloud providers like Amazon Web Services and Microsoft may see Hadoop’s complexity as a new-business opportunity. “They can potentially build a tightly integrated platform that doesn’t have that complexity,” and snatch leadership away from the pure-play vendors, he said.

If that happens, then the ultimate winners in the Hadoop market may be companies that aren’t even playing today.
