UPDATED 10:30 EDT / DECEMBER 26 2014

Is Hadoop over-hyped? Market-watchers say no

When an organization wants to dig into its Big Data these days, it’s almost inevitable that Hadoop, the open-source data storage and processing framework for dealing with extremely large data sets, will get the call.

It’s easy to see why Hadoop is such an appealing option. The platform combines distributed storage with distributed processing at a relatively low cost (the software itself is free), and it can be scaled out to meet the exponential growth in data that organizations expect to generate from social media, smartphones and the Internet of Things.
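For readers who haven’t touched the framework, the canonical word-count job adapted from the Apache Hadoop tutorial gives a feel for the MapReduce programming model: mappers emit intermediate key/value pairs in parallel across the cluster, and reducers aggregate them. A minimal sketch in Java (input and output HDFS paths are supplied on the command line):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The appeal is that the framework, not the programmer, handles splitting the input, scheduling tasks across nodes and recovering from failures; the drawback, as the critics below note, is that even a job this simple requires a fair amount of boilerplate.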

These advantages, together with no small amount of hype and a few high-profile success stories from the likes of Facebook and Yahoo!, are set to drive rapid adoption of Hadoop in the enterprise. That’s what Forrester Research believes, anyhow: just last week, the research firm confidently predicted Hadoop would become an “enterprise priority” by this time next year. Wikibon’s analysts display similar optimism, with their most recent Big Data report forecasting revenues of some $38.1 billion in 2015 and more than $50 billion by 2018.

Hung up on Hadoop

Hadoop might be all the rage in the enterprise right now, but not all data scientists are buying it. In a somewhat dour assessment of Hadoop’s enterprise performance published by the Wall Street Journal earlier this month, Elizabeth Dwoskin wrote that many companies that have implemented Hadoop have been left sorely disappointed with the results.

“Bank of New York Mellon used it to locate glitches in a trading system. It worked well enough on a small scale, but it slowed to a crawl when many employees tried to access it at once, and few of the company’s 13,000 information-technology workers had the expertise to troubleshoot it. David Gleason, the bank’s chief data officer at the time, said that while he was a proponent of Hadoop, ‘it wasn’t ready for prime time.’”

The problem isn’t just that Hadoop is an immature technology, Dwoskin claims – it’s that the platform is unsuitable for many mainstream Big Data projects. And despite being technically “free,” implementing Hadoop can be both time-consuming and expensive.

“It can take a lot of work to combine data stored in legacy repositories with the data stored in Hadoop,” she writes. “And while Hadoop can be much faster than traditional databases for some purposes, it often isn’t fast enough to respond to queries immediately or to work on incoming information in real time. Satisfying requirements for data security and governance also poses a challenge.”
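To make the integration point concrete: before Hadoop can analyze data held in a legacy relational system, that data first has to be exported into HDFS (tools such as Apache Sqoop exist to automate exactly this). The sketch below, in Java, is only an illustration of the plumbing involved; the JDBC URL, credentials, table and HDFS path are all hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Copies rows from a legacy relational table into a CSV file on HDFS.
// Connection details and names are placeholders for illustration only.
public class LegacyExport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml, etc.
    try (Connection db = DriverManager.getConnection(
             "jdbc:mysql://legacy-db:3306/sales", "user", "password");
         Statement stmt = db.createStatement();
         ResultSet rows = stmt.executeQuery("SELECT id, amount FROM orders");
         FileSystem fs = FileSystem.get(conf);
         FSDataOutputStream out = fs.create(new Path("/data/orders.csv"))) {
      while (rows.next()) {
        // One CSV line per source row; real exports also need schema
        // mapping, escaping and incremental-load logic.
        out.writeBytes(rows.getLong("id") + "," + rows.getDouble("amount") + "\n");
      }
    }
  }
}
```

Multiply that by every source system, add schema drift and security requirements, and the “lot of work” Dwoskin describes starts to come into focus.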

The Journal isn’t the only publication to document frustration among early Hadoop adopters. A recent survey of data scientists on the obstacles to Big Data analytics, conducted by Big Data firm Paradigm4, found that 76 percent of those who had used Hadoop or Apache Spark (a cluster computing framework that can run atop Hadoop) complained of “significant limitations.”

Among that group, 39 percent of respondents said it takes too much effort to program Hadoop, while another 37 percent said Hadoop is “too slow for interactive, ad hoc queries”, and 30 percent complained it isn’t fast enough for real-time analytics. As a result, about 35 percent of the data scientists surveyed who have used Hadoop have given up on it altogether.
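The interactive-query complaint is the one Spark itself was designed to address: where MapReduce rereads data from disk on every job, Spark can pin a working set in cluster memory so that follow-up questions return quickly. A minimal sketch in Java against Spark’s 1.x-era API (the HDFS path and search terms are hypothetical):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Loads a dataset from HDFS once, caches it in cluster memory, and then
// answers repeated ad hoc questions without re-reading from disk.
public class AdHocQueries {
  public static void main(String[] args) {
    // The cluster master URL is supplied via spark-submit.
    SparkConf conf = new SparkConf().setAppName("AdHocQueries");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      JavaRDD<String> logs = sc.textFile("hdfs:///data/weblogs").cache();

      // The first action materializes the RDD and pins it in memory.
      long errors   = logs.filter(line -> line.contains("ERROR")).count();
      // Subsequent queries hit the in-memory copy, not HDFS.
      long timeouts = logs.filter(line -> line.contains("timeout")).count();

      System.out.printf("errors=%d timeouts=%d%n", errors, timeouts);
    }
  }
}
```

Even so, as the survey numbers suggest, caching helps with repeated queries over a fixed dataset more than it does with true real-time analytics over incoming streams.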

Paradigm4, which makes a competing database, has an admittedly vested interest in knocking Hadoop, but there have been other murmurs of dissatisfaction going back years. “Hadoop isn’t enough anymore for enterprises that need new and faster ways to extract business value from massive datasets,” warned Jaikumar Vijayan of Computerworld in a 2012 article.

More than just Hadoop

But a lack of understanding may be the biggest reason enterprises fail with Hadoop. As I noted in a recent article on the state of Hadoop, there are just as many success stories as there are tales of despair, especially in business management, advertising, sales and marketing, and security. Hadoop often performs better when combined with other technologies, such as NoSQL databases like HBase, Cassandra and MongoDB, to carry out Big Data analysis tasks. The well-documented shortage of data scientists also means many enterprises simply lack the expertise to take full advantage of Hadoop right now.
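HBase is the usual example of that pairing: it stores its data in HDFS, so it lives inside the Hadoop cluster, but it serves single-row reads and writes at interactive latency rather than batch speed. A minimal client-side lookup in Java (the table name, column family and row key below are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Fetches a single row by key from an HBase table -- a millisecond-scale
// read of the kind that batch-oriented MapReduce jobs are not built for.
public class ProfileLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table users = conn.getTable(TableName.valueOf("user_profiles"))) {
      Result row = users.get(new Get(Bytes.toBytes("user#42")));
      byte[] email = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
      System.out.println(email == null ? "not found" : Bytes.toString(email));
    }
  }
}
```

The division of labor is the point: Hadoop’s batch jobs crunch the historical data, while a store like HBase serves the results to applications in real time.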

Wikibon analyst Jeff Kelly says enterprises will move to address the skills problem, and that vendors of Hadoop distributions will do a better job of identifying the best use cases, adding design, deployment and maintenance tools, integrating with existing infrastructure and business processes, and building applications that actually move the needle. “I expect most Fortune 500 companies will be using Hadoop in some form or another within two to three years, if not sooner,” he said, “and I think this will cut across verticals, but with financial services/banking/insurance, retail, and industrial sectors leading the way.”

Kelly’s argument is backed up by Forrester Research analyst Mike Gualtieri, who recently predicted that 2015 will be Hadoop’s breakthrough year in many enterprises:

“Forrester believes that Hadoop is a must-have for large enterprises, forming the cornerstone of any flexible future data platform needed in the age of the customer,” wrote Gualtieri in a report. “But, we also believe that Hadoop is becoming more than just a data platform. Given its economics, performance and flexibility, Hadoop will become an essential piece of every company’s business technology (BT) agenda.”

Where Hadoop hype has perhaps gotten out of control is in the expectation that it will replace legacy databases across the board. “Hadoop is just one part of the modern data architecture,” said Kelly. “It will need to play nice with other technologies, both existing (NoSQL data stores, Spark, data visualization, etc.) and new tools yet to be developed.”

Vendors are trying to make things easier for their customers by incorporating open-source and custom tools that address Hadoop’s shortcomings. For example, MapR Technologies Inc. integrates Hadoop with its own NoSQL database and boasts that its customers typically see a five-fold return on their investment.

The MapR Hadoop-NoSQL package “provides [for] growing real-time requirements such as online retail solutions, managed security services, and the largest biometric database in the world,” says Jack Norris, MapR’s chief marketing officer, responding to claims that Hadoop isn’t fast enough to analyze real-time data. “Without these enhancements, organizations are unable to get the results they want.”

While Hadoop has its detractors, enterprises can’t afford to dismiss it. The platform is still immature, which means that mistakes will be made. Some early adopters of Hadoop may have fallen victim to the “hype”, but these are just growing pains that will become far less common as the technology matures.

