UPDATED 08:21 EST / JUNE 13 2012

Your Complete Guide to Hadoop: Issues, Ecosystem and More

Is your company interested in Hadoop? Want to look like an expert to your boss? Wikibon Principal Research Contributor Jeff Kelly has published a comprehensive assessment of the state of Hadoop and the five major Hadoop vendors, focusing on the major issues that remain in this still-immature technology rather than on the advantages of Big Data analysis.

The report, “Hadoop: From Innovative Up-Start to Enterprise-Grade Big Data Platform,” begins with a summary of Hadoop’s history and then turns to the four main issues that are keeping it out of data centers. These issues are mainly a product of the immaturity of this revolutionary technology, and they are not inconsequential. They are:

  1. The single point of failure that is the Hadoop namenode. If it crashes, your Hadoop cluster shuts down until it can be rebuilt, which can take hours or days.
  2. The poor integration with traditional relational databases, which makes it hard to combine relational data with nonrelational data to get, for instance, a true 360-degree view of customers.
  3. The total unfamiliarity of this radically new technology, which does not use industry standards such as SQL, and the general lack of people trained in MapReduce, HBase, and other Big Data technologies.
  4. The almost total lack of security for Hadoop beyond Kerberos.

These issues are being tackled both by the Apache open source community and by the five major commercial Hadoop companies: Cloudera, DataStax, EMC Greenplum, Hortonworks, and MapR. Each vendor has its own approach to solving these problems, and each approach is at least partially closed, although every vendor has also donated large amounts of important code to the Apache Hadoop community. In the second half of the report, Kelly summarizes the main features and issues of each vendor's products.

So what does the report leave out? Mainly, it makes no attempt to discuss how pioneering companies are leveraging Hadoop and Big Data to achieve competitive advantage. But that side of the coin has been well covered in other articles on Wikibon and SiliconAngle, in coverage by other media, and of course in use cases published by the vendors. This report does not question or invalidate those; it simply presents the other side of the coin.

So is Kelly saying that companies should beware of Hadoop? Not at all. All of these issues are being solved by both the commercial vendors and the Apache open source community. He is merely warning companies about the remaining issues they need to plan for before they start experimenting with Hadoop.

Be warned: this is not light reading. It is closer to a white paper than to a normal Wikibon Peer Incite or Gartner Research Note. And obviously it cannot cover every issue or question. But if you are looking for a good grounding in Hadoop, this is as comprehensive an examination of the issues as they stand today as you are going to get.

Like all Wikibon community research, this report is available in its entirety without charge on the Wikibon.org Web site. Interested parties are invited to read it and to join Wikibon.
