UPDATED 11:07 EST / APRIL 12 2011

Cloudera’s New Spin on Hadoop, Open Cloud

A few hours ago Cloudera announced the general availability of version 3 of the Cloudera Distribution Including Apache Hadoop (CDH3), which brings a long list of improvements. Cloudera has drastically expanded the set of supporting tools for the open source data management framework, adding no fewer than seven new programs.

This is an opportunity for Cloudera to go mainstream, moving beyond the fringe of the Web 2.0 era. Cloudera continues to mature its codebase and holds a sizable lead over its competitors at this point. EMC, IBM and Yahoo are rumored to be planning their own open cloud initiatives for Hadoop, though this latest development further establishes Cloudera in the industry. In Yahoo’s case, EMC is rumored to be funding a collaborative Hadoop distribution with the company. Competition for Cloudera is to be expected, but right now the company is well ahead of any rival.

In addition to the core Hadoop system, the Hive data warehouse software and the Pig data flow scripting language, the latest distribution now comes with the data aggregation tool Flume, the data format converter Sqoop, a Hadoop graphical UI called Hue and a configuration tool called ZooKeeper, among others.

CDH3 also features integration with business intelligence (BI) and extract, transform and load (ETL) tools:

“With CDH3, Cloudera makes the full power of Apache Hadoop available to the widest range of today’s flexible and varied enterprise IT architectures with improved performance, greater stability and durability, extended authentication support, and integration for business intelligence tools and RDBMS systems.”

The free CDH3 package can be downloaded as an RPM, DEB, VM image or tarball installation, and is compatible with Rackspace and Amazon Cloud as well as Red Hat, CentOS, SuSE and Ubuntu Linux.

CDH3 carries hundreds of bug fixes spanning all 11 components of the new distribution, as well as greatly improved performance. Small MapReduce jobs run up to three times faster, file system I/O is up to 20 percent faster, and HBase query throughput has doubled. CDH3 is more stable, and also brings improved functionality and usability with a new ODBC driver. The distribution now eliminates much of the work previously involved in the 40-plus steps needed to get data into Hadoop, reorganize that data once it is in Hadoop, and then export the resulting dataset back out.

On the security front, Cloudera added authorization support for MapReduce, HDFS and Oozie. The company will release quarterly updates to CDH3.

