Cloudera’s Bag of Hadoop Goodies Ships with CDH4

Hadoop distributor Cloudera today announced the fourth version of Cloudera's Distribution Including Apache Hadoop (CDH4), exactly 12 months after the launch of CDH3.

CDH4 is now available as a public beta, giving the community time to help refine the features that Cloudera's engineers have introduced in this fourth edition. Those features are already impressive on their own, especially the work done on the applications that run alongside the core Hadoop engine.

Cloudera says it has made HBase, HDFS, MapReduce, Flume and system-wide data compression considerably faster than in previous versions, going as far as to claim that the new versions "set a new standard for Big Data management systems" in terms of performance. Flume, Sqoop, Hue, Oozie and Whirr, all of which are also available under an Apache license, have been tweaked as well.

Extensibility is another major focus for CDH4. Developers looking to build their own real-time, data-driven applications can use HBase coprocessors and an open source resource management model, along with an improved API. The API now offers broader access, particularly for integrating third-party BI components with the distribution.

Other changes Cloudera revealed today include a high-availability NameNode for the Hadoop Distributed File System (HDFS), which makes the platform better suited to enterprise use, and a set of security updates: HBase gains a permission system, and the Fair Scheduler now supports access control lists.

Here's a quote from a Nokia executive with her take on CDH4, which evidently didn't disappoint.

“Data is imperative to our business and Cloudera’s Distribution Including Apache Hadoop is at the center of our analytics ecosystem,” said Amy O’Conner, Senior Director of Big Data at Nokia. “The new release of CDH reinforces why we selected Cloudera and continue to partner with them: an open system that has been integrated with enterprise functionality and is delivered with robust support.”