Hadoop Seeing a Surge of New Products at EMC World 2011 from Greenplum to Brisk

EMC World 2011 is casting the umbra of Big Data and the Cloud over Las Vegas today, and Hadoop has taken its usual place front and center in the spotlight. Amid announcements of EMC’s own addition to the Hadoop ecosystem, several other products have also made their debut, including ones that enhance storage, delivery, and software distribution.

Companies such as Mellanox, DataStax, NetApp, and SnapLogic are all announcing their own Hadoop products alongside EMC today, hoping to add their own weight to the overwhelming mass of enterprise Big Data. Klint Finley of ReadWriteWeb rounded up all five, and we decided to profile them here.

EMC Greenplum HD for Big Data analytics

Greenplum HD is both an Apache Hadoop appliance and a Hadoop distribution, designed and optimized for handling extremely large data sets in enterprise-level solutions. Because it is built atop open source software, the community edition is fully open source in a nod to that culture.

“The appliance marries Hadoop with the EMC Greenplum Database, allowing the co-processing of both structured and unstructured data within a single, seamless solution,” EMC announced in their press release on the subject. “The EMC Greenplum HD product family enables an organization to take advantage of Big Data analytics without the overhead and complexity that comes with the cumbersome tools and solutions on the market today.”

The announcement also outlined that EMC is partnering with a multitude of data companies to make adoption swifter. These companies include Concurrent, CSC, Datameer, Informatica, Jaspersoft, Karmasphere, MicroStrategy, Pentaho, SAS, SnapLogic, Talend, and VMware.

DataStax Brisk powered by Apache Cassandra

As promised, DataStax has released Brisk, an open source product that uses Apache Cassandra to distribute and access large stores of data. Apache Cassandra is an open source distributed database management system, and Brisk uses it in place of the Hadoop file system; while the core file system and distribution are taken out of Hadoop’s hands (trunk?), Brisk still uses Hadoop’s data-crunching power.

DataStax hopes Brisk will simplify Hadoop distributions by using Cassandra for management.

Mellanox for Hadoop and Memcached in Web 2.0 Applications

Mellanox, a supplier of end-to-end connectivity solutions for data center servers and storage systems, today announced the release of an acceleration technology for Hadoop- and Memcached-based products. The Hadoop product is called Hadoop-Direct; Mellanox claims it is transparent to current users and requires no modification to existing applications. It simply adds another layer that accelerates communication.

“Network bandwidth and usage of compute capacity per node to process network-related functions are key factors that limit efficient scale of Hadoop clusters,” said Dhruba Borthakur, distinguished member of the Hadoop Apache Development Team. “Hadoop-Direct with Mellanox networking solutions help minimize the latency of data access; the use of higher bandwidth enables overlapping communications and computation thus improving Hadoop cluster’s performance.”

Last year, Oracle partnered with Mellanox, making notable investments, and Mellanox acquired Voltaire. With IBM, HP, NetApp, and Isilon (now owned by EMC) already listed as hardware OEM customers, Mellanox might still have a lot of room to grow into with its Hadoop Big Data acceleration product.

NetApp Hadoop Storage Solution to Aid Big-Analytics Adoption

Not to be left out, NetApp has announced a pre-installed Hadoop application built into its NetApp E2600 platform. It is a marriage of products that NetApp claims will “enable speed of deployment and simplify manageability of Hadoop infrastructure, allowing customers to deploy a solution in hours versus weeks and to dynamically expand to petabyte scale.”

NetApp’s offering appears to be a highly modular and dynamic approach to Big Data hardware and software, designed with differing sizes of solutions in mind.

It comes in a base configuration of 16 or 32 nodes, with system sizing and validation integrated to eliminate guesswork, and NetApp stresses that it will save time: “hours versus weeks.”

This product may give NetApp the leverage it needs against other contenders to maintain its crown as one of the top vendors.

“NetApp has a strong history of leadership in supporting new technology innovations. Hadoop democratizes access to previously untapped data and information,” said Rich Clifton, senior vice president at NetApp. “Our solution allows enterprise adopters to gain business advantage by analyzing vast quantities of information in real time. Customers can deploy the solution in hours versus weeks and scale predictably and simply.”

SnapLogic with SnapReduce for Hadoop

According to SnapLogic’s blog today, the company intends to “humanize Hadoop.” It is unveiling SnapReduce, “an intelligent system for converting integration pipelines into MapReduce jobs.” SnapLogic claims that, as a management system, it will ease not just integration but also human interaction with Hadoop and big data sets by bringing click-and-drag ease to Hadoop.

“At SnapLogic we encounter companies who have massive amounts of data from web traffic, customer purchases, support contacts, social media, etc.,” SnapLogic writes in their blog announcement. “The more applications they add, the more data they create. They understand that this data holds insights into customer behaviors and preferences, market and product opportunities, operational savings, etc. Today there exist a new set of tools that can aid these companies in realizing these insights.”

The name SnapReduce is an obvious play on MapReduce, Hadoop’s data-processing workhorse.
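For readers unfamiliar with the model behind that name, here is a minimal in-memory sketch of the map, shuffle, and reduce steps, using the classic word-count example. It is an illustration only: the function names are our own, it does not use Hadoop's or SnapLogic's actual APIs, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases, as a single-machine stand-in for a job."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["big data big clusters", "data in the cloud"]
    print(word_count(sample))
```

A tool that converts pipelines into MapReduce jobs would, presumably, generate the equivalent of the map and reduce functions from a visual description so that users do not have to write them by hand.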