Does the World Need Another Hadoop Distro? Greenplum Says Yes

Greenplum is challenging Cloudera and MapR with a new Hadoop distribution that it says delivers faster response times and better integration than the competition. Donald Miner, the solutions architect for the EMC subsidiary, blogged the news this morning.

Dubbed Pivotal HD, the distribution will reach general availability later this year. It will ship with a couple of tools that Greenplum developed in-house: an admin platform named Command Center and a native ICM (installation, configuration and management) utility. Pivotal HD also features support for the Spring framework, and includes optimizations that allow for seamless deployment in environments that are virtualized, that depend on Isilon storage, or both.

The core component of Greenplum’s distro is HAWQ, a relational database that the company says is hundreds of times faster than Hive and “orders of magnitude” speedier than competing SQL interfaces for Hadoop. Miner provided a brief overview of what the system does in his post:

“We have a ‘master’ node that has the job of storing the top-level metadata, as well as building the query plan and pushing the node-local queries down to the segment servers,” he writes. “When a query starts up, the data is loaded out of HDFS and into the HAWQ execution engine. HAWQ follows MPP architecture, streaming data through stages in a pipeline, instead of spilling and check pointing to disk (like MapReduce). Also, the segment servers are always running, so there is no spin-up time.”
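The flow Miner describes can be pictured with a toy sketch. This is not HAWQ code: the Master and Segment classes, the row format and the query_sum method are all hypothetical names invented for illustration. It only shows the general idea of a master pushing a node-local query to always-running segments, each of which streams rows through pipeline stages (Python generators here) rather than writing intermediate results to disk between stages.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Stage 1: yield rows one at a time (stands in for reading local data)."""
    for row in rows:
        yield row


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Stage 2: pass along only matching rows, without buffering the input."""
    for row in rows:
        if predicate(row):
            yield row


class Segment:
    """Hypothetical long-lived worker that holds one slice of the data."""

    def __init__(self, local_rows: List[Row]) -> None:
        self.local_rows = local_rows

    def run(self, predicate: Callable[[Row], bool], column: str) -> int:
        # Stages are chained lazily; each row flows through the whole
        # pipeline before the next one is read, so nothing is materialized.
        pipeline = filter_rows(scan(self.local_rows), predicate)
        return sum(row[column] for row in pipeline)


class Master:
    """Hypothetical coordinator: sends the same local query to every segment."""

    def __init__(self, segments: List[Segment]) -> None:
        self.segments = segments

    def query_sum(self, predicate: Callable[[Row], bool], column: str) -> int:
        # Combine the per-segment partial results into the final answer.
        return sum(seg.run(predicate, column) for seg in self.segments)


segments = [
    Segment([{"region": "east", "amount": 10}, {"region": "west", "amount": 5}]),
    Segment([{"region": "east", "amount": 7}, {"region": "east", "amount": 3}]),
]
master = Master(segments)
total = master.query_sum(lambda r: r["region"] == "east", "amount")
print(total)  # 20
```

In a real MPP system the segments would run in parallel on separate machines and exchange data over the network; the sketch runs them sequentially in one process purely to keep the example short.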

HAWQ aims to transform Hadoop from a batch analytics system into a near-real-time data-crunching engine that can respond to queries in less than a second. In effect, it is Greenplum’s answer to Cloudera’s Impala.

The Big Data space is becoming more crowded by the day. Just a couple of weeks ago, WANdisco announced its own Hadoop distribution, a free fork that ships with the company’s replication technology.