UPDATED 08:37 EDT / JULY 08 2011

Data Warehouse Vendors Differ in Hadoop Integration

Big Data is the future, a fact both EMC Greenplum and HP Vertica recognize. But the two data warehouse vendors are taking very different approaches to Hadoop, the open source Big Data processing framework.

With the release of its own proprietary Hadoop distribution, EMC Greenplum has clearly signaled to the market that it wants to own both halves of the Big Data picture: Hadoop for processing large volumes of unstructured data and MPP data warehousing for fast-loading, real-time analytics.

EMC Greenplum wants to be the one-stop Big Data shop for the enterprise. Get your commercial Hadoop distribution and your MPP data warehouse appliance, tightly optimized to share data and analysis, in one fell swoop, says EMC.

In contrast, HP Vertica has said explicitly that it will not release its own Hadoop distribution. Instead, Vertica customers who want to integrate the MPP data warehouse with an open source Apache Hadoop distribution can do so with a custom-built Vertica connector. The connector itself is also open source.

Vertica is betting that companies don’t want to get locked into a proprietary Hadoop distribution (i.e. EMC Greenplum’s Hadoop distro) while the open source framework is still evolving. Instead, Vertica customers can experiment with the Hadoop distribution of their choice (as long as it’s open and largely based on the Apache distribution) and connect it to Vertica’s MPP data warehouse appliance.

Before EMC got into the Hadoop market, Cloudera had a loose partnership with Greenplum and released a connector between its Hadoop distribution and Greenplum last fall. But with EMC Greenplum now taking on Cloudera directly with its own commercial Hadoop distribution, don’t expect that partnership to flourish.

EMC will surely push Greenplum customers who want to experiment with Hadoop toward EMC Greenplum’s Hadoop distribution. Greenplum customers who go with Cloudera or another distro may not get the same support from EMC for connecting to Greenplum that they would get if they chose the EMC distribution.

Vertica, on the other hand, has essentially committed to supporting connections to any open Hadoop distribution. This gives Vertica customers significantly more flexibility than Greenplum users have. Greenplum customers who invest in the EMC Hadoop distribution risk vendor lock-in: if EMC’s distribution stalls or falls significantly behind competing open Hadoop distributions, those customers could get stuck with an inferior technology.

The potential benefit of EMC’s approach is a complete Big Data solution preconfigured to seamlessly meld Hadoop with MPP data warehousing. But this is a long-term benefit, as EMC has a lot of work to do before it gets there. I think EMC has a good chance of achieving its Big Data vision, but it will take time. In my opinion, customers who want to use Hadoop now and reduce the risk of vendor lock-in are better off going with an open distribution and connecting it to their incumbent data warehouse when it makes sense.