

Big Data is the future, a fact both EMC Greenplum and HP Vertica recognize. But the two data warehouse vendors are taking very different approaches to Hadoop, the open source Big Data processing framework.
With the release of its own proprietary Hadoop distribution, EMC Greenplum has clearly signaled to the market that it wants to own both halves of the Big Data picture: Hadoop for processing large volumes of unstructured data and MPP data warehousing for fast-loading, real-time analytics.
EMC Greenplum wants to be the one-stop-Big-Data-shop for the enterprise. Get your commercial Hadoop distribution and MPP data warehouse appliance, tightly optimized to share data and analysis, in one fell swoop, says EMC.
In contrast, HP Vertica has said explicitly that it will not release its own Hadoop distribution. Rather, Vertica customers that want to integrate the MPP data warehouse with an open source Apache Hadoop distribution can do so with a connector custom-built by Vertica. The connector itself is also open source.
Vertica is betting companies don’t want to get locked in to a proprietary Hadoop distribution (i.e., EMC Greenplum’s Hadoop distro) while the open source framework is still evolving. Instead, Vertica customers can experiment with the Hadoop distribution of their choice (as long as it’s open and largely based on the Apache distribution) and connect it to Vertica’s MPP data warehouse appliance.
Cloudera had a loose partnership with Greenplum before EMC got into the Hadoop market, releasing a connector between its Hadoop distribution and Greenplum last fall. But with EMC Greenplum taking on Cloudera directly with its own commercial Hadoop distribution, don’t expect that partnership to flourish.
EMC will surely push Greenplum customers that want to experiment with Hadoop to use EMC Greenplum’s Hadoop distribution. Greenplum customers that go with Cloudera or another distro may not get the same support from EMC for connecting to Greenplum that they would get if they chose the EMC distribution.
Vertica, on the other hand, has essentially committed to supporting connections to any open Hadoop distribution. This gives Vertica customers significantly more flexibility than Greenplum users. Greenplum customers that invest in the EMC Hadoop distribution risk vendor lock-in. If EMC’s distribution stalls or falls significantly behind competing open Hadoop distributions, customers could get stuck with an inferior technology.
The potential benefit of EMC’s approach is a complete Big Data solution preconfigured to seamlessly meld Hadoop with MPP data warehousing. But this is a long-term benefit, as EMC has a lot of work to do before it gets there. I think EMC has a good chance of achieving its Big Data vision, but it will take time. Customers that want to use Hadoop now and reduce the risk of vendor lock-in are better off, in my opinion, going with an open distribution and connecting it to their incumbent data warehouse when it makes sense.