The Data Economy: Pivotal, Hortonworks milestones demonstrate new Hadoop realities

The Data Economy is an analysis column by Wikibon Senior Analyst Jeff Kelly covering the business of Big Data.

The Intel-Cloudera deal sucked most of the oxygen out of the room this week, but rivals Pivotal and Hortonworks also made important announcements in recent days.

Both announcements illustrate the competitive nature of the current Big Data market and the challenges facing any vendor attempting to develop and sell commercial Hadoop products and services.

First, Pivotal announced a new suite offering that integrates its various Big Data products into an easily consumable package with simplified pricing. Called the Pivotal Big Data Suite, it includes Pivotal HD at its foundation, with the Greenplum and GemFire databases, HAWQ, GemFire XD and SQLFire as add-on products.

Second, Hortonworks announced Hortonworks Data Platform (HDP) 2.1. It includes a plethora of improvements to the platform, including improved data governance capabilities and fully interactive query capabilities via Hive.

Hadoop Price Pressure

The Pivotal Big Data Suite is just one step in a long journey for the company as it builds out its vision of a comprehensive platform that brings together cloud, Big Data and application development. The more immediate impact of the new offering is on pricing. At the core of the suite is Pivotal HD, the company’s Hadoop distribution. Pivotal HD includes Command Center for cluster deployment, monitoring and management, GraphLab and MADlib for advanced analytics, and Spring Data for application development.

The Pivotal Big Data Suite includes unlimited use of and support for Pivotal HD. Customers can deploy Pivotal HD across as many nodes and clusters as desired and leverage Pivotal’s support services at no cost other than for provisioning the hardware. This pushes the price of core Hadoop software and support still closer to zero.

In addition, Pivotal has simplified pricing for its add-on products, such as HAWQ and GemFire XD. All such add-on products are now available on a subscription basis and priced per core. A core-based pricing model is attractive because it allows customers to take advantage of improvements in hardware efficiency and decouples storage costs from analytic compute costs.

Open Source Community Innovation

Hortonworks’ announcement addresses a different issue.

There is an ongoing debate in Big Data circles as to whether the open source community speeds up or slows down Hadoop innovation cycle times. Some argue it speeds innovation because there are simply more smart people working on developing Apache Hadoop through the open source community than any one company could provide. Others argue the open source community can slow innovation because of disagreements between community members as to which direction to take Hadoop.

The sheer number of innovations in HDP 2.1 lends credence to the argument that community involvement speeds development. The number of new features and capabilities in HDP 2.1 is more typical of a major release than of a point release. These include the integration of a number of related Apache projects such as Apache Falcon for data governance, Apache Knox for security, Apache Storm for stream processing and Apache Solr for search.

The best example, though, is the completion of The Stinger Initiative, which brings full interactive query capabilities to Hadoop. Stinger was spearheaded by Hortonworks but included contributions from 145 developers from 45 organizations over the course of just over a year. These organizations include vendors (SAP, Microsoft and WANdisco, for example) as well as practitioners (Google, Visa and Netflix). The result of Stinger is 100 times better SQL query performance across petabytes of data, among other capabilities, all of which are available to the entire Hadoop ecosystem.

The New Hadoop Reality

These announcements illustrate two important facts about the current Big Data market.

1. The effective price of core Hadoop distribution software and support services is nearly zero, due in part to Pivotal’s new pricing model (though the trend had been under way for some time). It is near zero rather than at zero because adopting Pivotal for Hadoop requires some level of buy-in to Pivotal’s long-term value proposition and the potential use of other Pivotal products and services in the future.

Some customers will prefer and be willing to pay for Hadoop from a more “neutral” vendor that does not push the use of its related database/analytics products and services. Still, with a free, fully functional and supported Hadoop distribution on the market, competing Hadoop vendors are under significant pressure to differentiate, either through vastly superior support services or proprietary software, to justify customers’ spending $1,000 or more per node for Hadoop.

2. Vendors that don’t embrace community innovation are going to fall behind in the long run. The number of new features and improved functionality in HDP 2.1 generally, and the success of The Stinger Initiative specifically, are further proof that the open source Apache Hadoop community is capable of developing important platform-level capabilities in reasonable time frames. Vendors that don’t engage in open source community development for platform-level capabilities will also find it more difficult to gain and maintain early-adopter customer traction.

photo credit: Marius B via photopin cc
photo credit: opensourceway via photopin cc

About Jeffrey Kelly

Jeffrey F. Kelly is a Principal Research Contributor at The Wikibon Project, an open source research and advisory firm based in Boston. His research focus is the business impact of Big Data and the emerging Data Economy. Mr. Kelly's research has been quoted and referenced by the Wall Street Journal, the Financial Times, Forbes, CIO.com, IDG News, TechTarget and more. Reach him by email at jeff.kelly@wikibon.org or Twitter at @jeffreyfkelly.