Hadoop, there it is: Here Comes Hortonworks

In what appears to be a change in direction, Hortonworks has released a completely open source Hadoop distribution based on Apache Hadoop that will compete head-on with Cloudera’s CDH3. The new distribution, called Hortonworks Data Platform, includes a new, open source management tool the company developed called Ambari and is currently available as a limited technology preview.

I call this a change of direction because, until recently, the message from CEO Eric Baldeschwieler and his team was that Hortonworks, which was spun out of Yahoo last summer, would focus strictly on Hadoop training and technical support, not on producing a distribution of its own.

The idea was that, with its experience deploying and managing Yahoo’s enormous Hadoop cluster, Hortonworks would position itself as the only vendor that could help transition Hadoop early adopters from proof-of-concept deployments to full-on, enterprise-scale deployments. That is still part of Hortonworks’ message. Hortonworks also announced today a public Hadoop training course, as well as a number of other support services.

But Baldeschwieler decided the company also needed to develop a Hadoop distribution of its own. He explains the about-face in a blog post:

“As we began to interact with enterprises and ecosystem partners, the one constant was the need for a base distribution of Apache Hadoop that is 100% open source and that contains the essential components used with every Hadoop installation. A distribution was needed to provide an easy to install, tightly integrated and well tested set of servers and tools.”

Hortonworks is also attempting to broaden the Hadoop ecosystem. The new distribution, which is based on Hadoop 0.20.205, includes HCatalog, a metadata management service, and other APIs aimed at making it easier for partners to integrate with Hortonworks Data Platform.

The company also unveiled a new partner program and an initial wave of partners. They include Informatica, the data integration specialist that just released a Hadoop-focused data transformation tool called HParser, and Tresata, a cloud-based Big Data analytics platform for banking that uses Hadoop under the covers to crunch massive data sets.

In a recent interview, Baldeschwieler told me Hortonworks is “completely committed to an open source business model” and “we are always going to ship Hadoop for free.” In other words, Hortonworks is basing a large part of its appeal, in addition to its experience supporting Yahoo, on the fact that its distribution is 100% open source, while Cloudera’s distribution includes some proprietary tools, including its cluster management console, Cloudera’s Services and Configuration Manager.

Hortonworks’ new distribution, partner program, and support/training services are a direct assault on market leader Cloudera. The timing is no coincidence either. Cloudera’s Hadoop World conference takes place next week in New York City, and undoubtedly Hortonworks is looking to steal some of Cloudera’s thunder in the run-up to the event.

About Jeffrey Kelly

Jeffrey F. Kelly is a Principal Research Contributor at The Wikibon Project, an open source research and advisory firm based in Boston. His research focus is the business impact of Big Data and the emerging Data Economy. Mr. Kelly's research has been quoted and referenced by the Wall Street Journal, the Financial Times, Forbes, CIO.com, IDG News, TechTarget and more. Reach him by email at jeff.kelly@wikibon.org or Twitter at @jeffreyfkelly.