Hortonworks: Half the World’s Data Will be on Hadoop in 5 Years

Editor’s Note: Wikibon Analyst Bert Latamore recently participated with his colleagues in an interview with Hortonworks CEO Eric Baldeschweiler. The following is a profile by Bert and analysis of the market provided by me, Alex Williams, and ServicesAngle senior writer Klint Finley.

In 2006, Hortonworks CEO Eric Baldeschweiler was the chief architect of Web search at Yahoo. At that time he and his team decided to apply Apache Hadoop to the problem of improving Yahoo’s Web-search platforms. Other science teams in the organization quickly found uses for Hadoop as well, in areas such as e-mail spam detection, advertising and home-page customization. What began as a research and development project has grown to the point that Yahoo runs a central Hadoop service with more than 1,000 active internal users, running on more than 40,000 servers.

Today, Baldeschweiler is CEO of Hortonworks, a company that this past summer spun out from the Yahoo mother ship. In just a few short months, Hortonworks has positioned itself as a major competitor in the big data market.

Since the summer, Hortonworks has fast emerged as a competitor to Cloudera, the reigning Hadoop power in the big data world. Its rapid rise suggests that it too has a world-class software distribution and the potential to become a giant in the big data community.

The next year will determine if there is room for multiple Hadoop software distributions. For Hortonworks, it means ramping up partnerships and taking the lead in marketing its products and services. The momentum is behind Hadoop but it’s not clear who will emerge as the overall leader.

The Background

Hortonworks is unquestionably qualified as an organization to drive the development of Hadoop. The company is built around 22 former Yahoo Hadoop developers and architects, including Baldeschweiler himself. “We’ve been the drivers behind every major release of Apache Hadoop since its inception. We have unparalleled deep domain expertise in Hadoop.”

The experience is paying off. Hortonworks has announced a series of partnerships, many of them just last month. In November Hortonworks announced partnerships with Datameer, Informatica, Karmasphere and Pervasive Software.

On October 12 the company announced Microsoft as a partner. This strategy, says Baldeschweiler, is designed to accomplish two important goals:

  • Growing the Hadoop user base from the few dozen organizations using it today to thousands.
  • Continuing to build Hadoop to ensure that it emerges as the standard for big data.

That second part is critical.

A host of competing semi-proprietary offerings have gained momentum in the past few months, raising concerns that we could see what happened in the 1980s during the Unix wars. That kind of split could slow the market and create divisions that would slow the innovation cycle for years. Meanwhile, LexisNexis Risk Solutions has open sourced its own big data analytics platform, HPCC, and spun out a new company to support it called HPCC Systems. HPCC doesn’t have much traction yet, but it has gained maturity from years of use at LexisNexis and could eventually usurp Hadoop’s role as the standard for big data.

But Baldeschweiler does not seem overly concerned that this might happen. He points to Hadoop’s growth as a sign of how the market will transform.

“We think that half the world’s data will be on Hadoop in five years,” he says.

Accomplishing that goal will require the participation of a large number of vendors offering open and closed source products built on the Hadoop technology. The partnership with Microsoft will team engineers and marketing staff from the two companies to optimize Hadoop for Windows and to bring it to Windows users.

Some of the other partnerships will result in closed source products built on top of Hadoop, for example, to serve vertical markets. Further, he anticipates partnering with some of the big systems integrators to help them build Hadoop knowledge and skills into their teams.

The Role of the Relational Database?

The rise of Hadoop does not mean traditional methods will go away. Instead, it is more likely we will continue to see innovation aimed at optimizing Hadoop to work with more standard relational database management systems (RDBMS) and data warehouses.

“When we say that we believe half the world’s data will be on Hadoop in five years, we don’t think Hadoop will replace traditional systems,” Baldeschweiler said. “We think the growth of non-traditional data processing will far exceed the growth of those systems just because enterprises today are dropping a huge percentage of the data they generate on the floor.”

For instance, he says, companies are using Hadoop databases to bring their offline archives back online, so they are retaining much more data for much longer than they used to.

Actually, he says, Hadoop databases may help to drive the RDBMS market. “We see a strategy that makes sense to have the fine-grained data in Hadoop and then process the out-takes in a datamart or cube. We’re seeing a number of vendors doing that, and it’s a very viable solution.”

Hortonworks is dedicated to continuing to develop Apache Hadoop as a complete, totally open source product, he says. “We are not shipping parts of it for free but then saying, if you want to go into production you will need these other pieces that are proprietary. We want people to be confident that they can build their whole application for free, and if they want to engage us for training and support, we’ll be available.”

That’s the big difference between Hortonworks and Cloudera. Cloudera sells its Cloudera Enterprise solution, which includes a proprietary management tool called Cloudera Manager. The company recently released a free-as-in-beer version of Manager, but it remains a proprietary project. Hortonworks is releasing its management tools as free open source software.

To help spread the gospel, Hortonworks will provide training programs built on its knowledge of Hadoop. Baldeschweiler hopes to have the alpha versions of the first of those programs ready for testing by the end of 2011.

The commitment to Hadoop continues to play out. The Hortonworks team worked on version 0.20.205 up to its release. It is now focused on development of 0.23.0, presently an alpha release intended for further development and not for use in production environments.

“Lots of people sell applications on top of Linux as well. We believe that Hadoop, the platform, should be free. And to grow the ecosystem we want as many vendors to come in with as many solutions as possible. Closed source, open source, we expect a variety of both. We’re focused on evolving Hadoop so it is extensible, so if people want to bring value-add differentiation, it can be done well using the Open Source foundation. So once Microsoft has Hadoop on Azure, Azure is a very differentiated service. But we want them to use the same Hadoop everyone else is.”

He hopes that the beta version will be ready at the end of the first quarter of 2012.

None of this, he says, conflicts with development by other vendors.

Why Will Hadoop Succeed?

Potential users should not believe the fear, uncertainty, and doubt (FUD) being spread by vendors of proprietary big data technologies. Hadoop is here to stay.

“Yahoo, Facebook, Ebay, etc., – lots of Internet companies – are betting their businesses on Hadoop, and lots of real dollars are flowing through their Hadoop installations. There’s been successful installations at banks and transaction processing centers at ‘real’ enterprises as well, not just Internet companies,” he says. “Hadoop has been used on a very large scale, solving problems for a number of years. The same can’t be said for competitive technologies.”

That’s holding true so far, with the exception of HPCC, which isn’t proprietary. Microsoft discontinued development of LINQ to HPC, its own Hadoop alternative, in favor of working with Hortonworks on Hadoop, acknowledging that big data’s future will likely be either Hadoop or another proven, open platform.

The Hadoop community is entering the next phase of development. Technology innovation will matter, but marketing and overall market acceptance for Hadoop will become the true challenge for Hortonworks in the year ahead.