The Hortonworks team of ex-Yahoo Hadoop developers is focused on its announced goal of getting half of the world’s data onto Apache Hadoop within the next five years, says Hortonworks Co-Founder and Architect Arun Murthy. To do that, the company is developing a high-availability, massively scalable, fully open source version of Apache Hadoop based on the team’s work at Yahoo, where they ran Hadoop MapReduce across 50,000 machines.
The Next Generation Resource Manager, which will provide high availability on customers’ Hadoop systems, is already in field testing with a limited number of users, he told SiliconAngle CEO John Furrier and Wikibon.org Chief Analyst David Vellante in an interview at HadoopWorld 2011 in New York City on Nov. 8 that was webcast live over SiliconAngle.tv. But while high availability and massive scaling will be a big deal for users, Hadoop needs more to meet the goal.
Until now, the only processing methodology Hadoop has provided has been MapReduce. While that is useful for many kinds of analysis and, says Murthy, will remain the main processing approach for Hadoop, it is not suitable for everything. Hortonworks is already looking beyond MapReduce and specifically is working to bring support for Message Passing Interface (MPI), used on many high performance computing systems, to Hadoop.
“MPI is the right way to do a subset of applications,” he says. “Today it is hard to run both a Hadoop cluster and an MPI cluster. Next generation will let you manage them in the same way, deploy them in the same way, and then process the data in the best way possible. By combining them in one compute framework instead of two separate frameworks, and running them with a single operations team rather than two, it brings the costs down dramatically.”
As with all Hortonworks Hadoop iterations, this will be completely open-sourced. And while Hortonworks might look like a competitor to other Hadoop platform developers such as Cloudera, “in the end we are all in the business of improving Hadoop, and to do that we have to talk to each other,” he says. “We are very focused on working with them on developing the Hadoop core. If that doesn’t improve fast enough none of us will be in business long.”
The Hortonworks business play, Murthy says, is based on providing service rather than selling products. Companies can get the technology for free, but without deep technological knowledge and experience they are limited to running small clusters. And for the kind of massive-scale data sets involved in big data analysis, “you don’t want to be running lots of 10-node clusters. You want a 1,000-node cluster.”
As you move to those larger clusters, “at the end of the day you want to call on the people with the most experience. We have a rich history of running very large Hadoop clusters from our years at Yahoo. Companies will want that as they move forward.”