It’s nearing the end of Hadoop World. Over the past few days we’ve been talking to developers, system engineers, data scientists and the entrepreneurs who are building the next generation of apps using big data.
Here are five themes I heard from attendees:
Hadoop Is Big, but Is It Growing Too Fast? Hadoop’s growth is exponential. But interest in it is so great that some worry the community could split. Klint Finley wrote a post that looked at five tools built on Hadoop, including Apache Mahout, Golden Orb and Datameer Analytics Solutions. There are many more. What’s needed are more services layered on top of the open-source offerings, serving people who do not know how to get analysis out of Hadoop. Accel Partners’ $100 million fund for big data apps adds to this growth. But growth brings the danger of mutation: alternative distributions are emerging, such as the one MapR is developing.
A Need for Training. Any number of service providers were pitching Hadoop training. MapR pitched its free training. Booz Allen Hamilton had a booth on the vendor floor; the consulting firm is recruiting talent for its growing data analytics practice, and training is part of its promise. Cloudera and Hortonworks are both offering training. Developers are hungry for more knowledge. One developer I spoke to said she is now learning more programming languages than she ever imagined she would need to fully grok the various services on top of Hadoop. That points to another issue: Hadoop’s complexity.
Real-Time? HBase is a database layer built on top of Hadoop. I spoke to one developer from Travelocity who said the company now uses Hadoop to optimize marketing and personalization. Today the data comes in once a day and is then analyzed. HBase would let them measure advertising performance every 15 minutes.
DevOps. The DevOps movement may transform IT more than the cloud or big data will. And DevOps fits with Hadoop. You can see it in the community here, which is made up of operations and programming professionals. To make the most of Hadoop, companies need systems administrators and programmers working together to deploy, manage and harvest data across a distributed infrastructure. You need configuration management tools to automate tens, hundreds or thousands of nodes across the network. Operations staff do not need to know how to program, but they do need to understand the programs better. Developers can’t just dump their code in the laps of the operations team; they need to understand how it is deployed. The two groups do not always get along, but Hadoop makes DevOps almost impossible to avoid. To really use Hadoop, the teams need to cooperate better.
Hadoop Needs to Be Easier to Use. I heard this complaint a lot: “It’s easy to get the data in but really hard to get the data out.” This goes back to Hadoop’s complexity. Automation layers are being developed that will make Hadoop easier to use for people with fewer technical skills. How? Tableau is offering data visualization for Hadoop. Karmasphere offers a SQL-like language and a graphical interface for doing data analysis with Hadoop. HP Vertica, Aster Data and Revolution Analytics are providing new tools for working with Hadoop as well.
Klint Finley writes:
Datameer provides wizards for setting up data integrations and a spreadsheet style interface for working with data and creating visualizations. It supports multiple Hadoop distributions, including those from Cloudera and MapR.
Hadoop forces developers and operations people to embrace DevOps, and it helps companies optimize their online services.
But the challenge in the year ahead is to make it simpler to use and less geeky. More apps will help, but until they arrive, Hadoop will not be truly ready for businesses that lack considerable engineering resources.
Hadoop’s growth and market status are, without question, showing the need for better big data tools that anyone can use.
Here’s Klint and me summarizing the event and talking about the future of Hadoop and DevOps with Dave Vellante and John Furrier on theCube: