Newly formed partnerships between large technology firms and Hadoop startups have refocused attention on the open-source data analysis software. But can such high-profile alliances give one framework the edge over the rest?
The most recent of these partnerships came last month, when Hewlett-Packard Co. announced a $50 million investment in Hortonworks Inc. At the time, HP said the plan was to integrate Hortonworks’ framework into its HAVEn Big Data suite of tools.
The point of HP’s investment isn’t to dominate the Hadoop ecosystem but rather to create channel partner ecosystems for selling its hardware, said Wikibon’s Jeff Kelly in a blog post.
Less clear is the impact the HP/Hortonworks partnership, or the earlier alliance between Cloudera Inc. and Intel Corp., will have on customers as they consider which Hadoop framework is best suited to their needs.
Organizations face numerous challenges when trying to get the best out of Hadoop, and the likelihood is that any large technology firm that makes Hadoop central to its core Big Data offerings will need to offer customers help with those problems. A good example of this is TradeMONSTER Group Inc., an online brokerage that uses Hortonworks’ Hadoop for transaction data analysis.
Sanjib Sahoo, CTO of TradeMONSTER, told the Wall Street Journal he’s been unable to find a solution capable of integrating transaction processing and analytics into a single system for an affordable price. Because of this, TradeMONSTER has to replicate data from its transactional relational database to a Hadoop system for analysis. That process is inefficient for a business in which speed is a competitive advantage, he said.
“From a CIO’s perspective, the lesser number of stacks or systems you have to maintain, the better,” said Mr. Sahoo, who explained that an integrated transactional and analytics system would be better for his data-intensive organization.
It’s difficult to say which of the three major Hadoop frameworks is superior; Cloudera, Hortonworks and MapR all offer their own advantages and disadvantages. The determining factors for most organizations running Hadoop are rarely the same, because each user has unique requirements.
In the case of Rubicon Project, a company that automates the process of buying and selling advertising, the decision to go with MapR’s Hadoop framework was purpose-driven.
“We started using Hadoop in 2010 when there was no redundancy on the open source version of Hadoop, so we had to make sure we had engineers who knew how to work with Hadoop,” explained Jan Gelin, vice president of engineering and chief system architect at Rubicon Project. “Since then we’ve worked with MapR because it had full redundancy built in its framework, but there are many viable options depending on what you need.”
The vendor advantage?
But there are some cases where the backing of large technology vendors could influence decisions over which framework to choose. Once again, an organization’s individual needs are paramount, but it could be that certain hardware vendors are best placed to help customers meet those needs.
Johann Schleier-Smith, co-founder and CTO of social networking site Tagged.com, which runs Hortonworks Hadoop, told SiliconANGLE that “improving documentation ranks highest on the wish list, and it’s a place where vendors could really help.”
Integration could be another selling point. Schleier-Smith said combined hardware and software offerings would likely be a compelling advantage for some customers. “The option of integrating Vertica strengthens the Hortonworks offering,” he said. “It’s interesting news and something we’re likely to be looking at.”
For now, the Hadoop market remains fairly immature, with the vast majority of organizations not even using it in live production systems. Last October a survey by Forrester Research found that 45 percent of Hadoop users were running it in a “proof of concept” test, versus just 16 percent that were using Hadoop in production.
But that immaturity means there’s a big opportunity for the taking. And with big-money backing from vendors like Intel and HP, startups like Cloudera and Hortonworks are best placed to make the most of it.