UPDATED 12:21 EDT / APRIL 27 2011

Yahoo Wants Its Piece of the Hadoop Pie


The Wall Street Journal reports that Yahoo is considering spinning off its Hadoop engineering division into a new company in an effort to commercialize the open source, Big Data-crunching software framework.

The Journal article, citing unnamed sources, states that Yahoo “would collaborate with and take a significant stake in the new firm.” Yahoo declined to comment for the story, but Benchmark Capital told The Journal it has had discussions with Yahoo about funding such a spinoff.

If the report is accurate, the move represents an effort by Yahoo to claim its share of the Hadoop spoils from upstart Cloudera. Yahoo contributed over two-thirds of the codebase to Apache Hadoop, but, unlike Cloudera, hasn’t monetized its investment in the open source technology in the form of a commercial platform.

Interest in Hadoop continues apace as companies look for ways to extract value from the huge volumes of user-generated and machine-generated data being created on the Web. Data analytics is increasingly becoming a top competitive differentiator for companies, so the market for software and services that make analytics on Big Data more manageable is likely to enjoy significant growth in the coming years.

Hadoop has become the standard method for processing Big Data, but recently it has been criticized as being difficult to implement and manage. A recent blog post from BackType Technology titled ‘The Dark Side of Hadoop,’ for example, declared Hadoop is “sloppily implemented and requires all sorts of arcane knowledge to operate it.”

Hadoop’s shortcomings actually present a market opening for Cloudera, Yahoo and others. As I’ve written before, for Hadoop to gain mainstream adoption, commercially supported versions of the framework must be developed that include easier-to-use management and configuration tools. As it stands now, there simply aren’t enough skilled Hadoop programmers to go around.

From a technology standpoint, Yahoo is in a strong position to develop a competitive commercial Hadoop installation thanks to its long history with Hadoop. Doug Cutting created the framework while working at Yahoo, and the company currently uses Hadoop to process huge volumes of click-stream data to match ads with users, detect spam in Yahoo! Mail, and pick top stories for its homepages.

As mentioned, a Yahoo spinoff will have to contend with Cloudera, a Silicon Valley start-up that launched its own commercial Hadoop platform in March 2009. Cloudera recently released the third iteration of its platform, called Cloudera’s Distribution including Apache Hadoop v3, which includes a new ODBC driver to integrate front-end business intelligence tools and improved workflow and job scheduling capabilities.

Thanks to its two-year head start over Yahoo, Cloudera already has over 80 commercial customers, including Groupon, the deal-of-the-day site much in the news lately. It also boasts an impressive brain trust, which includes a number of prominent Yahoo alums. Cloudera co-founder and CTO Amr Awadallah was formerly Yahoo’s vice president of engineering, and Hadoop creator Cutting is on the Cloudera payroll as well. Jeff Hammerbacher, Cloudera’s chief scientist, got his start processing large data sets at Facebook.

In addition to Cloudera, Yahoo will also have to compete with IBM, which offers its own commercial version of Hadoop built around the InfoSphere platform. Another unanswered question is how many resources, in both manpower and money, Yahoo will throw at the new Hadoop spinoff.

Expect to see more vendors getting in on the Hadoop action over the coming months. EMC, for one, plans to make a Hadoop-related announcement at EMC World in a couple of weeks, and GigaOm reported in March that start-up MapR is building a proprietary version of Hadoop that is likely to launch later this year.

I’m currently awaiting an official response from Yahoo on The Journal story and will provide an update as soon as I hear. In the meantime, I’m curious what Hadoop watchers out there think of Yahoo’s move. For my part, I’m hopeful that Yahoo’s entry into the commercial Hadoop market will spur a new wave of innovation that makes Hadoop a more attractive and manageable option for mainstream (read: non-Web 2.0) companies.

In the long term, I think there is plenty of room for more than one commercial Hadoop vendor. With over two years under its belt, Cloudera looks to be in the strongest position, not unlike Red Hat in its battle with Novell a few years back. Yahoo brings some impressive resources to bear, however, so it will be interesting to watch how the competition to successfully commercialize Hadoop plays out.

Jeff Kelly is a principal contributor and analyst at Wikibon.org. He focuses on trends in business analytics and big data technologies. Reach Jeff by email at jeff.kelly@wikibon.org or Twitter at @jeffreyfkelly.

