UPDATED 10:14 EDT / AUGUST 26 2011

NEWS

Rejecting “Closed Source” Label, MapR Contrasts Its Hadoop Approach to Cloudera

Here’s the simple version of the story: Cloudera’s enterprise Hadoop distribution, CDH3, is open source, while MapR’s enterprise Hadoop distribution, M5, is closed.

The reality is more nuanced, according to Jack Norris, MapR’s Vice President of Marketing.

Open source, Norris said, doesn’t simply mean that the source code of a given technology is freely available for anyone to download and modify. More importantly, for enterprise customers especially, open source also means that a technology is economical (read: significantly less expensive than comparable commercial technology), that there is a robust community of contributors improving the technology, and that it removes the risk of vendor lock-in.

Based on these three criteria, MapR is indeed an open source vendor, Norris contends. The remaining criterion – open and available code – is the least important to actual users, he said.

But even on that front, much of MapR’s Hadoop distribution (which EMC ships with its Hadoop-ready Greenplum appliance) is in fact made up of open source components whose code is available via the Apache Hadoop project, including Oozie, Sqoop and Mahout. The popular perception that MapR’s distribution is entirely “closed” and proprietary fails to take this into consideration, said Norris.

Open Source Perception Versus Reality

That perception is largely based on two facts. First, MapR has kept the code to M5’s core infrastructure – including MapR’s Lockless Storage Services and High Performance Map Reduce Direct Shuffle – closed. Second, MapR does not contribute to the Apache Hadoop project nearly as much as competitors, namely Cloudera. Of the 40 or so Hadoop Common Active Committers, six are Cloudera employees. None work for MapR.

MapR does not share the code for its storage services and MapReduce layers because it believes they are the company’s major differentiator – its secret sauce. It’s part of MapR’s approach to the Hadoop market, one that contrasts significantly with Cloudera’s, according to Norris.

As MapR sees it, Cloudera is attempting to add value to Hadoop deployments by wrapping services and a management console (SCM, which is proprietary to Cloudera) around a fully open Hadoop distribution.

MapR also provides services, but its major value-add to customers is its proprietary core Hadoop infrastructure – the data storage layer and MapReduce engine – which makes Hadoop less expensive to use and significantly improves performance, Norris said. You can customize your MapR Hadoop distribution with any of the various open source Apache components, but leave improvements to the infrastructure engine that powers Hadoop to MapR, goes the thinking.

“We just think this is the right model,” Norris told me.

MapR: Hadoop Unlike Other Early Stage Markets

MapR is betting that, rather than tinkering and experimenting with Hadoop, more and more organizations are looking for an enterprise-ready Hadoop distribution that can be quickly put into production. If that premise is correct, MapR could be in a good position to gain market share over Cloudera.

But it’s not clear to me that the market has reached that point yet. Most Hadoop enthusiasts I come across are technically savvy engineers – hands-on types – not executives who are already sold on Hadoop and expect to find a stable, enterprise-ready product. In other words, MapR may be a bit ahead of the market.

As it stands now, the Hadoop community is passionate about the open source nature of the technology. That most of MapR’s distribution is Apache Hadoop-compatible is overshadowed, in many Hadoop practitioners’ eyes, by its proprietary core infrastructure layer. I think most Hadoop practitioners, at this stage in the Hadoop maturity lifecycle, want the option to modify the framework’s internal plumbing even if they have no intention of doing so.

MapR doesn’t give them this option, so for many that justifies calling the vendor’s distribution “closed” even though the reality is not so black-and-white. For that reason, it looks to me like Cloudera’s “open source” approach – even if it’s more perception than reality – is the best bet at this point. But that doesn’t mean the market won’t or can’t catch up to MapR’s approach.

Services Angle

I also think services, not just the core technology itself, are going to play a critical role in spurring Hadoop adoption. Both MapR and Cloudera seem intent on letting a third-party service provider ecosystem develop rather than making services part of their main value-add. Cloudera’s Vice President of Product Charles Zedlewski, for example, told me last month that the company views itself as a technology company, not a services provider. But until that third-party services ecosystem emerges, Hadoop vendors will need to pick up the slack. Again, from my perspective, Cloudera is in the better position on this front. But neither vendor is in a particularly strong services position.

The Hadoop stakes are high. The difference between MapR’s and Cloudera’s approaches to commercializing Hadoop could determine which of the two becomes the de facto Hadoop vendor. And being the go-to vendor rather than a distant second option in a potentially billion-dollar market is no insignificant detail.

