UPDATED 14:09 EDT / MARCH 02 2012

MapR: Why Another Hadoop Distribution?

MapR came out of stealth in March 2011, not long after DataStax announced its own alternative distribution of Hadoop (which has since been rolled into its Cassandra distribution). Hadapt launched at the same time, and there were rumors that Yahoo would spin off its Hadoop team (which ultimately happened, in the form of Hortonworks).

The Hadoop wars were just getting started, and there were real questions as to whether MapR would be a viable contender. But now, almost a year later, MapR is still standing, and it has a partnership with EMC to sell MapR’s distribution as part of the Greenplum HD Data Computing Appliance.

Yesterday at Strata, MapR VP of Marketing Jack Norris appeared on theCube to talk about the company’s strategy and why MapR saw a need for another distribution of Hadoop.

Norris explained that the MapR team saw three main issues with Hadoop that they could address. He says that while the company was in stealth mode, the team talked to over 1,000 customers about Hadoop requirements and pain points. He jokes that they were probably the loudest stealth startup ever.

MapR decided to build a business around fixing these problems rather than wait for the community to solve them:

1. Ease of use: One of the main features of the MapR distribution is that you can mount the Hadoop Distributed File System (HDFS) using MapR’s “Direct Access NFS” so that anyone can browse the data stored in the system and drag and drop data into or out of Hadoop without the need to write code.

2. Single points of failure: Norris says that Hadoop has multiple single points of failure, which puts data at risk. MapR wants you to be able to use Hadoop for mission-critical data with high availability, so it adds features to make Hadoop more fault-tolerant.

3. Performance: Comscore is one customer MapR has already poached from Cloudera. Norris says that after the migration, which he says took only a couple of days, a developer at Comscore spent days debugging a job because it finished so fast that he couldn’t believe nothing was wrong.

These are all compelling reasons to use MapR, but what about the team? Cloudera and Hortonworks both boast many of the core developers of Hadoop. Doug Cutting, the original developer of the project, works at Cloudera.

MapR co-founder and CTO M.C. Srivas worked for Google, running one of the infrastructure teams that ran and developed MapReduce, GFS and BigTable – the technologies that Hadoop, HDFS and HBase are based on. The rest of the team has extensive experience working in enterprise technology, with quite a few of the executives coming from storage and infrastructure backgrounds. Although having many of the core innovators of Hadoop on staff is a big selling point for Cloudera and Hortonworks, I can certainly see how the MapR team’s experience addressing enterprise needs could also be a major advantage, and Srivas’ involvement brings credibility to its modifications.

The other issue is whether a proprietary distribution will fly in the open source community, and whether using a proprietary system could induce vendor lock-in. To the latter point, Norris argues that MapR users are less locked in than Cloudera customers because it’s so easy to move data in and out of the system.

ServicesAngle

Norris emphasizes that MapR is building a product and not just selling services on top of an open source project. The company is doing all of its selling through partners like EMC and others that Norris says will be announced soon. MapR sells through these channels, but provides direct support and services to customers.

As I’ve mentioned, the pure open source model pioneered by Red Hat and followed by Hortonworks is an admirable but difficult model to adopt. Open core has become a more common business model. Today, product companies want to have services and services companies want to have products. The line continues to blur.

I see three big challenges ahead for MapR:

1) It risks alienating the open source Hadoop community. Enterprise buyers may not care if a product is open source or not, but much of Hadoop’s value and momentum has been derived from the Apache project and the ecosystem that developers have built around it (i.e., all the stuff included in Bigtop). MapR will need to be an active contributor to the broader Hadoop ecosystem and a team player in the community, or risk being shunned by the developers and data scientists who are so crucial to making big data projects happen.

2) At the moment, the company is focused on selling to companies that are already using or experimenting with Hadoop. This low-hanging-fruit strategy makes sense as a starting point, but MapR will need to ramp up its efforts to bring its product to companies that don’t yet have Hadoop initiatives, and that will mean having a strong services arm to help companies get started with big data.

3) With EMC hedging its bets with its own Isilon/Hadoop integration, MapR needs to expand its partner ecosystem to be more resilient. Norris says that’s happening, which is good news.

That said, the company is off to a great start solving real problems for its customers.

