Hadoop is Ready for Enterprise, Say Creator Doug Cutting and Former Bank of America Data Scientist Abhi Mehta

Abhi Mehta and Doug Cutting

Abhi Mehta was a managing director for Bank of America. Now he’s running his own big data startup Tresata, which he announced on theCube last February. Mehta returned to theCube at HadoopWorld 2011, where he talked to hosts John Furrier and Dave Vellante about Apache Hadoop in finance and the enterprise. Mehta was joined by Hadoop creator Doug Cutting, who added further details about the Hadoop ecosystem.

ServicesAngle editor Alex Williams has been researching Hadoop’s readiness for the Fortune 1000, and he’ll be sharing that work soon. We’ve also heard some skepticism about Hadoop’s appeal specifically in the financial sector. Mehta believes that Hadoop is enterprise ready, and his work at BoA seems to invalidate the idea that the financial sector isn’t interested in Hadoop.

Tresata is in the business of providing big data analysis as a service for finance, and its tools are built on Hadoop (our coverage is here). Mehta says Tresata already has 10 customers. Cutting also said that big companies are already using Hadoop. It’s still early, but it’s happening. Tresata is one of many analytics companies trying to make Hadoop useful by building analytics and/or business intelligence applications on the Hadoop stack. By offering the solution as a service, Tresata is effectively routing around any skepticism big enterprises might have about managing Hadoop clusters – that work is being done by the Tresata team.

When asked what has surprised him about the growth of big data, Mehta cited the level of global interest: people from all over the world signed in to a webinar that Tresata ran. Mehta believes that every major bank will be using Hadoop in five years.

John said that it used to be that you couldn’t get funding for a “feature” instead of a product, but that’s changing. Now you can build a company around a feature. Mehta agreed, saying that Tresata is an analytics feature built on Hadoop. Cutting said that you can add value at the component level – no one has to own the whole stack, and in fact no one can own the whole stack.

Mehta said we don’t really need any new distributions of Hadoop – the next generation of companies and development should be focused on building more tools on top of Hadoop. Tresata is going after finance, but Mehta said health care is still a wide open vertical for Hadoop-based startups.

On the subject of whether Hadoop was at risk of being forked, John said that it wasn’t really an issue. Cutting said that people are collaborating productively. He wouldn’t speak to whether all the players like each other, but said it didn’t matter: they were all working on the project together. Mehta pointed out that open source isn’t just an ideology. It solves real problems – problems that can’t be solved otherwise.

Asked what he’s working on now, Cutting explained that he has essentially three different jobs. He spends a lot of time speaking at events and to the press. He serves as the chairman of the board at Apache. And he does development work on Apache Avro, a data serialization system that supports dynamic data types and easy integration with dynamic programming languages.

I covered Apache BigTop earlier today. My take: Cloudera is banking on this becoming the definitive distribution of Hadoop, with companies like Tresata building on that. Building out the Hadoop stack with new tools like Avro while providing a platform for other companies to offer analytics as a service is a good play as it provides value to the community as well as to Cloudera.

You can watch the segment here:

You can also watch theCube’s earlier interview with Mehta here. In it, Mehta, while still at BoA, talked about the idea of data factories.