Data as a fuel: Transforming how we evaluate, measure + act on consumer behavior | #BigDataSV


Broadcasting from #BigDataSV last week in Santa Clara, California, theCUBE co-hosts John Furrier and Jeff Kelly continued their quest to talk to the best Big Data practitioners by interviewing Abhishek Mehta, the CEO of Tresata. They asked him to elaborate on the challenges of being an entrepreneur and to pinpoint the industry’s trends and blind spots.

“There’s an interesting metric where people equate success to how much money you’ve raised. I equate success to how much money I’ve made. We’ve had an awesome year,” boasted Mehta. “The market has reached the point that you, Dave and I spoke about four years ago: data factories and their ability to monetize data in ways never possible before. However, the ability to do that depends on domain knowledge, science and technology coming together seamlessly. Big Data is a bigger ecosystem than Hadoop. The era of building and commercializing tools is behind us,” Mehta reckoned. “Democratization and commoditization are two sides of the same coin. As Big Data technologies democratize, they will commoditize. The only way to extract value is to go up the stack and find problems to solve with Big Data.”

“When you commoditize the infrastructure, the middleware and the applications, what’s left? Where’s the value?” asked Furrier. “If the stack becomes commoditized, where’s the value shift?”

Business leadership with technical expertise


“As an industry, we have to realize that companies with hundreds of billions of dollars in revenue, built on a stack that includes databases, storage, BI and analytical software, are under attack, both because of their legacy stacks and from this revolution called open source. The question is ‘Is commoditization moving up the stack?’ The answer is ‘Absolutely yes’,” commented Mehta. “I don’t think the Big Data market is a technology market; this is a business revolution. Where you add value is not technology, not data, not science, but the combination of the three.”

“When you approach a client and present the massive explosion of data, the only way to make money is to understand customer behavior. But understanding customer behavior is a hard thing to do. It takes a combination of open data, open tech and open science. Together, these three components make something unique. That’s where the value is, and it cannot be commoditized. The expertise is very hard to replicate,” stated Mehta.

He continued: “Commoditization will rapidly expand up the stack. IT is a $3 trillion market, and 80 percent of that is enterprise software. That’s the value at stake.”

The language of business vs the language of tech


“You just said that if you are doing database tooling, you should fold your business and shut down,” laughed Furrier. “Are you impressed with the current start-ups?”

“I love the fact that there’s so much confusion and lack of quality among start-ups. In a way, it helps sort them out. There are three trends I see: 1) the days of selling databases for hundreds of millions of dollars are over. The enterprise buyer is smart enough to realize that algorithms are free, just as BI and database functionality are. Buyers struggle to figure out how to take all these open source components and find value, but they are smart enough to realize open source works; 2) the buyer itself is changing; you no longer sell to the CIO. You are selling to the CMO, CFO, CEO, etc.; 3) applications represent the next generation of predictive analytics software; you are delivering actionable intelligence, at scale, for a variety of problems that you can monetize tomorrow,” explained Mehta.

Data as a fuel


“This is a Darwinian moment in our lifetime,” stated Mehta. “It’s a 50-year boom cycle. Data as a fuel will fundamentally transform how we evaluate, measure and act on consumer behavior – from education all the way to fraud or disease management.”

“You’ve built an application which is getting a lot of attention,” jumped in Kelly. “Tell our viewers a little more about that.”

“Two years ago people were praising Hadoop but were wary of it because it was batch-oriented. I told everyone not to underestimate the open source community. Spark was the answer to that. It brings an in-memory, real-time computational capability to Hadoop. Put simply, Spark is an in-memory, database-like framework that allows advanced algorithms to run in memory, at scale,” explained Mehta. “The cool part about Spark is that it works natively with HDFS. With other in-memory systems and engines, you have to go back and re-populate data when you try to find insight on different populations; Spark does this dynamically. We often get asked why we are focusing on the ecosystem rather than building virtualization. We’ll only build virtualization when scale is the insight,” Mehta added.