UPDATED 12:28 EST / JUNE 24 2016


Hadoop and beyond: A conversation with Hortonworks CEO Rob Bearden

As the first big data company to go public, Hortonworks Inc. has been a natural target for both competitors and investors — never more than now.

The Santa Clara, Calif., company sells subscriptions and services for its Hortonworks Data Platform, which is built upon the open source software Apache Hadoop for storing, processing and analyzing huge amounts of data.

Since its spinoff from Yahoo in 2011 and its initial public offering of stock in December 2014, Hortonworks has seen increased competition from larger companies such as Hadoop rival Cloudera Inc. and upstarts such as MapR and Databricks Inc. that are embracing Spark, a newer data processing engine, which Hortonworks also supports. Not least, Amazon Web Services and Oracle are getting into the big data game.

At the same time, amid uncertainties over how fast more companies are willing to start using Hadoop, investors aren’t as enthusiastic as they used to be. The five-year-old company’s shares plummeted after it announced plans in January for a $100 million secondary stock offering because investors questioned why it needed to raise so much money not long after its IPO, especially at a depressed share price. Shares are down 45 percent from the start of the year.

Still, Hortonworks continues to grow at a breakneck pace, with first-quarter revenues rising 85 percent, to $41.3 million. Subscription billings, which make up 80 percent of sales, were up 122 percent. And Hortonworks reiterated a forecast that it would turn cash-flow positive in the fourth quarter.

Hortonworks Chief Executive Rob Bearden will talk up the company’s goal of becoming the leading provider helping companies manage all their data in one place when he keynotes the Hadoop Summit, which his company and Yahoo are hosting June 28-30 in San Jose, Calif. (* Disclosure below.)

In an interview with SiliconANGLE, Bearden described Hortonworks’ increasingly expansive corporate strategy, how the company aims to keep up with new big data technologies and why being a public company provides an edge over competitors. This is an edited version of the interview. (You can also view another interview with Bearden, by SiliconANGLE Media co-CEO John Furrier, on YouTube.)


Q: What’s the megatrend you’re betting on here, and how is Hortonworks trying to address the opportunity?

A: We are focused on being able to bring all data under management. That begins with the data from the point of origin, like a sensor or a clickstream or even a video, and bringing that under management, engaging with that data while it’s in motion, and processing it to make decisions. That can transform customers’ business models from being reactive to their customer post-transaction to being more interactive with their customers and their supply chain pre-transaction.

Q: To what extent are companies able to capture all that data, which they used to have to throw away because storage was relatively more expensive, in a useful way and make sense of it?

A: That’s the power of Hadoop. Even five years ago, it wasn’t pragmatic to bring that volume of data under management of traditional data platforms. As Hadoop emerged as an enterprise-viable data platform, you could now bring that data under management for a fraction of the cost of managing and processing it.

Now many new use cases emerge because of the power of Hadoop to be predictive about what our customers are doing. We can have a common view of all our relationships with customers.

Q: Are customers trying to automate existing processes with this technology, or are they finding fundamentally new things they can do as a result of having control over all this data?

A: Both. A simple use case is just mass storage and fast retrieval against a very large data set at probably a tenth of the price point of traditional technologies. Much higher-value use cases quickly emerged, like being able to have a 360-degree view of all of their data. With traditional customer relationship platforms, there’s one view of the customer in the dot-com or procurement platform. There’s another in the retail system. There’s another in the inventory system.

By leveraging Hadoop, you can bring all of those customer relationship views onto a central golden record about that customer, and be able to create a better customer experience, sell them more, faster and at a better margin.

Q: That sounds like a typical retail situation. Any examples in other industries?

A: Take the oil and gas industry. They never had the ability to understand what was happening on the rig in real time and be able to compare that against the common standards for drilling volumes, patterns and chemical makeups they’re trying to accomplish with each of the crude varieties. Today, they can decide in real time, based on their libraries of goal sets, what they want to do on that rig at the very instant they start pumping that crude, and determine whether they need to do maintenance later or right away, to optimize the uptime and the pumping volumes. These companies can see from $100 million to $500 million a year in value with that real-time visibility on all that data all at once.

I could do the same with automotive, healthcare, financial services. Being able to bring all of the data under management from point of origination to point of rest transforms virtually every industry and allows them to evolve into new business models.

Q: How do you contend with customers’ organizational resistance to new business models?

A: This is one of those megatrends we’re betting on: Data becomes the new oil. They realize if they don’t embrace it, they die. Or if they don’t die, they certainly get left behind.

Q: Where do you see the biggest opportunity as a company in open source software?

A: At the core, it’s around continuing to innovate the tech, but also create value and enable these new models, and enable the enterprise to get business value back very quickly. That continues to expand the subscription relationships that we have. Our net expansion rate with customers was over 150 percent last quarter.

Q: How do you plan to move from big losses today to cash-flow positive by the fourth quarter, as you’ve promised?

A: We have beaten on every metric for the last six quarters, including moving cash burn down. We’re very comfortable in the execution against that. In 2015, we doubled our customer base.

Q: So it’s going to be a steady progression to profitability rather than, say, taking your foot off the marketing gas or other expenses? Is there a tipping point as you scale up?

A: We’ve had a very steady progression of growth in the last seven quarters. We’re going to continue to make investments going forward, and that will take us into EBITDA break-even. That’s what’s so great about the subscription model. You continue to create value and generate leverage.

Q: Given the fast-changing market and new competitors continuing to stream in, growth remains important to build a moat, right?

A: Without question. The great news about this space is that data is doubling every year across the enterprise, so our market opportunity continues to expand. At the end of last year, we expanded our strategy to bring the entire data stream from the point of origination of that data to real-time processing and engagement as events and conditions happen.

Q: A lot of people look to Red Hat as the iconic open source company to go public, and it has had its ups and downs, as has Hortonworks. Is Hortonworks trying to be the Red Hat for Hadoop?

A: There are many similarities between their model and ours: certainly open source and a subscription-based revenue model. So sure, I’d gladly say we’re the Red Hat of Hadoop.

Q: Is the prospect of an IPO by Cloudera or other cloud software companies a challenge in terms of customer perceptions?

A: I can’t speak to where they are in their IPO objectives. But from our perspective, it’s been very good in customer situations to be a public company. When they start looking at creating and embracing the next-generation data platform, they want the transparency of a company that operates in a public market versus hearsay and rhetoric of a private company.

Beyond Hadoop

Q: To some, Hadoop feels like old news. Is it a challenge to convince new customers who might think Hortonworks is all about Hadoop at a time when other technologies such as Spark, Storm and Flink are driving the market perceptions out there?

A: We’re a huge supporter of Spark. We think it has an incredibly important and valuable place in the data architecture overall. Our architecture of Hadoop on the bottom brings all the data together on a central architecture, and above [allows customers] to simultaneously bring all of those different application types to execute over that central data architecture.

In the case of Spark, it does what it does extraordinarily well. But in certain environments it’s going to be some percentage of the data set and the workloads, and in other environments it’ll play less or more. We want to bring the data to Spark, not just let Spark emerge as another siloed data workload.

Q: How do you keep up with the fast pace of change in open source software that has produced adjacent or competing data technologies?

A: If we don’t have a meaningful and material role as a committer [to open source software projects], then we can’t innovate on the core architecture platform. With the core architecture that we’re enabling, we will participate in the projects and make them enterprise-viable and bring them into the platform, or we don’t have them as part of the platform.

Q: Many customers still view Hadoop as hard to use and expensive to implement. To what extent do you need to deal with that view?

A: There’s a tremendous evolution that’s happened. The first wave of it was becoming a truly enterprise-capable data platform, and after that came better enterprise services. The leg that’s forming now is ease of use: for the user to interoperate with the data, get data into it and get applications leveraging it.

That means being able to operate simultaneously in cloud, on-premise and hybrid environments and to have all the tooling that moves those workloads around transparently, with common security and governance models. We’ve been at that aggressively through our partnership with Microsoft the last three years.

Q: Will one distribution of Hadoop, or just a few, prevail going forward, or is fragmentation going to be a way of life for a while?

A: This is a massive market. Just look at the data growth that’s happening in the enterprise. It’s doubling every year, and 80 percent of that data growth is coming from data sets that were falling on the ground for lack of a viable platform. That opportunity opens up for multiple platforms to be successful.

Look back at ERP [enterprise resource planning]; there were certain providers that did well across certain industries or applications. Certain relational databases did certain kinds of things very well versus others. Given the size of this opportunity, that same dynamic will emerge.

Packaged big data applications

Q: Why aren’t there many packaged big data applications? Is that the way it’s going to be for the foreseeable future?

A: It is forming right now, actually. It’s a perfect indicator of the maturity of Hadoop, reaching a critical mass of adoption as part of the core data architecture strategy, that now the modern data applications can start emerging. There’s a big enough market to build great companies on. We saw that start to accelerate about this time last year.

Q: What examples would you point to?

A: Internet of Things applications are leveraging Hadoop. There are analytics platforms that are solely Hadoop-based. When you look at the cloud platforms that are now providing big data services, all of their traditional analytics natively support Hadoop. You see the connected car platform.

Q: So many big data technologies have been spun out into open source by tech companies such as Yahoo, LinkedIn and others that are not traditional software companies. What’s the upshot of that innovation model for either those companies, which are also users of these technologies, or for other customers?

A: It’s a significant trend. The new generation of companies tends to be either companies that have had to solve very hard problems to scale and they did it with their intellectual capital, or they are large organizations that are hitting a scale problem, like Facebook, Twitter, Google, Yahoo, LinkedIn, even federal government agencies such as the National Security Agency. There are another two dozen out there.

They realize the best place to innovate that tech is actually to put it in open source and get a community to form around it, with a core team that’s focused on guiding the roadmap, doing core innovation and taking it through to becoming an enterprise product. This is absolutely becoming the new model of software.

Q: That suggests a whole new structure for the software industry, doesn’t it?

A: Absolutely. It’s as transformative to the software industry as the cloud has been for the traditional hardware and storage industry.

* Disclosure: TheCUBE, owned by the same company as SiliconANGLE.com, will be the paid media partner at Hadoop Summit. This interview was conducted independently and neither Hortonworks nor other summit sponsors have editorial influence on SiliconANGLE content.
