UPDATED 14:29 EST / APRIL 22 2016

NEWS

Hadoop World: Big questions about Big Data’s future #HS16Dublin

Is the day of the pure Hadoop startups over? Are the big vendors, and in particular the cloud service providers, taking over Hadoop infrastructure? Will most Big Data production systems be built on infrastructure-as-a-service (IaaS) platforms and use software-as-a-service (SaaS) analysis rather than running in company data centers?

Those are the questions that SiliconAngle Media Co-CEOs John Furrier (@Furrier) and David Vellante (@DVellante) discussed, and that shaped the probing questions they asked during interviews on theCUBE at last week’s Hadoop Summit 2016 event in Dublin, Ireland.

The previous week had seen the Strata+Hadoop World event, which grew out of Big Data pioneer Cloudera Inc.’s conference. Hadoop Summit is run by Cloudera competitor and open-source pure-play Hortonworks, Inc. Not surprisingly, major vendors had large presences on the show floors of both, as well as on theCUBE. In addition to several top executives from Hortonworks, theCUBE hosted executives from Hewlett Packard Enterprise Co., Microsoft, EMC, Capgemini S.A., BMC Software Inc., and IBM. The videos are all available here.

Amazon Web Services LLC (AWS) and Google were not there, but AWS was the elephant in the room in many of the discussions. Vellante and Furrier consider the AWS Big Data platform to be too immature for enterprise Big Data use at the moment, but its momentum is tremendous. AWS’s economies of scale mean that its costs are dropping faster than those of even the largest IT shops. Vellante asked how long captive IT can compete before the cost differential alone becomes so compelling that everything moves to the cloud.

Another huge advantage the cloud providers have in the Big Data market is Hadoop’s complexity. Building a Hadoop stack involves corralling 15 to 20 different open source components that are not designed to work together and do not share a common set of APIs. It requires hiring for, or developing, new sets of highly technical skills. And those components are constantly and rapidly evolving, or being replaced by something brand new. Three years ago MapReduce was the only solution for querying big data. Today it is hardly mentioned, and a dozen new choices, many supporting SQL, have taken its place. Management is still immature, and Hadoop environments often suffer from runaway cluster syndrome.

Microsoft’s Raghu Ramakrishnan

The alternative is to build these systems on an IaaS Big Data platform and let the provider’s expert team worry about the infrastructure. Raghu Ramakrishnan (@raghurwi), CTO for Data at Microsoft, played the simplicity card early in his interview on theCUBE.

“What we hear constantly from customers is to keep it simple,” he said in one of the best interviews of the conference. He posed a typical use case of a customer who needs to capture two terabytes of data daily, keep one week of data in the active database at all times, and then archive the data for historical analysis. The client wants streaming analysis as well as several standard daily and weekly reports, and also wants to make an analysis package available directly to business users and internal analysts for ad hoc queries. On Azure, Ramakrishnan said, this is simple to set up, and the customer is freed from technical complexity to focus on business issues. That, Vellante and Furrier agree, is a strong argument for running on an IaaS platform.

Ramakrishnan also said that the questions about security in the cloud have disappeared. CIOs and business leaders today for the most part recognize that cloud provider security is often stronger than what they can build in-house given their limited resources. So that excuse for keeping workloads in-house has largely evaporated.

Lilliputians in Brobdingnag

The Hadoop vendors and open source community need to move faster to mature the infrastructure, Furrier said. They are doing a disservice to their customers by failing to deliver a unified stack. But given their size, they may not have many options.

The only two pure-play Big Data players who hit what Vellante called the financial “leaderboard” are Palantir Technologies Inc. and Splunk Inc. Wikibon estimates that Cloudera had $200 million in revenues last year and will probably double that this year, while IBM realized $1.2 billion from its analytics business alone and Azure had between $8 billion and $10 billion in annual operating profit.

Furrier sees the pure-play Hadoop companies as second- or third-tier vendors in the enterprise market. “Oracle could run the table. Little moves by big players can change everything” in this market, he said.

However, none of this means that those pure-plays are doing badly. They are all growing hand over fist by every measure. They consistently say that their clients are moving from trials to production systems, with some already in production with multiple applications. They are also making deals with both the big on-prem and cloud players. They have largely moved beyond the stage of evangelizing for Big Data and today go in to discuss specific projects with business and IT leaders who recognize the value of the technology. The market definitely has room for them. They have valuable knowledge and experience that is rare in the market.

The usual pattern in new technology markets is that startups pioneer the technologies, and when the market gets large enough, big players move in and start buying those startups for their technology and skill sets. The Big Data market is exploding, and the big players have arrived. The question may be which big player will buy which Big Data pioneer. In the end, the rich, as Vellante likes to say, (usually) get richer.


