The cloud’s role in Big Data: Rethinking modern IT

What role does the cloud play in Big Data deployments today, and what will the future of Big Data in the cloud look like? To understand both, we should first look at the state of the Big Data market today, and even at the definition of Big Data itself.

The Big Data market is growing, and that growth shows no sign of slowing. In 2013, the market reached $18.6 billion, according to the Big Data Vendor Review and Market Forecast 2013-2017, a report by Jeff Kelly, Principal Research Contributor at Wikibon, released February 10, 2014. “That’s a 58 percent growth rate over the previous year,” Kelly told theCUBE cohosts John Furrier and David Vellante on the opening day of BigDataSV 2014 last month.

In his report, Kelly wrote that Wikibon’s definition of Big Data contains two equally important parts:

“First, from a technology perspective, Wikibon defines Big Data as those data sets whose size, type, and speed-of-creation make them impractical to process and analyze with traditional database technologies and related tools in a cost- or time-effective way.

“Second, Wikibon believes Big Data requires practitioners to embrace an exploratory and experimental mindset regarding data and analytics, one that replaces gut instinct with data-driven decision-making, and exchanges stubbornness for a willingness to question long-held assumptions. Projects whose processes are informed by this mindset meet Wikibon’s definition of Big Data, even in cases where some of the tools and technology involved may not.

“Based on the above definition, Wikibon includes the following products and services under the umbrella of Big Data: Cloud-based Big Data services including infrastructure, platform and software delivers [sic] as a service.”

Kelly explains that several important growth drivers fueled the Big Data market in 2013. Among them, products and services related to Big Data continued to mature from a features perspective, further spurring adoption. “These products and services include the continued evolution of cloud-based Big Data services for large-scale analytics and application development,” Kelly wrote.

Vendors such as Amazon Web Services, Hortonworks and Microsoft have been playing their part to help enterprises bring Big Data into the cloud. As Kelly’s report notes, Amazon Web Services released Kinesis, a streaming data framework for real-time applications, and Redshift, a large-scale data warehousing service. Hortonworks and Microsoft released HDInsight, which delivers Hortonworks Data Platform (HDP) on Microsoft’s Azure cloud. “It is still very early days for Big Data in the cloud,” Kelly wrote, “…with the vast majority of current use cases focused on test-and-development, not production deployments.”


Cloud and Big Data’s love fest


At BigDataSV 2014, Vellante and Kelly discussed the innovations that are happening in software. In the NoSQL space, Vellante said that Pivotal has announced $300 million in revenue. Kelly explained that most of that revenue is coming from Pivotal’s Greenplum line and its GemFire line. “Pivotal has a grand vision of building up this three-layered platform: infrastructure, PaaS and the data fabric,” he said. “They have a long way before they make that enterprise-ready. And they have to do a lot of work with partners, mostly cloud partners, to realize their vision.”

Kelly also said that “last year was an important one for Hadoop specifically and Big Data generally.” The most important development, he said, was that the platform continued to mature toward a multi-application framework, and he expects these technologies to find their way into the enterprise eventually.

“You talked about cloud and Big Data continuing their love fest,” Vellante observed to Kelly. “Amazon has announced things like Kinesis so it seems like it’s a place for a lot of people to park their Big Data projects. What do you see going on with Amazon and Big Data?”

Kelly replied that cloud is a good place for a lot of this test and dev. “You can go to Amazon and spin up a Hadoop cluster,” he said. “You can use Kinesis for streaming data; you have a pretty comprehensive portfolio of services around different types of data workloads. But you are not seeing a lot of enterprise workloads being moved to the cloud—AWS specifically. That can be due to a number of reasons: data integration challenges, security concerns, internal compliance policies.”


Alteryx on cloud’s role in Big Data


Also at BigDataSV 2014, Furrier sat down with George Mathew, COO of Alteryx, and asked for his thoughts on the role of cloud in Big Data. Furrier said he wanted Mathew’s perspective on the data cloud: not in the sense of putting data in the cloud, but rather the role of cloud, the role of DevOps, and the intersection of the two. Mathew replied that cloud computing is a top priority for Alteryx and a critical component in the adoption of Big Data.

Mathew explained that access to infrastructure resources on a pay-as-you-go basis “enables organizations to crunch large volumes of information without making a heavy upfront capital investment in infrastructure.” This elasticity, Mathew said, makes it possible for users to efficiently execute scale-out workloads that could not be as easily accommodated by an on-premise deployment.

“One of the real proponents of the cloud is now the fact that there’s now an ability for business analysts, business users and the business line to make [an] impact on how decisions are done faster without the infrastructure underpinnings that were needed inside the four walls of the organization,” he said. “So, the decision makers and the buyers are becoming the chief analytics officers, the chief marketing officer, less so the chief information officer.”

The push to reduce operational dependence on the IT organization is driven in large part, Mathew said, by the growing need for self-service analytics among employees struggling to keep up with changing business requirements. He said that data is a strategic asset that must be treated as such.


Syncsort’s Big Data market forecast


Josh Rogers, President of Syncsort, joined theCUBE cohosts Furrier and Vellante at BigDataSV 2014. Regarding the future, specifically the unbundling of software and the mainframe, Rogers said during the interview that the next step will be the production of industry-specific applications meant to drive the compute platform. “Those will be offered as SaaS models or on premise,” he predicted.

But Rogers said he doesn’t necessarily believe there will be commoditization across the entire application ecosystem. “They are doing things that are hard and will stay consistently hard,” he said. “When people move Hadoop into their Enterprise, they have to figure out how to plug it into their mainframe. We believe that is an area we can continually strengthen. We call it ‘Big Iron to Big Data.’”


Wikibon on cloud’s role in Big Data


As for what the future holds for cloud’s role in Big Data, Wikibon’s Kelly acknowledged at BigDataSV 2014 that things are starting to move in the right direction, but warned that it’s “going to take some time before the cloud becomes the place where you are bringing your enterprise workloads.”

Kelly wrote in his report that he thinks the future for Big Data deployments is “clearly hybrid, with cloud and on-premise deployments living (hopefully) in harmony.”

Kelly’s report states that as the market matures through 2017 and beyond, Wikibon expects Big Data applications and cloud-based services to play an increasingly important role. “As the underlying infrastructure solidifies,” he wrote, “Wikibon believes mainstream and late-adopters will look to service providers to deliver polished applications and services that sit on top the hardened Big Data infrastructure and target specific, high-value business challenges.”

