Can Spark break the compute bottleneck holding back big data apps?

Now that businesses have all of this big data, what exactly are they going to do with it?

At the Spark Summit East 2017 conference in Boston, Massachusetts, the answer seems to be: Lots. But they need tools that make data more available to applications. To that end, mixing streaming, batch and interactive data for Internet of Things use cases, as well as breaking the compute bottleneck, are hot topics. (* Disclosure below.)

Dave Vellante (@dvellante) (pictured, right), co-host of theCUBE, SiliconANGLE Media’s mobile live streaming studio, tried to grok the sentiment around big data. “Big data was the hottest topic in the world three or four years ago, and now it’s sort of waned as a buzzword,” he said. “Is big data done?” he asked co-host George Gilbert (@ggilbert41) (left).

Gilbert replied that while big data is not finished, it is now the use cases that are interesting. He said that the problems of storing massive data are largely solved, so we are entering a new phase where businesses are impatient to see data perform. And with the last of the kinks in storing data working themselves out, developers can devote more attention to use cases and applications, Gilbert added.

“Taking more workloads off the data warehouse — there are limitations there that will get solved by putting sort of MPP [Massively Parallel Processing] SQL back-ends on it,” he said.

He explained that the real bullseye now is: “Make it easier for data scientists to use this data to create predictive models for applications.”

Spark takes a shot at the compute bottleneck

Vellante joked that the “ROI” in big data actually means “Reduction On Investment.” He argued that while the technology has lowered the cost of storing data, it has not yet lived up to its promise of affecting outcomes and reeling in profits.

Contributing to this conundrum is the compute bottleneck, Gilbert replied, noting that the Spark tool SnappyData, which spun out of Pivotal, tackles this.

“You don’t want to necessarily be able to query and analyze petabytes at once. It will take too long, sort of like munging through data of that size on Hadoop took too long,” he said. Instead, with SnappyData, “you can do things that approximate the answer and get it much faster. We’re going to see more tricks like that.”
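The trick Gilbert describes is approximate query processing: instead of scanning every row, scan a random sample and scale the result up, trading a small margin of error for a large speedup. The sketch below is purely illustrative (it is not SnappyData's API, and the function name `approx_sum` and its parameters are invented for this example), but it shows the core idea in a few lines of plain Python:

```python
import random

# Illustrative sketch of approximate aggregation (not SnappyData's API):
# estimate a SUM by scanning only a random sample of rows, then scaling
# the sample's sum by the inverse of the effective sampling rate.
def approx_sum(rows, fraction=0.01, seed=42):
    """Estimate sum(rows) from a roughly `fraction`-sized random sample."""
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < fraction]
    if not sample:
        return 0.0
    # Scale up: sample sum * (total rows / sampled rows).
    return sum(sample) * (len(rows) / len(sample))

data = list(range(1_000_000))          # exact sum is 499,999,500,000
estimate = approx_sum(data)            # scans ~1% of the rows
```

On uniform data like this, scanning one percent of the rows typically lands within about one percent of the true sum, which is the kind of accuracy-for-speed trade-off approximate engines exploit.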

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit East 2017 in Boston. (* Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo by SiliconANGLE