Coverage from SiliconANGLE's livestreaming video studio

UPDATED 14:51 EDT / FEBRUARY 09 2017

BIG DATA

Data on the move: getting the most value out of the numbers | #SparkSummit

While most organizations understand the inherent value of big data — the more data, the better — there can be issues around managing and moving that data. The true value comes from the analysis of the data, not from static data itself. Many are leaning on Apache Spark (an open-source cluster computing framework) to reduce data management complexity, according to Bryan Duxbury (pictured), vice president of engineering at StreamSets Inc.

“We’re seeing a lot of interest in the Spark arena. People want to add their complex event processing or their aggregation and analysis, like Spark SQL [Apache Spark’s module for working with structured data],” Duxbury said.

He explained that these customers are looking for continuous workloads and moving away from batch. They want analytics to run almost simultaneously with ingest, he said. To help with that, StreamSets is building integration via its Spark processor, making it possible to ingest data and capture real-time analytics along the way.
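To make the contrast with batch concrete, the idea of analyzing data at the time of ingest can be sketched in a few lines of plain Python. This is only an illustration of the general pattern and not StreamSets' Spark processor or any Spark API; the function name `running_counts` and the sample sensor records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[Tuple[str, int]]) -> Iterator[Dict[str, int]]:
    """Yield an updated per-key total after each record is ingested."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in events:
        totals[key] += value
        # A snapshot is available as soon as the record arrives,
        # rather than after a whole batch has been collected.
        yield dict(totals)


# Hypothetical records standing in for a live feed.
events = [("sensor-a", 3), ("sensor-b", 5), ("sensor-a", 2)]
snapshots = list(running_counts(events))
```

A batch job would only report totals once all records were collected; here every incoming record produces a fresh view of the aggregate.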

Duxbury recently joined Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during Spark Summit East 2017 in Boston, MA. (*Disclosure below.)

The discussion covered how data movement software, including the use of Spark, maximizes the value of data, and why Duxbury believes it’s better for organizations to buy solutions than to build them.

Building a data pipeline without code

While many companies build their own internal tools to move their data, treating the work as a science project of sorts, there are better ways to allocate time and resources. “It’s not their job to build a world-class data movement tool; it’s their job to make the data valuable,” said Duxbury.

One of the advantages of StreamSets’ Data Collector software, according to Duxbury, is that it allows users to build a data pipeline without code, through a graphical user interface (GUI). The software is heavy-duty and open source, made to integrate easily with other products, including Apache Kafka (an open-source stream processing platform) and Spark.

StreamSets’ Data Collector can be deployed on-premises, in the cloud or on the edge of clusters. It focuses on the initial movement and ingestion of the data and then lets analytical tools, such as Spark, take over and extract the business value from the data. For large-scale deployments, the company offers StreamSets Dataflow Performance Manager as a way to manage dozens or hundreds of Data Collectors, including a live data map of the data flow topologies and enforcement of Data SLAs.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of the Spark Summit East 2017 Boston. (*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo by SiliconANGLE
