UPDATED 11:59 EDT / FEBRUARY 23 2016

How Spark use-cases are transforming data | #SparkSummit

How are real-world customers using Apache Spark, an engine for Big Data processing? Day Two of Spark Summit East 2016 kicked off with a keynote session in which Databricks, Inc. explained Spark Streaming (which enables interactive and analytical applications across both streaming and historical data), followed by use-cases of Spark from Capital One Financial Corp. and Synchronoss Technologies, Inc.

Using Databricks’ Spark Streaming

“According to our survey we did last summer, actually half of the Spark users think Spark Streaming is the most important component of Spark,” said Reynold Xin, cofounder and chief architect of Spark at Databricks.

Databricks has learned a lot about streaming over the years, and it has learned that streaming computations don’t run in isolation. Training new models with machine learning to continuously update streams and detect new anomalies is very important. Applications need to be continuous and run in real time, constantly updating their knowledge.

With its 2.0 version, Spark aims to bring streaming computation to the masses. This is the goal of Spark’s Structured Streaming: a simple way to help people do streaming without having to reason about streaming, as Structured Streaming offers a high-level streaming API built on the Spark SQL engine.
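To make the idea concrete, here is a minimal plain-Python sketch of a query that is written once and then evaluated either over a complete data set or incrementally as data arrives. It is an illustration only and does not use Spark or its API; the event shape and the names count_actions, run_batch and run_incremental are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical record shape for illustration: {"user": ..., "action": ...}
Event = Dict[str, str]


def count_actions(events: Iterable[Event]) -> Counter:
    """The 'query': count events per action. It contains no streaming logic."""
    return Counter(e["action"] for e in events)


def run_batch(events: List[Event]) -> Counter:
    """Evaluate the query once over a complete, historical data set."""
    return count_actions(events)


def run_incremental(micro_batches: Iterable[List[Event]]) -> Iterator[Counter]:
    """Evaluate the same query over data that arrives in small batches.

    The running result is kept as state, updated for each batch, and a
    snapshot of it is yielded after every update.
    """
    state: Counter = Counter()
    for batch in micro_batches:
        state.update(count_actions(batch))
        yield Counter(state)  # yield a copy so later updates do not alter it


history = [
    {"user": "a", "action": "login"},
    {"user": "b", "action": "purchase"},
    {"user": "a", "action": "purchase"},
]

batch_result = run_batch(history)
# Feed the same events one at a time, as if they were arriving live.
snapshots = list(run_incremental([[event] for event in history]))
```

In this sketch the final incremental snapshot equals the batch result, which is the property that lets a developer reason about the query rather than about the stream.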

“Really, what we want is if you need to build ever something … complicated … it should be a lot simpler. A lot of things should just work out of the box without you having to worry about all the nitty-gritty details about serving,” said Xin.

Leveraging Spark for Capital One

Capital One has over 72 million customers that use its products each year. “We quickly scale up to petabytes of data,” said Chris D’Agostino, VP of digital and US card servicing technology and engineering at Capital One. “So for us it is all about the data. We are a data-driven company,” he said.

Combining real-time data with data in aggregate, and understanding that intersection, is where the Spark platform has been really helpful for Capital One. “We’ve been able to combine large data sets for historical information in both SQL and graph format and been able to query those systems, apply them to the models we build, and execute those models to make scoring decisions,” said D’Agostino.

Pipeline processing and profiling with Synchronoss

“Synchronoss offers personal cloud and activation platforms for large enterprises and communications providers around the globe,” explained Suren Nathan, senior director of Big Data platforms and analytics framework at Synchronoss. Its solution helps providers connect to customers. According to Nathan, Synchronoss helps “implement scalable Big Data technology platform to help deliver consistent analytics.”

“In order to make any meaningful use of the data, it has to be processed,” said Nathan. There is a pipeline process in place at Synchronoss, and it’s messy and complicated. “A couple of years ago, mercifully, out came Spark. Spark provided a promise. It had everything required for pipelining,” said Nathan. With Spark, Synchronoss moved its ETL closer to the data, ran both batch and streaming workloads, and simplified its design.

Data profiling is a key aspect of Big Data as well. However, it’s a time-consuming task. Big Data can’t be loaded into a database unless the data fields are known. “All of those dependencies were causing a huge headache,” said Nathan. “Spark came to the rescue.” Synchronoss was able to build large data sets that it can split up by field and use all sorts of metrics to transform the data.
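As a rough illustration of what profiling by field can look like, the plain-Python sketch below computes a few simple metrics for every field in a small set of records. It is not Synchronoss’s implementation and does not use Spark; the function name profile_fields, the sample records and the chosen metrics are assumptions made for the example, and field values are assumed to be hashable.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_fields(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-field metrics over a list of dict records.

    A field that is absent from a record, or set to None, counts as a null.
    """
    # Discover the fields from the data itself rather than from a schema.
    fields = sorted({key for record in records for key in record})
    profile: Dict[str, Dict[str, Any]] = {}
    for field in fields:
        values = [record.get(field) for record in records]
        present = [value for value in values if value is not None]
        profile[field] = {
            "count": len(present),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),  # assumes hashable values
            "types": dict(Counter(type(value).__name__ for value in present)),
        }
    return profile


# Hypothetical sample records with a missing and a null value.
records = [
    {"device": "phone", "bytes": 120},
    {"device": "tablet", "bytes": None},
    {"device": "phone"},
]

report = profile_fields(records)
```

A report like this surfaces which fields exist, how complete they are and what types they hold before any database schema is fixed. On a cluster, the same per-field computation would be distributed across the data set.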

Photo by Spark Summit East
