UPDATED 17:00 EDT / APRIL 16 2018

BIG DATA

Apache Flink helps Netflix process 3 trillion events every day

The processing demands for a video content service like Netflix Inc. are almost unimaginable. A consumer audience of over 109 million subscribers enjoys 125 million hours of TV and movie content via the online subscriber service every single day.

That places great demand on the company’s data ingestion pipeline and stream processing engines, which must handle 3 trillion daily events involving 12 petabytes of data. One of the platforms used by Netflix is Apache Flink, an open-source tool for distributed stream and batch data processing.
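To put those daily totals in per-second terms, here is a rough back-of-envelope sketch. It assumes decimal petabytes (10^15 bytes) and a load spread evenly across the day, neither of which the figures above confirm. The function name `daily_averages` is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_averages(events_per_day, bytes_per_day):
    """Turn daily totals into average rates, assuming a perfectly even load."""
    return {
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "bytes_per_event": bytes_per_day / events_per_day,
        "bytes_per_second": bytes_per_day / SECONDS_PER_DAY,
    }


# Figures from the article; 1 PB is assumed to be 10**15 bytes.
rates = daily_averages(events_per_day=3 * 10**12, bytes_per_day=12 * 10**15)

print(f"{rates['events_per_second']:,.0f} events per second")
print(f"{rates['bytes_per_event']:,.0f} bytes per event")
print(f"{rates['bytes_per_second'] / 1e9:,.1f} GB per second")
```

Under those assumptions the averages come to roughly 35 million events per second, about 4 kilobytes per event, and close to 139 gigabytes per second. Real traffic peaks would run higher than these averages.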

For Netflix, Flink’s key strength is its support for stateful applications, including the time stamping of events such as rolling back and replaying a video, which is critical in the video streaming model. “In terms of state management, I think that’s where Apache Flink really shines compared to other streaming engines,” said Steven Wu (pictured), software platform engineer at Netflix. “Flink is more mature than the other streaming engines.”
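The idea of per-viewer state combined with event timestamps can be illustrated with a toy sketch in plain Python. This is not Flink's API and not Netflix's code. The event layout, the action names and the function `replay_positions` are all hypothetical, chosen only to show why the order given by timestamps matters when state is involved.

```python
from collections import defaultdict


def replay_positions(events):
    """Track a playback position per viewer, applying events in timestamp order.

    Each event is a hypothetical (event_time, user_id, action, seconds) tuple.
    Events may arrive out of order, so they are sorted by event_time first.
    """
    state = defaultdict(int)  # keyed state: user_id -> position in seconds
    for event_time, user_id, action, seconds in sorted(events, key=lambda e: e[0]):
        if action == "play":
            state[user_id] += seconds
        elif action == "rewind":
            state[user_id] = max(0, state[user_id] - seconds)
    return dict(state)


# The rewind arrives first but carries a later timestamp than the play.
events = [
    (3, "alice", "rewind", 20),
    (1, "alice", "play", 60),
    (2, "bob", "play", 30),
]

positions = replay_positions(events)
print(positions)
```

Applied in timestamp order, the viewer "alice" ends at 40 seconds. Applied in arrival order, the rewind would hit an empty state first and she would end at 60 seconds. A production engine has to keep such state fault-tolerant and handle late events without sorting the whole stream, which this sketch does not attempt.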

Wu spoke with George Gilbert (@ggilbert41), host of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, at the Flink Forward event in San Francisco, California. They discussed the benefits of using Flink for streaming data and its potential for SQL applications. (* Disclosure below.)

Flink emerged from the Stratosphere research project conducted at the Technical University of Berlin nearly a decade ago. The company supporting Flink is data Artisans GmbH, a firm founded four years ago by the streaming platform’s creators.

Uber is a Flink user too

Data Artisans provides dA Platform with Flink, an enterprise offering for enabling high-throughput and low-latency solutions to support event-driven needs. Netflix is not the only large-scale Flink user. Uber Technologies Inc. uses the platform to process more than a petabyte of data per day as part of its global ride-sharing operation.

About two years ago, the Flink community began a project to add a Structured Query Language (SQL) interface for data analysis on the streaming engine. “We haven’t used Flink SQL yet, but it’s in our roadmap,” Wu said. “The low-level data stream API can give you the full feature set of everything. High-level SQL is much easier to use, but the feature set is more limited, so that’s a trade-off.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of the Flink Forward 2018 event. (* Disclosure: TheCUBE is a paid media partner for Flink Forward 2018. Neither data Artisans GmbH, the event sponsor, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
