Apache Flink: Simplifying streaming data for the enterprise


The Apache Software Foundation has been leading the way on many big data trends, with data processing tools such as Apache Hadoop and Spark helping the enterprise turn data into capital. As connected devices and machine learning become more pervasive, the key to success is real-time streaming of data from bounded or unbounded sources.

Apache Flink manages this type of high-level stream processing, and while companies like Uber, Netflix and Alibaba have adopted the technology, minimizing the tool’s complexity remains an area of focus.

“What we are working on right now in Flink is definitely extending the support in this area for the ability to keep a much larger state in this application. … Handling larger state, and then we’re looking into what are the APIs that users actually want in this area,” said Stephan Ewen (pictured), chief technical officer at data Artisans GmbH, the company supporting the Flink open-source project.

Ewen spoke with George Gilbert (@ggilbert41), host of the theCUBE, SiliconANGLE Media’s mobile live streaming studio, on the ground at Flink Forward 2017 in San Francisco last week about how Flink will become more flexible and easy to use. (*Disclosure below.)

Ease of use coming to the enterprise

Supporting Flink is data Artisans GmbH, which is preparing to help enterprises adopt the technology. Early adopters of the streaming framework were large internet companies that had to modify the technology to move it into production, with the added burden of integrating it with other internal applications to create a seamless end-to-end system.

Companies dealing with big data continuously receive data produced by sensors, user devices, server logs and other systems, which Ewen said is actually a stream; processing it as such gives the user the abstraction of a stream. He believes Flink eliminates parts of the traditional pipeline, such as periodically ingesting and grooming data into individual finite datasets and then processing those batches.
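To make the contrast concrete, here is a minimal, framework-agnostic sketch in plain Python (not Flink’s actual API; the source, operator and data are hypothetical): instead of periodically grooming events into finite batches and reprocessing them, a stateful stream operator folds each event into running state the moment it arrives.

```python
from collections import defaultdict

def sensor_events():
    """Hypothetical unbounded source: (sensor_id, reading) pairs."""
    yield from [("s1", 3.0), ("s2", 5.0), ("s1", 7.0), ("s2", 1.0)]

def running_average(events):
    """Stateful stream operator: emits a per-sensor running average
    after every event, with no periodic batch-ingestion step."""
    state = defaultdict(lambda: (0, 0.0))  # sensor_id -> (count, total)
    for sensor, reading in events:
        count, total = state[sensor]
        count, total = count + 1, total + reading
        state[sensor] = (count, total)
        yield sensor, total / count

for sensor, avg in running_average(sensor_events()):
    print(sensor, avg)
```

In a batch pipeline the same result would require collecting events into files, then scheduling a job over each finite dataset; the streaming formulation keeps one piece of state current at all times.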

“You can now get a paradigm that unifies the processing of real-time data. This by itself is an interesting development that many have recognized. That’s why they are excited about stream processing, because it helps reduce a lot of that complexity,” Ewen said.

Processing and analyzing data in real time to make important business decisions or to get better outcomes is what the enterprise is trying to achieve in its digital transformation. Flink reduces the latency of accessing information, and Ewen said the conference provided compelling examples of how Flink naturally pairs with online applications and analytical tools to cut much of the complexity.

As enterprise users converge open-source technology with resource management systems, Ewen noted that the technology needs to be easy to bring into production and should offer more functionality bundled out of the box. As the open-source community partners with companies such as data Artisans, they can make the technology enterprise-ready and broaden adoption, he added.

The elements simplifying Flink for users, then, are a stream-processing paradigm that is easier to reason about, fewer moving parts, lower latency and continued development across the surrounding technologies.

“Flink has released a lot of stream processing APIs. In the last release, we’re adding even more low-level APIs that [allow you] to just think about the basic ingredients as events, state, time and snapshots. There is more control and more flexibility by taking directly the basic building blocks rather than more high level abstractions. I think you can expect more evolution on that layer,” said Ewen.
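The four ingredients Ewen names can be illustrated with a toy, dependency-free Python analogue. This is a conceptual sketch only, not Flink’s API: the class, methods and data below are invented for illustration, standing in for a low-level operator that sees events, keeps keyed state, tracks event time and can checkpoint itself.

```python
class ToyProcessFunction:
    """Toy analogue of a low-level stream operator built from four
    ingredients: events, state, time and snapshots."""

    def __init__(self):
        self.state = {}      # keyed state: key -> count of events seen
        self.watermark = 0   # simple stand-in for event-time progress

    def process(self, key, timestamp):
        """Handle one event: update keyed state and advance time."""
        self.state[key] = self.state.get(key, 0) + 1
        self.watermark = max(self.watermark, timestamp)
        return key, self.state[key]

    def snapshot(self):
        """Checkpoint: a consistent copy of state plus the watermark,
        which a framework could persist for failure recovery."""
        return dict(self.state), self.watermark

fn = ToyProcessFunction()
fn.process("a", 10)
fn.process("a", 12)
fn.process("b", 11)
print(fn.snapshot())  # ({'a': 2, 'b': 1}, 12)
```

Working directly with these building blocks, rather than a high-level abstraction that hides them, is the added control and flexibility Ewen describes.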

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Flink Forward 2017. (*Disclosure: TheCUBE is a paid media partner at Flink Forward. The conference sponsor, data Artisans, does not have editorial oversight of content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE