How Uber cruised Apache data frameworks and arrived at Flink


It’s hard to overstate how heavily ride-sharing apps rely on fresh data streams to do business. So what makes a streaming data framework good enough for Uber?

Uber has run the gamut from Apache Spark to Apache Samza to Flink, most recently, said Chinmay Soman (pictured), staff software engineer at Uber Technologies Inc.

“We observed that the Spark stream processing is actually more resource-intensive than some of the other technologies we benchmarked,” he said.

This intensive use of compute power and memory spurred Uber to look for a more efficient solution, Soman told George Gilbert (@ggilbert41), host of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during the Flink Forward event in San Francisco, California, last week.  (*Disclosure below.)

At his previous LinkedIn gig, Soman gained experience with Apache Samza, so he welcomed Uber’s transition to this framework. However, “We hit a scale where Samza, we felt, was lacking,” he said. The reason is that Samza frequently ties into Apache Kafka, a tool for building real-time data pipelines and streaming apps.

With multi-stage pipelines where one stage processes data and sends it to another stage, all intermediate stages go back to Kafka, which is expensive and complex, Soman explained.

“So if you want to do a lot of these use cases, you actually end up creating a lot of Kafka topics and the I/O [input/output] overhead on a cluster shoots up exponentially,” he said. “So that’s why we are looking at Flink. In Flink, you can actually build a multi-stage pipeline and have in-memory cues instead of writing back to Kafka, so it is fast and […] you don’t have to create multiple topics per pipeline.”

Fast recovery from backpressure is another perk of Flink, as is local offloading of data from memory to disk aided by a RocksDB cache, an embeddable persistent key-value store for fast storage, Soman said.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Flink Forward 2017. (*Disclosure: TheCUBE is a paid media partner at Flink Forward. The conference sponsor, data Artisans, does not have editorial oversight of content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE