UPDATED 02:52 EST / JANUARY 22 2016

NEWS

Google pitches Cloud Dataflow to the Apache Software Foundation

Google is making its first major open-source move of the year by offering up its Dataflow technology to the Apache Software Foundation (ASF) as an incubator project.

Google is hoping to spur more collaboration and shared governance around its technology, which is used to write large-scale data processing jobs. The end goal is to enable the development of data pipelines that can be ported across multiple execution engines, both in the cloud and on-premises. To that end, Google has proposed that its Dataflow programming model and its Dataflow Software Development Kit (SDK) be bundled together as a single Apache Incubator project.

The search giant has gathered the support of several big-name companies behind its bid, including Cloudera Inc., Data Artisans GmbH, PayPal Holdings Inc. and Talend.

Ultimately, Google is hoping that Dataflow will be accepted as a top-level project under the ASF, but to get there it must first go through the mandatory incubation stage, during which issues related to its future direction and licensing will be tackled.

“We believe this proposal is a step towards the ability to define one data pipeline for multiple processing needs, without tradeoffs, which can be run in a number of runtimes, on-premise, in the cloud, or locally,” wrote Google Software Engineer Frances Perry and Product Manager James Malone in a January 20 blog post.

Google’s Cloud Dataflow service, which is based on the technology, will not be affected by the proposal to open-source the programming model, SDK and other components, they added.

Google built Dataflow as a means of helping developers write applications and data pipelines that can run on multiple Big Data engines, including Apache Spark and Apache Flink, as well as its own Cloud Dataflow. The technology consists of a number of SDKs that are used to define data processing jobs in batch mode and in streaming for large data sets.

The company open-sourced the Dataflow SDK back in December 2014, in order to boost development activity around the technology and quell fears that it might help to lock users into Google’s infrastructure. Since that time, Google says the Dataflow SDK has been used to create a variety of “pluggable runners” that connect data pipelines to Spark, Flink and others.
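The idea behind pluggable runners can be illustrated with a toy sketch in Python. It is not the Dataflow SDK and uses none of its APIs: the names `Pipeline`, `EagerRunner` and `LazyRunner` are hypothetical, invented here only to show how a pipeline might be described once and then handed to different execution back ends.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Pipeline:
    """Engine-agnostic job description: an ordered list of per-element steps.
    Hypothetical class for illustration, not part of any real SDK."""
    steps: List[Callable] = field(default_factory=list)

    def apply(self, fn: Callable) -> "Pipeline":
        # Record the step; nothing is executed until a runner is chosen.
        self.steps.append(fn)
        return self


class EagerRunner:
    """Hypothetical batch-style runner: processes the whole data set one step at a time."""

    def run(self, pipeline: Pipeline, source: Iterable) -> list:
        data = list(source)
        for step in pipeline.steps:
            data = [step(item) for item in data]
        return data


class LazyRunner:
    """Hypothetical streaming-style runner: pushes each element through every step in turn."""

    def run(self, pipeline: Pipeline, source: Iterable) -> list:
        results = []
        for item in source:
            for step in pipeline.steps:
                item = step(item)
            results.append(item)
        return results


# The pipeline is defined once, with no reference to an engine.
pipeline = Pipeline().apply(str.strip).apply(str.upper)
lines = [" spark ", " flink "]

# The same definition is then executed by two different runners.
batch_result = EagerRunner().run(pipeline, lines)
stream_result = LazyRunner().run(pipeline, lines)
```

In this sketch the application code touches only `Pipeline`; swapping the runner changes how the work is executed, not what the job computes.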

Perry and Malone pointed to a number of benefits if the ASF accepts Dataflow as an incubator project. The main one, they said, is that developers would be able to focus on their applications and data pipelines instead of worrying about which Big Data engines they are compatible with.

Google previously said Dataflow was a combination of several technologies it has been using internally for years, including FlumeJava, a batch processing engine; MillWheel, a stream processing engine; and MapReduce.

Photo Credit: TMarieShines via Compfight cc

