UPDATED 04:21 EDT / JUNE 26 2014

Google launches Cloud Dataflow pipeline for batch and stream processing

There was plenty of excitement at Google I/O yesterday, and not just because of the brief interruption by a protester calling on Google to “develop a conscience”. While much of the spotlight fell on Android, Google announced a number of new services on its cloud front, including something called Cloud Dataflow that makes it easier to create data-processing pipelines combining both stream and batch-processing capabilities.

Dataflow is based on several earlier Google projects, including its FlumeJava data-pipeline tool and MillWheel stream-processing technology. It’s been designed to enable analysis of live data, allowing users to view trends and receive alerts about events in real time. The service is primarily aimed at developers who need to process streaming data as it arrives.

It’s possible to run your own Hadoop cluster atop Google Compute Engine, of course, but Google Cloud Platform marketing head Brian Goldfarb says Dataflow has been built to overcome the latency and complexity limitations that are inherent in MapReduce.

“[MapReduce] was good for simple jobs, but when you needed to run pipelines it wasn’t so easy,” he said. “Internally, we don’t use it anymore because we don’t think it’s the right solution for the overwhelming number of situations.”

Chiefly, Dataflow has been designed as an easy-to-use tool that’s capable of handling both complex workflows and very large datasets. Streaming and batch jobs both employ the same code, while Dataflow manages the infrastructure and optimizes the data pipeline. The service is compatible with multiple programming languages, though the first SDK is designed for Java.
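To illustrate the idea of one piece of code serving both batch and streaming jobs, here is a minimal Python sketch. It does not use the Dataflow SDK; the function names (`running_totals`, `batch_source`, `stream_source`) and the sample events are hypothetical and exist only for this illustration. The same transform is applied once to a bounded list and once to a generator that stands in for a live feed.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, int]


def running_totals(events: Iterable[Event]) -> Iterator[Event]:
    """Yield the running total for an event's key after each event.

    The function only iterates over its input, so it does not care
    whether the input is a finite list or an open-ended generator.
    """
    totals = Counter()
    for key, value in events:
        totals[key] += value
        yield key, totals[key]


def batch_source() -> list:
    # Hypothetical bounded dataset, fully available up front.
    return [("goal", 1), ("foul", 1), ("goal", 1)]


def stream_source() -> Iterator[Event]:
    # Hypothetical stand-in for a live feed: events arrive one at a time.
    for event in [("goal", 1), ("foul", 1), ("goal", 1)]:
        yield event


# The same transform handles both kinds of input.
batch_result = list(running_totals(batch_source()))
stream_result = list(running_totals(stream_source()))
```

In this sketch the two results are identical because the transform is written against a plain iterable. A managed service would add what the sketch leaves out, such as distribution, scaling and pipeline optimization.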

According to Google, the main focus is to help users get “actionable insights from your data while lowering operational costs without the hassles of deploying, maintaining or scaling infrastructure.”

Real-time anomaly detection was cited as one primary use for Dataflow. A live demo involved analyzing streamed World Cup data that was compared with historical data in an attempt to spot anomalies. Users can either investigate events themselves using Google BigQuery, or set Dataflow up so it automatically takes actions when it detects something.
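The comparison of live data against a historical baseline can be sketched in a few lines of Python. This is not how the demo was implemented, which the article does not describe; it assumes a simple z-score rule, and the function name `find_anomalies`, the threshold of 3.0 and the sample numbers are all made up for illustration.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, Sequence, Tuple


def find_anomalies(
    historical: Sequence[float],
    live: Iterable[float],
    threshold: float = 3.0,
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for live values far from the historical mean.

    A value counts as anomalous when it lies more than `threshold`
    sample standard deviations away from the historical mean.
    """
    baseline = mean(historical)
    spread = stdev(historical)
    for index, value in enumerate(live):
        # Skip the check if the history has no variation at all.
        if spread > 0 and abs(value - baseline) / spread > threshold:
            yield index, value


# Made-up numbers: a steady history and one obvious spike in the live feed.
historical = [100, 102, 98, 101, 99, 100, 103, 97]
live = [101, 99, 250, 100]

anomalies = list(find_anomalies(historical, live))
```

Because `find_anomalies` is a generator, it could consume an unbounded feed and emit alerts as values arrive. The action taken on each alert, whether a manual query or an automated response, is left to the caller.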

The service is important for Google’s cloud efforts because Amazon has had its own data pipeline service for some time already. AWS also has its Kinesis service, which specializes in real-time data processing. Dataflow is, in essence, Google’s combined answer to both.

Image credit: Google
