UPDATED 07:53 EDT / APRIL 17 2015

Google brings Big Data to the masses with Cloud Dataflow beta

Google has finally taken the wraps off its Cloud Dataflow offering, which is designed to allow developers lacking in Hadoop skills to build sophisticated analytic “pipelines” capable of processing extremely large datasets.

Cloud Dataflow was first introduced last summer, with Google touting it as a next-generation service for building systems that can ingest, transform, normalize, and analyze huge amounts of data, well into the exabyte range. Google had previously been accepting applications for a private alpha of the service, but now anyone can try the data processing system in beta mode. The service does not run on Hadoop or Spark; it relies on Google’s FlumeJava and MillWheel technologies to process and move data within the hosted platform, and there’s not a trace of MapReduce to be seen.

As Google explained last year, the idea behind Dataflow is a simple one: By hiding the complexity of Hadoop behind a bunch of straightforward APIs and SDKs, and hosting everything in Google’s cloud, it enables just about anyone to make use of Big Data analytics, something that’s been the private domain of data scientists up until now.

“Today, nothing stands between you and the satisfaction of seeing your processing logic, applied in streaming or batch mode (your choice), via a fully managed processing service,” wrote Google product manager William Vambenepe in a blog post. “Just write a program, submit it and Cloud Dataflow will do the rest. No clusters to manage, Cloud Dataflow will start the needed resources, autoscale them (within the bounds you choose) and terminate them as soon as the work is done.”
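To make the “pipeline” idea concrete, here is a minimal sketch in plain Python. It does not use the Cloud Dataflow SDK or any Google API; the names run_pipeline, parse, normalize and total_per_user are hypothetical and exist only for illustration. It shows the general shape of such a program: a chain of small steps that ingest, normalize and aggregate records, with each step feeding the next.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical stand-in for a pipeline: an ordered list of transforms,
# each taking an iterable of records and returning a new iterable.
Transform = Callable[[Iterable], Iterable]


def run_pipeline(records: Iterable, transforms: List[Transform]) -> list:
    """Apply each transform in order and collect the final records."""
    for transform in transforms:
        records = transform(records)
    return list(records)


def parse(lines):
    # Ingest: split raw "user,amount" lines into two text fields.
    for line in lines:
        user, amount = line.split(",")
        yield user, amount


def normalize(pairs):
    # Normalize: trim and lowercase the user, convert the amount to a number.
    for user, amount in pairs:
        yield user.strip().lower(), float(amount)


def total_per_user(pairs):
    # Analyze: sum the amounts per user, sorted by user name.
    totals = Counter()
    for user, amount in pairs:
        totals[user] += amount
    return sorted(totals.items())


raw = ["Alice, 10.5", "bob,2", " ALICE ,4.5"]
result = run_pipeline(raw, [parse, normalize, total_per_user])
print(result)
```

In a managed service, the author would write only the steps; deciding how many machines run them, and when to shut those machines down, would be the platform’s job.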

Dataflow relies on Google’s Compute Engine cloud service to provide the raw computing power, while Google Cloud Storage and BigQuery are employed to store and access the data. In short, it makes use of several of the main components of Google’s Cloud Platform, which competes with Amazon Web Services and Microsoft Azure.

Besides the Dataflow news, Google simultaneously announced an update to its BigQuery service, which provides a Structured Query Language (SQL) interface to help developers delve into large sets of unstructured data. SQL is one of the most widely used query languages, supported by almost all traditional relational databases, which means it’s well understood by the vast majority of database managers.

With the update, Google has enhanced BigQuery so it can now ingest up to 100,000 rows per second per table. In addition, Google is at last making the service available to European customers. BigQuery data can now be stored in Google’s European data centers, which means companies there will be able to adhere to the EU’s strict data sovereignty regulations. Finally, Google has added new row-level permissions to BigQuery, which can be used to limit data accessibility based on user credentials. This means users can protect sensitive data such as people’s names and addresses while allowing access to other details, for example customers’ anonymized purchasing histories.
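The idea of row-level permissions can be sketched in a few lines of plain Python. This is not BigQuery’s API or syntax; the visible_rows function, the allowed_group field and the group names are all hypothetical. The sketch assumes each row carries a tag naming the group allowed to read it, and that a user’s credentials resolve to a set of groups.

```python
from typing import Dict, List, Set


def visible_rows(rows: List[Dict], user_groups: Set[str]) -> List[Dict]:
    """Return only the rows whose 'allowed_group' the user belongs to."""
    return [row for row in rows if row["allowed_group"] in user_groups]


# Hypothetical table: each row is tagged with the group allowed to read it.
rows = [
    {"record": "name and address", "allowed_group": "support"},
    {"record": "anonymized purchase history", "allowed_group": "analysts"},
]

# A user whose credentials place them only in the "analysts" group.
analyst_view = visible_rows(rows, {"analysts"})
print(analyst_view)
```

Here an analyst sees the anonymized purchase history but not the row holding names and addresses, while a user with no matching group sees nothing.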

Image credit: PublicDomainPictures via Pixabay.com
