Google brings Big Data to the masses with Cloud Dataflow beta
Google has finally taken the wraps off its Cloud Dataflow offering, a service designed to let developers without Hadoop expertise build sophisticated analytic “pipelines” capable of processing extremely large datasets.
Cloud Dataflow was first introduced last summer, with Google touting it as a next-generation service for building systems that can ingest, transform, normalize and analyze huge amounts of data, well into the exabyte range. Google had previously been accepting applications for a private alpha of the service, but now anyone can try the data processing system in beta. The service fills much the same role as Hadoop and Spark, but rather than MapReduce it relies on Google’s own FlumeJava and MillWheel technologies to move and process data within the hosted platform.
As Google explained last year, the idea behind Dataflow is a simple one: By hiding the complexity of Hadoop-style data processing behind a set of straightforward APIs and SDKs, and hosting everything in Google’s cloud, it lets just about anyone make use of Big Data analytics, something that until now has been the exclusive domain of data scientists.
“Today, nothing stands between you and the satisfaction of seeing your processing logic, applied in streaming or batch mode (your choice), via a fully managed processing service,” wrote Google product manager William Vambenepe in a blog post. “Just write a program, submit it and Cloud Dataflow will do the rest. No clusters to manage, Cloud Dataflow will start the needed resources, autoscale them (within the bounds you choose) and terminate them as soon as the work is done.”
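The programming model Vambenepe describes boils down to composing a pipeline from simple transforms and handing it to the managed service to run. The sketch below illustrates the idea using the familiar word-count pattern from the Cloud Dataflow Java SDK; the bucket paths, project settings and exact class names are placeholders and may vary between SDK releases, so treat it as a rough outline rather than production code.

```java
// A minimal word-count pipeline sketch for the Cloud Dataflow Java SDK.
// Bucket paths and runner/project settings are placeholders.
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Count;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.KV;

public class WordCountSketch {
  public static void main(String[] args) {
    // Project, staging location and runner come from the command line,
    // e.g. --runner=DataflowPipelineRunner --project=<your-project>.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline p = Pipeline.create(options);

    p.apply(TextIO.Read.from("gs://your-bucket/input/*.txt"))        // read lines from Cloud Storage
     .apply(ParDo.of(new DoFn<String, String>() {                    // split each line into words
       @Override
       public void processElement(ProcessContext c) {
         for (String word : c.element().split("[^a-zA-Z']+")) {
           if (!word.isEmpty()) {
             c.output(word);
           }
         }
       }
     }))
     .apply(Count.<String>perElement())                              // count occurrences of each word
     .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {          // format the results as text
       @Override
       public void processElement(ProcessContext c) {
         c.output(c.element().getKey() + ": " + c.element().getValue());
       }
     }))
     .apply(TextIO.Write.to("gs://your-bucket/output/counts"));      // write results back to Cloud Storage

    p.run();  // the service provisions, autoscales and tears down workers on its own
  }
}
```

The same pipeline code can run in batch or streaming mode; the developer never touches the underlying cluster.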
Under the hood, Dataflow relies on Google’s Compute Engine cloud service for raw computing power, while Google Cloud Storage and BigQuery are used to store and access the data. In other words, it ties together several of the main components of Google’s Cloud Platform, which competes with Amazon Web Services and Microsoft Azure.
Besides the Dataflow news, Google simultaneously announced an update to its BigQuery service, which provides a Structured Query Language (SQL) interface to help developers delve into massive datasets. SQL is one of the most widely used query languages, supported by virtually all traditional relational databases, which means it’s already familiar to the vast majority of database managers.
With the update, Google has enhanced BigQuery so it can now ingest up to 100,000 streamed rows per second per table. In addition, Google is at last making the service available to European customers: BigQuery data can now be stored in Google’s European data centers, allowing companies there to adhere to the EU’s strict data sovereignty regulations. Finally, Google has added row-level permissions to BigQuery, which can be used to limit data access based on user credentials. That means users can protect sensitive fields such as people’s names and addresses while still allowing access to other details, for example a customer’s anonymized purchase history.
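To give a sense of how that streaming ingest path works, here is a rough sketch of a single-row streaming insert using BigQuery’s tabledata.insertAll method via the Google API client library for Java; the project, dataset, table and field names are hypothetical placeholders, and real code would batch rows and handle retries.

```java
// Sketch: stream one row into a BigQuery table with tabledata.insertAll.
// Project, dataset, table and field names below are placeholders.
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.TableDataInsertAllRequest;
import com.google.api.services.bigquery.model.TableDataInsertAllResponse;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class StreamingInsertSketch {
  public static void main(String[] args) throws Exception {
    // Build an authorized BigQuery client using application default credentials.
    HttpTransport transport = GoogleNetHttpTransport.newTrustedTransport();
    JsonFactory jsonFactory = JacksonFactory.getDefaultInstance();
    GoogleCredential credential =
        GoogleCredential.getApplicationDefault().createScoped(BigqueryScopes.all());
    Bigquery bigquery = new Bigquery.Builder(transport, jsonFactory, credential)
        .setApplicationName("streaming-insert-sketch")
        .build();

    // One row of data; field names must match the destination table's schema.
    Map<String, Object> row = new HashMap<>();
    row.put("customer_id", "c-123");
    row.put("purchase_total", 42.50);

    TableDataInsertAllRequest request = new TableDataInsertAllRequest()
        .setRows(Collections.singletonList(
            new TableDataInsertAllRequest.Rows().setJson(row)));

    // Stream the row into the table (placeholder project/dataset/table IDs).
    TableDataInsertAllResponse response = bigquery.tabledata()
        .insertAll("your-project", "your_dataset", "your_table", request)
        .execute();

    if (response.getInsertErrors() != null) {
      System.err.println("Insert errors: " + response.getInsertErrors());
    }
  }
}
```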
Image credit: PublicDomainPictures via Pixabay.com