UPDATED 09:08 EDT / DECEMBER 19 2014

Google opens cloud-based Hadoop alternative to developers with free SDK

Google on Thursday released a software development kit (SDK) for its cloud-based data crunching engine in an effort to encourage the development of analytic applications against the service. The launch comes seven months after the search giant first unveiled its ambitious plans to steal Hadoop’s thunder.

Currently available in limited access, Cloud Dataflow is an evolution of existing analytic technologies in the open-source ecosystem that aims to eliminate most of the hassle required to process large quantities of unstructured information coming from multiple sources. The platform accomplishes that with a unified programming interface that makes it possible to handle static batches of historical data and real-time streams under the same coding layer.
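To make the idea of one coding layer for batch and streaming data concrete, here is a minimal sketch in plain Java. It does not use the Cloud Dataflow SDK; the class `UnifiedTransformSketch` and its methods are hypothetical names chosen for illustration, and standard `java.util.stream` types stand in for the service's datasets. The point is only that a transformation written once can be applied to a bounded batch and to an unbounded source alike.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnifiedTransformSketch {
    // One transformation, written once: split lines into lowercase words.
    // Hypothetical stand-in for a pipeline step; not part of any Google SDK.
    static final Function<Stream<String>, Stream<String>> EXTRACT_WORDS =
        lines -> lines
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .map(String::toLowerCase);

    // "Batch" case: a bounded set of historical records.
    static List<String> runOnBatch(List<String> historical) {
        return EXTRACT_WORDS.apply(historical.stream())
            .collect(Collectors.toList());
    }

    // "Streaming" case: an unbounded source, cut off after the first n words.
    static List<String> runOnStream(Stream<String> live, int n) {
        return EXTRACT_WORDS.apply(live)
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Bounded input.
        System.out.println(runOnBatch(Arrays.asList("Hello Cloud", "hello Dataflow")));

        // Unbounded input, simulated with an endless generator.
        Stream<String> ticker = Stream.generate(() -> "Tick Tock");
        System.out.println(runOnStream(ticker, 3));
    }
}
```

Both calls reuse `EXTRACT_WORDS` unchanged; only the source differs. A real service would also have to handle windowing, ordering and fault tolerance for live streams, which this sketch ignores.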

Cloud Dataflow abstracts the nuances of different information types into consistent “PCollections” that can pull updates from a specified source or perform any number of other tasks. Developers can manipulate these adaptive datasets using a built-in library of operations covering many of the functions available for Hadoop and then some.

That syntax is executed in a way that makes it possible to efficiently reuse code for multiple workloads, which saves time and effort while enabling the underlying runtime to collapse repeating actions for faster execution. Cloud Dataflow also incorporates performance analysis metrics, system monitoring functionality and other operational capabilities of Google’s infrastructure-as-a-service to automate many of the management details.

The search giant’s vision for the platform contrasts sharply with Hadoop, which is a collection of independently maintained and often overlapping open-source projects. Deploying Hadoop has become more feasible in recent years thanks to the emergence of on-demand hosting options, but processing real-time and historical data on the same cluster can still require stringing together multiple tools, each with its own architecture and syntax. As a result, many IT organizations have struggled to realize the full potential of Hadoop.

Cloud Dataflow aims to make that functionality available for everyday developers with a simple interface and a utility pricing scheme. The new SDK, which is available under an open-source license, makes it possible to harness the service for next-generation analytic applications and import data from existing Hadoop environments.

The platform only works with Java for now, but Google plans to add support for Python and other programming languages. The search giant also allows developers to extend the native syntax of Cloud Dataflow themselves with custom operators for automating complex transformations.
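The notion of a custom operator that packages a multi-step transformation can be sketched in plain Java as function composition. This is an illustration under stated assumptions, not the Cloud Dataflow API: `CompositeOperatorSketch`, `TOKENIZE`, `COUNT` and `WORD_COUNT` are hypothetical names, and ordinary lists and maps stand in for the service's datasets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CompositeOperatorSketch {
    // Primitive step 1 (hypothetical): split each line into non-empty words.
    static final Function<List<String>, List<String>> TOKENIZE =
        lines -> lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.toList());

    // Primitive step 2 (hypothetical): count occurrences of each element.
    static final Function<List<String>, Map<String, Long>> COUNT =
        words -> words.stream()
            .collect(Collectors.groupingBy(
                word -> word, TreeMap::new, Collectors.counting()));

    // Custom operator: the two steps bundled into one reusable unit.
    static final Function<List<String>, Map<String, Long>> WORD_COUNT =
        TOKENIZE.andThen(COUNT);

    public static void main(String[] args) {
        Map<String, Long> counts =
            WORD_COUNT.apply(Arrays.asList("to be or", "not to be"));
        System.out.println(counts);
    }
}
```

Callers apply `WORD_COUNT` as a single step without knowing how it is assembled, which is the convenience a custom operator is meant to provide.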

