UPDATED 14:59 EDT / JANUARY 24 2020

CLOUD

Google’s Dataproc service gets GPUs and management automation features

Dataproc is an analytics service from Google LLC that allows enterprises to spin up managed Spark and Hadoop big-data environments in the cloud. Today, the search giant updated the service with four features that promise to provide a boost for machine learning projects as well as simplify day-to-day maintenance.

Companies using Dataproc for machine learning can now add graphics processing units to their Hadoop and Spark clusters.

GPUs run artificial intelligence models many times faster than a standard central processing unit, which should translate into a performance boost for users. Google provides eight Nvidia Corp. data center GPUs to choose from in its public cloud, including the chipmaker’s top-end Tesla V100 model.

Also new to Dataproc is autoscaling. The service can now automatically dial the size of a cluster up or down depending on how many hardware resources a workload requires at a given moment.

The autoscaling mechanism comes in handy in several situations, according to Google. It makes it easier to deal with abrupt usage spikes, such as an increase in the volume of data that an analytics application sends to a Spark deployment. Meanwhile, an engineer looking to scale up an algorithm they’ve successfully deployed on a small test cluster can do so without having to manually provision the extra infrastructure they need.

“The cluster will simply grow to the size needed to process the full dataset and then scale itself back down when the processing is completed,” explained Chris Crosbie, a director of product management with Google’s cloud analytics group. “You don’t need to waste time trying to move over to a larger server environment or figure out how to migrate your work.”
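The grow-then-shrink behavior Crosbie describes can be illustrated with a toy sizing rule. The sketch below is not Dataproc's actual algorithm; the function name, the memory-based demand signal and the worker limits are all hypothetical, chosen only to show how a cluster size might track demand within fixed bounds.

```python
import math


def target_workers(pending_memory_gb, memory_per_worker_gb, min_workers, max_workers):
    """Return a worker count that covers the pending demand, clamped to the limits.

    Hypothetical sizing rule for illustration only, not Dataproc's real policy.
    """
    if memory_per_worker_gb <= 0:
        raise ValueError("memory_per_worker_gb must be positive")
    # Round up so that any leftover demand still gets a worker.
    needed = math.ceil(pending_memory_gb / memory_per_worker_gb)
    # Never drop below the floor or exceed the ceiling.
    return max(min_workers, min(max_workers, needed))


# Simulated demand over time: a spike, followed by the job finishing.
demand_gb = [8, 40, 200, 90, 0]
sizes = [target_workers(d, memory_per_worker_gb=16, min_workers=2, max_workers=10)
         for d in demand_gb]
print(sizes)  # [2, 3, 10, 6, 2]
```

In this sketch the cluster climbs to its ceiling of 10 workers during the spike and falls back to the floor of two once the pending work is gone.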

Google used the occasion to add a couple of other features meant to help companies operate their Dataproc clusters more efficiently. The first addition, a new configuration option, makes it possible to set a limit on how long a cluster can sit idle and have Dataproc automatically delete it if the threshold is reached. The other new feature lets companies automate certain tasks in SparkR, an extension for Spark that provides the ability to run R programs on the framework.

