UPDATED 14:59 EDT / JANUARY 24 2020

Google’s Dataproc service gets GPUs and management automation features

Dataproc is an analytics service from Google LLC that allows enterprises to spin up managed Spark and Hadoop big-data environments in the cloud. Today, the search giant updated the service with four features that promise to provide a boost for machine learning projects as well as simplify day-to-day maintenance.

Companies using Dataproc for machine learning can now add graphics processing units to their Hadoop and Spark clusters.

GPUs can run artificial intelligence models many times faster than a standard central processing unit, which should translate into a performance boost for users. Google offers eight Nvidia Corp. data center GPUs to choose from in its public cloud, including the chipmaker’s top-end Tesla V100 model.

Also new to Dataproc is autoscaling. The service can now automatically dial the size of a cluster up or down depending on how many hardware resources a workload requires at a given moment.

The autoscaling mechanism comes in handy in several situations, according to Google. It makes it easier to deal with abrupt usage spikes, such as an increase in the volume of data that an analytics application sends to a Spark deployment. Meanwhile, an engineer looking to scale up an algorithm they’ve successfully deployed on a small test cluster can do so without having to manually provision the extra infrastructure they need.

“The cluster will simply grow to the size needed to process the full dataset and then scale itself back down when the processing is completed,” explained Chris Crosbie, a director of product management with Google’s cloud analytics group. “You don’t need to waste time trying to move over to a larger server environment or figure out how to migrate your work.”
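The grow-then-shrink behavior described here can be illustrated with a minimal Python sketch. It is not Dataproc's actual algorithm, which the article does not detail. The function name `target_workers`, the memory-based demand signal and the per-worker capacity of 16 GB are all assumptions made for illustration.

```python
import math


def target_workers(pending_memory_gb, memory_per_worker_gb, min_workers, max_workers):
    """Hypothetical sizing rule: enough workers to cover pending demand,
    clamped to the cluster's configured minimum and maximum."""
    needed = math.ceil(pending_memory_gb / memory_per_worker_gb)
    return max(min_workers, min(max_workers, needed))


# Simulated demand over time: a usage spike, then the job finishes.
for demand_gb in (40, 400, 4000, 0):
    size = target_workers(demand_gb, memory_per_worker_gb=16,
                          min_workers=2, max_workers=50)
    print(f"pending={demand_gb} GB -> workers={size}")
```

In this toy model the cluster grows with demand until it hits the configured ceiling, then falls back to the minimum size once the pending work drains.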

Google used the occasion to add a couple of other features meant to help companies operate their Dataproc clusters more efficiently. The first addition, a new configuration option, makes it possible to set a limit on how long a cluster can sit idle and have Dataproc automatically delete it if the threshold is reached. The other new feature lets companies automate certain tasks in SparkR, an extension for Spark that provides the ability to run R programs on the framework.
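The idle-limit option amounts to a simple threshold check, sketched below in Python. This is an illustration of the idea only, not Dataproc's implementation. The function `should_delete` and its parameters are hypothetical names, and the 30-minute limit is an arbitrary example value.

```python
from datetime import datetime, timedelta


def should_delete(last_activity, now, max_idle):
    """Hypothetical check: True once the cluster has been idle
    for at least the configured limit."""
    return now - last_activity >= max_idle


max_idle = timedelta(minutes=30)          # example threshold, chosen arbitrarily
last_job_finished = datetime(2020, 1, 24, 12, 0)

# Ten minutes after the last job the cluster is kept; after 45 it is removed.
print(should_delete(last_job_finished, datetime(2020, 1, 24, 12, 10), max_idle))
print(should_delete(last_job_finished, datetime(2020, 1, 24, 12, 45), max_idle))
```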

Image: Google
