UPDATED 12:00 EDT / JUNE 23 2020

BIG DATA

Google Cloud scales up Kubernetes Engine to 15,000 nodes for Bayer Crop Science

A new case study published today by Google Cloud shows how the tech giant’s cloud unit, in particular its Google Kubernetes Engine, can be rapidly scaled up to meet intense data processing needs.

The case involves Bayer Crop Science data scientists who require huge amounts of computing power to process genotype data and decide which products — in this case seeds — to make available to customers. The processing had previously been done on-premises before BCS decided to make the switch to Google Cloud.

Working with the BCS data scientists, Google was able to push GKE to 15,000 nodes — three times the number of nodes supported by open-source Kubernetes — to provide improved data processing capability. The results for BCS, at least, were remarkable: the company can now process its entire data set in four days, a job that previously took two weeks.
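For context, a rough back-of-the-envelope comparison using only the figures cited in this article (the 5,000-node open-source limit is referenced later in the case study) looks like this in Python:

```python
# Back-of-the-envelope comparison using figures quoted in the article.
OPEN_SOURCE_NODE_LIMIT = 5_000   # nodes supported by open-source Kubernetes
GKE_CLUSTER_NODES = 15_000       # nodes reached in the BCS cluster

OLD_RUNTIME_DAYS = 14            # "two weeks" on the previous setup
NEW_RUNTIME_DAYS = 4             # runtime on the 15,000-node GKE cluster

print(f"Node-count ratio: {GKE_CLUSTER_NODES / OPEN_SOURCE_NODE_LIMIT:.0f}x")    # 3x
print(f"Wall-clock speedup: {OLD_RUNTIME_DAYS / NEW_RUNTIME_DAYS:.1f}x faster")  # 3.5x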

In the case study, Maciek Różacki, a product manager at Google Cloud, explained that scalability is a core requirement for its products. With more enterprises adopting GKE, the company has been working to push the size of a GKE cluster well beyond the officially supported limits.

The extra scale is said to be advantageous to companies running large, internet-scale services. The benefits include simplified infrastructure management and, particularly for batch processing, shorter data processing times, as well as the ability to absorb large spikes in resource demand.

“This 15,000-node achievement is all the more significant when you consider that the scalability of an IT system is much more than just how many nodes it supports,” Różacki said. “A scalable system needs to be able to use a significant amount of resources and still serve its purpose.”

To make it possible for GKE users to run workloads that need more than 5,000 nodes in one cluster, Różacki explained, Google enrolled a group of design partners in a closed early access program.

“With 15,000 nodes at its disposal, BCS also saves a lot of time,” Różacki added. “With 240,000 CPUs across 15,000 nodes, BCS can process ~15,000,000,000 genotypes an hour. That gives BCS the flexibility to make model revisions and quickly reprocess the entire data backlog or quickly add inference based on new data sets, so their data scientists can continue to work rather than waiting for batch jobs to finish.”
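Taking Różacki’s figures at face value, the per-node and per-CPU throughput works out roughly as follows (a sketch based only on the quoted numbers; actual throughput will vary with the workload):

```python
# Rough throughput breakdown of the figures quoted above.
TOTAL_CPUS = 240_000
TOTAL_NODES = 15_000
GENOTYPES_PER_HOUR = 15_000_000_000  # "~15,000,000,000 genotypes an hour"

cpus_per_node = TOTAL_CPUS // TOTAL_NODES         # 16 CPUs per node
per_node_rate = GENOTYPES_PER_HOUR / TOTAL_NODES  # ~1,000,000 genotypes/node/hour
per_cpu_rate = GENOTYPES_PER_HOUR / TOTAL_CPUS    # ~62,500 genotypes/CPU/hour

print(f"{cpus_per_node} CPUs per node")
print(f"~{per_node_rate:,.0f} genotypes per node per hour")
print(f"~{per_cpu_rate:,.0f} genotypes per CPU per hour")
```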

Image: Google
