UPDATED 16:48 EDT / AUGUST 10 2021


Cloudera and Nvidia partnership looks to streamline data science and accelerated workloads

The recent partnership between Cloudera Inc. and Nvidia Corp., announced in April, is worth watching as it represents an important step in creating architectures of the future for streamlining data science and machine learning pipelines.

This DataOps trend, outlined by research firm Gartner as a key development for 2021, is being driven by growing enterprise IT frustration over the challenges of building data workflows today. Training and iterating on models is time-consuming, large-scale CPU infrastructure is expensive for big data operations, and refactoring and hand-offs add cycle time.

In the past, use of analytics in big data operations often involved multiple organizational teams, which required customizing GPU integrations for the various use cases. Cloudera and Nvidia are jointly pursuing a solution that relies on using integrated GPUs across workflows to power data preparation and analytics.

The integration of Cloudera Data Platform with Nvidia’s libraries for Apache Spark 3.0 lets customers build GPU-accelerated architectures without custom GPU integration work. It’s a solution that aligns well with the scale of data operations today and customer interest in applying a GPU-driven model.

“We run on more than 400,000 servers and have over five exabytes of data under management,” said Sushil Thomas, vice president of machine learning at Cloudera, during an online press briefing last week. “Customers are screaming for GPUs.”

Accelerating enterprise workloads

The collaboration between Cloudera and Nvidia employs RAPIDS, an open-source accelerator designed to execute analytics pipelines on GPUs across hybrid platforms. RAPIDS, which is licensed under Apache 2.0, runs data analytics and the extract, transform and load process on GPUs within Cloudera Data Platform to speed up enterprise data science.
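For readers curious what GPU acceleration for Spark involves outside a managed platform, the following is a minimal sketch of the settings commonly used to turn on the RAPIDS Accelerator for Apache Spark. It is not a description of how Cloudera Data Platform configures it. The jar path, the helper names and the GPU sharing ratio are assumptions for illustration only.

```python
# Sketch only: builds spark-submit arguments that enable the RAPIDS
# Accelerator plugin. Nothing here is taken from Cloudera's integration.

# Hypothetical jar location; the real path depends on the installation.
RAPIDS_JAR = "/opt/sparkRapidsPlugin/rapids-4-spark.jar"


def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Return Spark properties that load the RAPIDS SQL plugin.

    gpus_per_executor and tasks_per_gpu are illustrative defaults,
    not recommended values.
    """
    return {
        # Plugin class provided by the RAPIDS Accelerator jar.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operations to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.0 resource scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU each task claims, so several tasks share one.
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf, jar=RAPIDS_JAR):
    """Flatten the properties into a spark-submit argument list."""
    args = ["--jars", jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_spark_conf())))
```

The point of the sketch is that existing Spark SQL and DataFrame jobs are meant to run unchanged once the plugin is loaded, which is the "no customization" idea the partnership is built around.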

“We focused a lot on deep learning, where the power of GPU has really shown through,” said Manuvir Das, head of enterprise computing at Nvidia, during theCUBE’s broadcast of “Transform Innovative Ideas Into Data-Driven Insights” with Cloudera on August 5. “But as we’ve gone forward, we found that GPUs can accelerate a variety of different workloads, from machine learning to inference. The AI or machine learning compute needs to meet the customer where the data is.”

Another factor driving the push for GPUs, and the partnership between the two companies, is the ability to deliver a cloud experience for data processing in a hybrid model. The “no-touch, self-service” cloud experience is what many enterprises are seeking when it comes to data management and analytics.

The packaging of Nvidia’s solution into Cloudera’s platform provides a key integration step to ease the processing of analytics workflows.

“We are seeing quite a bit of demand to simplify the whole experience,” said Scott McClellan, senior director of the Data Science Product Group at Nvidia, during the press briefing. “We want to bring a lot of that same experience to enterprises in a hybrid model, which is a key focus of the Cloudera Data Platform.”

Harnessing the power of hybrid

The partnership represents another key step for both firms in a big bet on the hybrid cloud. Nvidia has been notably active in the hybrid computing space over the past year with a string of high-profile announcements.

In August of last year, Nvidia teamed up with Google LLC to integrate its GPU Operator development environment with the Anthos hybrid and multicloud platform. A month later, Nvidia expanded its partnership with VMware Inc. by integrating its NGC software hub to support GPU-accelerated AI apps in hybrid solutions.

Nvidia unveiled a new set of AI edge and hybrid cloud services for AI workloads in June, and it announced the first Nvidia-powered hybrid cloud offering available through its AI LaunchPad partner program this month.

“When it comes to using your data, you want to use it in a variety of ways with a powerful platform, which of course you have built over time,” Das said. “Believe in the power of hybrid that data exists and compute needs to follow the data.”

For Cloudera, a hybrid strategy is grounded in the belief that not all applications are created equal and enterprises will need to understand the lineage of machine learning algorithms. This will require a dependence on tools to fully track data across multiple operating environments.

“The key is developing a hybrid data strategy,” said Mick Hollison, president of Cloudera, during an interview with theCUBE this week. “Hybrid is going to play a more important role in how work is conducted.”

Compliance and fraud solution for IRS

The partnership between the two companies has already yielded a significant customer use case inside one government agency, which certainly has no shortage of data. The U.S. Internal Revenue Service was confronted with a need for data analysis from its vast troves of information to deal with thorny taxpayer compliance and fraud issues.

The agency turned to Cloudera and Nvidia for help using data-driven insights to power mission-critical use cases.

“Our datasets are just getting bigger and bigger, and it demands that we actually do something to get more value added,” said Joe Ansaldi, technical branch chief, Research Applied Analytics and Statistics Division of the IRS, during an interview as part of theCUBE’s August 5 event. “Our biggest challenge is the infrastructure to support all of the ideas that subject matter experts are coming up with in terms of all the algorithms they would like to create.”

The IRS decided to test the Cloudera/Nvidia solution using a fraud detection algorithm on a four-terabyte dataset, according to Ansaldi. The agency was looking for speed.

“Our expectation was we were definitely going to see some speed-up in computation processing times,” Ansaldi said. “If I recall correctly, we had a 22 to 48 times speed-up after we started tweaking the original algorithm. Now it’s like the shackles are off and we can just run to our heart’s desire, wherever our imagination takes our subject matter experts to actually develop solutions.”
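To put the quoted range in perspective, here is a short worked calculation. The 24-hour baseline is a hypothetical figure chosen for illustration; the IRS did not disclose its actual run times. Only the 22x and 48x factors come from Ansaldi’s remarks.

```python
def accelerated_hours(baseline_hours: float, speedup: float) -> float:
    """Run time after applying a speed-up factor to a baseline run time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


# Hypothetical baseline: a job that takes a full day on CPUs.
BASELINE_HOURS = 24.0

if __name__ == "__main__":
    # 22x and 48x are the bounds quoted by the IRS.
    for speedup in (22, 48):
        minutes = accelerated_hours(BASELINE_HOURS, speedup) * 60
        print(f"{speedup}x speed-up: about {minutes:.0f} minutes")
```

Under that assumed baseline, a day-long job would finish in roughly 65 minutes at 22x and 30 minutes at 48x, which is the difference between one experiment per day and many.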

Documented performance improvement

The IRS example shows a significant acceleration for one use case, but Cloudera and Nvidia executives have been more conservative in their own estimates, citing faster performance at lower cost. Regardless, that was the promise of a shift to GPUs from the start, and the evidence to date has supported the companies’ claims.

“With documented performance improvements of 3x and customer references suggesting as much as 10x speed improvements at 50% of current cost for such workflows, it is hard not to think the Nvidia-Cloudera partnership has significant potential to boost revenues and customer wins for both companies,” said Daniel Newman, principal analyst at Futurum Research and chief executive officer of Broadsuite Media Group, in an interview with theCUBE. “I believe this partnership reflects the importance of leveraging both hardware frameworks and software to enable enterprises and other large organizations to fully realize the potential of machine learning deployed at scale.”

The collaboration between a software company such as Cloudera and a major semiconductor designer such as Nvidia reflects how essential AI and machine learning have become to a data-centric strategy. Businesses want to make better decisions faster from virtually unlimited quantities of information, and that will require an architecture built for speed and the hybrid cloud.

“The reason we’re talking about speed and why speed is everything in a hybrid world and a hyper-competitive climate is that the faster we get insights from all of our data, the faster we grow and the more competitive we are,” said Rob Bearden, chief executive officer at Cloudera, in an interview during theCUBE broadcast. “That’s why the partnership between Cloudera and Nvidia together means so much. We turbo charged the enterprise data cloud to enable our customers to work faster and better, and to make integration of AI approaches a reality for companies of all sizes.”

