UPDATED 09:34 EST / JUNE 05 2024


Is Nvidia becoming the de facto AI mainframe?

Never mind that it took 30 years for Nvidia Corp. to become an overnight success: its incredible recent rise looks like it has the makings of a bubble.

About 15 years ago, the company made a riverboat gamble to pivot from gaming toward high-performance computing and artificial intelligence, and the numbers have only gotten crazier since. Revenue has more than doubled year-on-year for three straight quarters. Having pushed longtime leader Intel Corp. into the rearview mirror, Nvidia recently declared a 10-for-1 stock split just to keep share prices affordable.

But let’s ask a more nuanced question. Granted, at some point the laws of gravity will hit Nvidia. But the operative question is how defensible Nvidia’s position is over the long haul. The answer may hinge not just on how effectively Nvidia maintains its technology edge, but also on the stickiness of its platform. For AI developers, how easy will it be to move on or off Nvidia? Are the barriers sufficiently high that we are witnessing the birth of a de facto AI mainframe?

As we noted earlier this year, Nvidia dominates the supply chain for advanced AI processors. Amazon Web Services Inc. and Google Cloud have introduced their own specialized silicon, while Advanced Micro Devices Inc. and Intel are anxiously gearing up. If this were just generic silicon, yes, there would be lead times for ramping up manufacturing capacity, but eventually the laws of supply and demand should level the playing field, just as Intel and AMD chips became interchangeable not only for “Wintel” computers but also for Macs.

But here’s the spoiler. We’re not just talking about chips and fabs. Over the past 15-plus years, Nvidia has been busy building a top-to-bottom technology stack that presents the opportunity for classic lock-in. Atop its processor portfolio is a broadening software stack that is transforming Nvidia into an end-to-end software and hardware computing platform. Spanning from chip to supercomputer to libraries, tools and microservices, Nvidia AI Enterprise is built with a unified, fit-for-purpose architecture that has uncanny parallels with the mainframe.

Nvidia was founded back in the 1990s to design chips for gamers, but the enterprise story begins in 2006 with the introduction of CUDA, which has since grown into a portfolio of 150-plus libraries, software development kits, and profiling and optimization tools from Nvidia and partners. Nvidia had to reengineer its entire chip portfolio to run CUDA.

Short for Compute Unified Device Architecture, CUDA is a parallel computing platform that enables developers to define parallel functions that run on Nvidia GPUs. CUDA lets developers work with application programming interfaces rather than write low-level drivers, much as they were accustomed to doing in the CPU world.

The CUDA portfolio is especially rich, supporting functions for deep learning, linear algebra, signal processing (for audio and video), and generalized parallel and data processing computations, along with specialized capabilities for computational lithography and communications, to name a few. It abstracts away the complexities of GPU programming: developers focus on allocating memory and setting up data transfers rather than worrying about how their work gets parallelized across the processor.
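To make that concrete, here is a minimal sketch of the CUDA programming model from Python, using Numba’s CUDA bindings (one of several routes onto CUDA; CUDA C/C++ is the native interface). The vector-add kernel and array sizes are purely illustrative:

```python
import numpy as np
from numba import cuda  # Numba's CUDA bindings; assumes an Nvidia GPU and CUDA toolkit


@cuda.jit
def vector_add(a, b, out):
    # Each GPU thread handles one element; the grid index replaces an explicit loop
    i = cuda.grid(1)
    if i < out.size:
        out[i] = a[i] + b[i]


n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# The developer's job: allocate device memory and set up data transfers...
d_a = cuda.to_device(a)            # host-to-device copy
d_b = cuda.to_device(b)
d_out = cuda.device_array_like(a)  # allocation on the GPU

# ...then launch the parallel function; CUDA schedules it across thousands of threads
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)

result = d_out.copy_to_host()      # device-to-host copy
print(result[:5])
```

The developer never touches a driver: memory allocation, transfers and a kernel launch are the whole surface area, which is exactly the abstraction CUDA was built to provide.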

Nvidia viewed the GPU as the alternative to the classic high-performance computing architectures used for scientific workloads. Where traditional HPC ganged together hundreds or thousands of CPUs, GPUs concentrated compute on the chip and were optimized for compute-intensive rather than the input/output-intensive jobs for which CPUs were designed.

But at the time, HPC was a niche, and Wall Street worried that CUDA would distract Nvidia from its core market. And back in the early 2000s, Moore’s Law still had some life left in it; it wasn’t until the 2010s that it became apparent Moore’s Law had hit the wall and that GPUs or multicore designs would be the only ways forward.

For CUDA, the watershed was the 2012 AlexNet project, an experiment in using convolutional neural networks, or CNNs, for image recognition. The CNN, trained on Nvidia GTX 580 3GB GPUs using CUDA, outclassed the field in that year’s ImageNet competition with the lowest error rate. The AlexNet research paper was downloaded more than 100,000 times, putting CUDA on the map. Though alternatives to CUDA emerged, in the early 2010s those multipurpose libraries were no match for CUDA’s bespoke design.

AlexNet vindicated Nvidia’s move up the food chain with CUDA to technical computing. But in the early 2010s, deep learning and neural nets were still the stuff of bleeding edge. Nvidia transcended its gamer niche for a more extensible one, but at the time, it was still a niche.

Attention is all Nvidia needed

After AlexNet’s win, Nvidia elevated AI front and center in its message. But here’s the rub. As noted, image or voice recognition, or any form of neural net, was still early-stage as a market. Instead, the 2010s were the decade of Big Data, and when it came to AI, machine learning was far ahead of deep learning in the queue.

Data processing and ML are both IOPS-intensive workloads that work just fine if you can scale out across enough commodity hardware; GPUs were overkill for them. Not surprisingly, during the Big Data decade, the AWSs, Azures and Googles – but not Nvidia – cleaned up.

But then came Google’s seminal 2017 “Attention Is All You Need” research paper, which eventually gave Nvidia all the attention it needed. The transformer pattern introduced in that paper made generative AI feasible because it provided a shortcut around traditional neural net processing. A year earlier, Nvidia had custom-built its first DGX supercomputer, which it provided to OpenAI. Roughly five years later came ChatGPT, and that’s why we’re having this conversation.

During this period, CUDA’s breadth and maturity drove native support from deep learning frameworks such as TensorFlow, PyTorch, Caffe, Theano and MXNet. The portfolio of functions grew with content from Nvidia and from partners. With these libraries built up over nearly two decades, there’s now a whole generation of AI developers coding to them. AMD and Intel have grand leapfrog plans to get their new state-of-the-art fabs online. However, unless they are catering to developers new to AI, the AMDs and Intels of the world had better get darn good at writing emulators.

CUDA is just the start

While CUDA is the hook for developers, it is actually the tip of the iceberg of an expanding proprietary Nvidia ecosystem of tooling and libraries. Admittedly, you can develop programs to run on Nvidia H100s without any of these layers, but the company is adding amenities to keep coders productive and evolve the whole stack into a supercomputing platform. It starts with the popular RAPIDS open-source software libraries and APIs for data science, complemented by the NeMo developer platform tools for curating training data at the trillion-token scale; a choice of preconfigured distributed-computation patterns; a library of pretrained models that can be customized with built-in prompt engineering and fine-tuning tools; and integration with Nvidia’s own Triton inference server.
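For a flavor of what the RAPIDS layer looks like to a developer, here is a minimal sketch using cuDF, the RAPIDS GPU dataframe library; the toy column names and values are made up for illustration, and the API deliberately mirrors pandas:

```python
import cudf  # RAPIDS GPU dataframe library; assumes an Nvidia GPU with RAPIDS installed

# Toy data for illustration; in practice you would load CSV or Parquet straight into GPU memory
df = cudf.DataFrame({
    "model": ["A", "B", "A", "C", "B"],
    "latency_ms": [12.0, 7.5, 11.2, 20.1, 8.3],
})

# The groupby/aggregate executes as CUDA kernels on the GPU rather than on the CPU
summary = df.groupby("model")["latency_ms"].mean()
print(summary)
```

The point is familiarity: a data scientist writes what looks like pandas, and RAPIDS dispatches the work to CUDA underneath.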

Atop that is a new layer of microservices that is Nvidia’s play for the application tier. Nvidia Inference Microservices, or NIMs, are a set of optimized, cloud-native microservices for embedding models built on the Nvidia stack into applications. They encompass standard APIs for language, speech, drug discovery and other patterns, with prebuilt containers and Helm charts packaged with optimized models. With these microservices, models executing on Nvidia systems can be embedded into enterprise applications.
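As a sketch of what consuming a NIM looks like, the snippet below calls a locally deployed large language model NIM. LLM NIMs expose an OpenAI-compatible REST API; the host, port and model name here are assumptions for illustration only:

```python
import requests  # plain HTTP client; no Nvidia-specific SDK required

# Hypothetical endpoint of a NIM container running locally
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # whichever model the NIM container serves
    "messages": [
        {"role": "user", "content": "Summarize this support ticket in one sentence."}
    ],
    "max_tokens": 128,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the API surface mirrors OpenAI’s, existing application code can often be pointed at a NIM endpoint with little more than a URL change, which is precisely how these microservices get embedded into enterprise applications.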

Already, third parties are packaging their own NIMs for deployment, such as Adobe Inc. with PDF extraction services, SAP SE for extending its Joule copilot, and ServiceNow Inc. for smart assistants. Nvidia customers can mix and match these services individually or license them as a package with Nvidia AI Enterprise, which adds the ability to manage all of these development and deployment services as a whole. It’s available in editions deployed on servers and/or at the edge.

And that’s just what’s on top. Getting back to chips, Nvidia’s designs are getting more intricate than ever. Its GB200 Grace Blackwell “Superchip” connects two Nvidia B200 Tensor Core GPUs to the Nvidia Grace CPU over a 900-gigabyte-per-second ultra-low-power NVLink chip-to-chip interconnect.

And this is all packaged into an integrated supercomputer: Nvidia DGX (pictured). It provides a self-contained data center that can be deployed on-premises as a hybrid cloud or as a service on the AWS, Azure, Google or Oracle clouds. According to published reports, Nvidia has nearly tripled its future commitments with each of the hyperscalers for its DGX cloud service since January. DGX includes Base Command as the management tier, providing all the necessary job scheduling, orchestration, network and cluster management functions. And like a Russian doll, DGX bundles Nvidia AI Enterprise – the full software stack.

So what’s the deal with the mainframe?

IBM Corp. dominated enterprise computing during the heyday of the mainframe. That era was all about the hardware, with software treated as an afterthought: Enterprise MIS departments (they weren’t called IT yet) either wrote their own programs, hired consultants to do it, or used programs bundled with the box. Commercially packaged software did not yet exist. Software was written for a specific hardware platform, and though emulators existed, they were no substitute for the real thing.

Admittedly, IBM wasn’t the only mainframe player in town, but the Honeywells, Sperry-Univacs, Control Datas and GEs eventually got relegated to the dustbin of history, and with them went all the programs written for them. A similar pattern emerged with the rise of the midrange in the 1970s and ’80s, as software written for DEC wouldn’t run on Prime or Data General machines, and today the bulk of that code is extinct.

The world today is not a carbon copy of the past. Proprietary islands and multiplatform realities exist side by side. The obvious examples are Apple versus the rest of the world – Android in mobile and Windows on laptops. But the abstractions are high enough, and having only two primary ecosystems makes life quite manageable for mobile and laptop developers, who need to target just two platforms.

On the other hand, the server has become the domain of open systems with Linux becoming the de facto standard and software portability across hardware taken for granted. Likewise, thanks to robust W3C standards, web apps should run on any browser.

But still, neural networks and gen AI are different matters. Given the ravenous appetite for compute, any loss in performance when translating to another hardware platform is just not acceptable; the risks are too high.

As noted, it’s not simply that Nvidia has carved out an overwhelming lead in infrastructure (even though, ironically, it doesn’t actually manufacture anything). Nvidia also offers a fully optimized stack, rendering it difficult if not impossible to get comparable performance elsewhere without serious code rewriting or refactoring. Admittedly, developers could take architectural approaches to abstract the algorithm from the platform, but that requires added labor and forethought up front. And given the stringent performance requirements for generative or neural net AI models, bespoke optimization for the processor could be a major gating factor for model success.

That said, nature abhors a vacuum. This Medium post provides a well-documented look at Nvidia’s competitive landscape as of 2023. It concludes that Nvidia currently has a monopoly, but there are potential threats lurking.

For instance, there have been a number of attempts over the years to generalize the interface to GPUs, dating back to the OpenCL project created by Apple in 2008; surprise, surprise, it can actually run on CUDA. More recently, the Linux Foundation jumped in with the Unified Acceleration Foundation, or UXL, project, a standard framework that should allow developers to write code that runs cross-platform. With UXL, a model developed to run on AMD’s ROCm should also be able to target Intel’s oneAPI without code changes, and so on.

And there have been moves from the likes of Google and Microsoft to enable support for various rival proprietary and open frameworks on infrastructure from second sources such as AMD. By the way, as enterprises look to more compact, domain-specific small language models, that could open doors for alternative platforms that won’t have to be the most powerful on the planet.

The laws of supply and demand could catch up with Nvidia, once supply itself catches up, of course. But for now, don’t hold your breath. Intel won’t be live with its new state-of-the-art fabs until 2027, and it will likely take a few years for AMD and Apple to similarly scale up. Meanwhile, there’s growing support for CUDA alternatives, with popular frameworks like PyTorch and TensorFlow adding support for the likes of ROCm.

Hyperscale cloud providers, of course, offer their own silicon specialized for training and inference workloads. And while hyperscalers want to remain in Nvidia’s good graces, their chief priority remains selling compute cycles. Hyperscalers will book capacity for whatever their customers demand.

Nonetheless, in the near term, Nvidia’s biggest worry is keeping Wall Street in line, as its multiples are likely to flatten next year owing to a hangover in customer capital spending. For now, Nvidia’s worry isn’t about developers defecting to less costly platforms. Nvidia AI Enterprise remains far more complete than anything else out there, not to mention that most large AI programs are written to the CUDA libraries. Though it is an expensive, intricate platform, Nvidia today has the same built-in defensibility as the classic mainframe.

Tony Baer is principal at dbInsight LLC, which provides an independent view on the database and analytics technology ecosystem. Baer is an industry expert in extending data management practices, governance and advanced analytics to address the desire of enterprises to generate meaningful value from data-driven transformation. He wrote this article for SiliconANGLE.

Photo: Nvidia
