Nvidia Corp.'s latest networking innovations address the demands of a new kind of network, one built to support artificial intelligence factories.
Ethernet is no longer a generic plumbing choice but an enabler of high-performance AI. With today’s unveiling of Multipath Reliable Connection, or MRC, on Spectrum-X Ethernet, Nvidia is pushing Ethernet even deeper into AI-native territory — and doing so in partnership with OpenAI Group PBC and Microsoft Corp.
On the surface, MRC is a new remote direct memory access, or RDMA, transport protocol, now open-sourced via the Open Compute Project. In reality, it's a production-proven way to keep tens or hundreds of thousands of graphics processing units fed and synchronized, using a single RDMA connection to stripe traffic across multiple paths and dynamically steer around congestion and failures. OpenAI has already used MRC on Spectrum-X to train recent frontier large language models powering ChatGPT and Codex, and Microsoft is deploying it in some of its largest AI factories built on GB200 systems. The important point is that MRC isn't a lab experiment but a set of algorithms that has already earned its place in some of the most demanding AI environments on the planet.
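To make the striping idea concrete, here's a minimal Python sketch of the concept rather than of the MRC wire protocol itself: a single logical connection spreading message chunks across several paths, with traffic weighted away from whichever paths look congested. The path names, congestion scores and weighting rule are illustrative assumptions, not anything from the spec.

```python
import random

# Illustrative sketch of multipath striping: one logical connection,
# many physical paths, traffic biased away from congestion. All names
# and metrics here are assumptions for illustration, not MRC itself.

class MultipathConnection:
    def __init__(self, paths):
        # One congestion score per physical path; lower = healthier.
        self.congestion = {p: 0.0 for p in paths}

    def pick_path(self):
        # Weight inversely by congestion so a hot path gets less new
        # traffic instead of being cut off entirely.
        weights = {p: 1.0 / (1.0 + c) for p, c in self.congestion.items()}
        r = random.uniform(0, sum(weights.values()))
        for path, weight in weights.items():
            r -= weight
            if r <= 0:
                return path
        return path  # floating-point edge case: fall back to last path

    def send(self, message, chunk_size=4096):
        # Stripe one message across paths; the receiver reassembles,
        # so no single slow path stalls the whole transfer.
        for offset in range(0, len(message), chunk_size):
            path = self.pick_path()
            print(f"chunk@{offset} -> {path}")

conn = MultipathConnection(["path-0", "path-1", "path-2", "path-3"])
conn.congestion["path-2"] = 5.0  # simulate one congested path
conn.send(b"x" * 16384)
```

The real protocol makes these decisions in SuperNIC hardware at line rate; the sketch only shows the shape of the decision.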
There are three intertwined elements to the announcement: the MRC transport itself, its release through the Open Compute Project as an open specification, and its production deployment at OpenAI and Microsoft.
That openness is important to scaling MRC. Nvidia has been adamant that everything in Spectrum-X is built on standard protocols, with no proprietary wire formats and no lock-in at the packet level. The “secret sauce” is in how Nvidia partitions control logic among NICs, switches and host software, not in a closed protocol. MRC follows that pattern: Anyone can implement the spec, but Nvidia believes its execution on Spectrum-X hardware, with deep telemetry and fabric control, will be hard to match.
When a frontier model is being trained across tens or hundreds of thousands of GPUs, the network is effectively part of the compute pipeline. If a link flaps for a few milliseconds or a path gets congested, a multimillion-dollar training run stalls, and stalls at that scale translate directly into wasted money.
MRC addresses that problem in several ways.
During a call, Nvidia Senior Vice President Gilad Shainer described MRC as extending the routing “brain” all the way to the host. The network interface card and the host-side management stack (in OpenAI’s case, its own software) can actively participate in routing decisions, thereby overriding or influencing what the switches do. That’s a major shift from classical Ethernet designs, where a hosted tenant has little or no control over the fabric.
In more traditional cloud models, a hosted customer has visibility and control at the virtual machine or server level, but the network fabric remains opaque. OpenAI wanted to change that, acting as a “smart tenant” with the ability to govern routing policy, congestion responses and failure behavior from the server edge. MRC is the mechanism that reconciles that desire with the realities of a shared, hyperscale fabric.
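As a rough illustration of that "smart tenant" role, the sketch below imagines a host-side agent that consumes per-path telemetry and tells the NIC which paths to keep striping over. The telemetry fields, thresholds and interface are assumptions made for the example; the real control surface is defined by the MRC stack and the tenant's own management software.

```python
from dataclasses import dataclass

# Hypothetical per-path telemetry a host agent might consume. The
# field names and thresholds below are illustrative assumptions.

@dataclass
class PathTelemetry:
    path_id: str
    rtt_us: float     # recent round-trip time, microseconds
    ecn_marks: int    # congestion marks seen in the last interval
    link_up: bool

def steer(telemetry, rtt_budget_us=50.0, mark_limit=100):
    """Return the paths the NIC should keep using for this tenant."""
    usable = [
        t.path_id for t in telemetry
        if t.link_up and t.ecn_marks <= mark_limit and t.rtt_us <= rtt_budget_us
    ]
    # Never steer down to zero paths; fall back to anything still up.
    return usable or [t.path_id for t in telemetry if t.link_up]

snapshot = [
    PathTelemetry("path-0", rtt_us=12.0, ecn_marks=3, link_up=True),
    PathTelemetry("path-1", rtt_us=80.0, ecn_marks=250, link_up=True),  # congested
    PathTelemetry("path-2", rtt_us=11.0, ecn_marks=0, link_up=False),   # failed
]
print(steer(snapshot))  # -> ['path-0']
```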
Another key piece is Spectrum-X multiplane support. Large AI factories are increasingly built as multiplane networks, in which each plane is a separate, independent network that provides a full path between GPUs. Think of it as multiple disjoint fabrics running in parallel, each serving as an alternative route for the same east-west traffic.
Spectrum-X is built for this design. Hardware-accelerated load balancing across planes keeps latency predictable while scaling to hundreds of thousands of GPUs, and failures or maintenance events can be absorbed by shifting traffic between planes without disrupting training jobs.
MRC sits on top of this, using multiplane awareness to exploit those parallel fabrics more intelligently. The result is a kind of AI-native Ethernet fabric where redundancy, performance and control are baked into the transport, not bolted on via box-by-box tinkering.
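A toy model makes the plane-level arithmetic plain: when a plane is drained for a failure or maintenance event, the surviving planes simply absorb its share of the same east-west traffic. The plane count and even-split rule here are illustrative assumptions.

```python
# Toy model of multiplane failover: traffic is respread evenly over
# whichever disjoint planes remain in service. Illustrative only.

def rebalance(planes, drained=frozenset()):
    """Spread 100% of traffic over the planes still in service."""
    active = [p for p in planes if p not in drained]
    if not active:
        raise RuntimeError("no network planes available")
    share = 100.0 / len(active)
    return {p: (share if p in active else 0.0) for p in planes}

planes = ["plane-0", "plane-1", "plane-2", "plane-3"]
print(rebalance(planes))                       # 25% per plane
print(rebalance(planes, drained={"plane-1"}))  # ~33.3% per surviving plane
```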
Nvidia is careful to present MRC as “another protocol” on Spectrum-X, not a replacement for everything else. Today, Spectrum-X supports at least two main Ethernet transports for AI: Spectrum-X with adaptive RDMA is a general-purpose AI Ethernet transport, with adaptive routing in the switches and NIC-level optimization, while Spectrum-X with MRC is an RDMA transport emphasizing multipath, host-driven routing and governance.
There is also the Ultra Ethernet Consortium, which is a multivendor effort to define a new Ethernet RDMA-based fabric. I asked Shainer about the long-term implications of these Ethernet variants and he gave a very pragmatic answer. He does not see the world collapsing onto a single “winner” like UEC. Instead, he expects more variety: Different hyperscalers and AI providers will tune their transport protocols to their own workloads and operational models.
In that context, MRC is a great example of a “custom Ethernet for AI” that’s already running in production, while UEC is another evolving effort. Technically, MRC builds on RoCEv2 as defined by the InfiniBand Trade Association, then extends it with multipath, host-governed routing and the multiplane integration.
Some concepts that surfaced in UEC discussions — such as enhanced congestion control — also show up in MRC, but wired into Nvidia’s hardware and host stack. From a user point of view, the important bit is that Spectrum-X gives you a choice: You can run adaptive RDMA, you can run MRC, and there are other undisclosed variants Spectrum-X can support that are specific to other large customers.
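In practice, that choice plays out as a per-workload decision. The selection rule below is a hypothetical sketch, not Spectrum-X configuration syntax; the workload labels and thresholds are assumptions chosen to mirror the trade-off described above.

```python
# Hypothetical transport selection for an AI fabric that offers both
# adaptive RDMA and MRC. Labels and thresholds are assumptions.

TRANSPORTS = {
    "adaptive_rdma": "switch-side adaptive routing, NIC-level optimization",
    "mrc": "multipath striping with host-governed routing",
}

def choose_transport(workload):
    # Huge synchronized training jobs get the most from MRC's
    # multipath striping; smaller or mixed workloads can stay on
    # general-purpose adaptive RDMA.
    if workload["gpus"] >= 10_000 and workload["pattern"] == "all-reduce":
        return "mrc"
    return "adaptive_rdma"

job = {"name": "frontier-pretrain", "gpus": 100_000, "pattern": "all-reduce"}
choice = choose_transport(job)
print(choice, "->", TRANSPORTS[choice])
```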
One of the more interesting subtexts in my conversation with Shainer is the distinction between “hosted users” and “infrastructure owners.” If you own the AI factory, you can program switches, NICs and hosts end-to-end; you can roll your own routing algorithms and congestion-control tweaks anywhere in the stack. If you’re a hosted customer — OpenAI on top of Microsoft, for example — you typically only control the host. The network underneath is someone else’s problem.
MRC exists largely to bridge that gap. By embedding new logic in the SuperNIC and exposing it to host-side management, a tenant can make meaningful routing decisions that the fabric will honor, without direct switch access. That allows OpenAI, or others with similar models, to optimize for their specific training jobs — changing routing strategies, reacting to congestion patterns, or tuning behavior per workload — without owning the whole data center.
That’s an important pattern to watch as AI ecosystems get more layered and multiparty. We’ll see more cases where a model provider wants near-owner-level control over routing and telemetry, even when they’re running on someone else’s iron. MRC is an early pattern for how that could be done safely over Ethernet.
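What might that near-owner-level control look like from the host? The sketch below imagines a per-job routing policy that NIC-resident logic could enforce without any switch access. Every field name and value is hypothetical, mirroring the knobs described above: routing strategy, congestion response and failure behavior.

```python
from dataclasses import dataclass

# Hypothetical tenant-declared policy, enforced at the NIC rather
# than the switches. All fields and values are assumptions.

@dataclass
class TenantRoutingPolicy:
    job_id: str
    routing: str              # e.g. "spray-all-planes" or "sticky-per-flow"
    congestion_response: str  # e.g. "reroute" or "pace"
    failover_ms: int          # how quickly to abandon a failing path

def apply_policy(policy):
    # In a real deployment this would program the SuperNIC through
    # the host management stack; here it only reports the intent.
    print(f"job {policy.job_id}: routing={policy.routing}, "
          f"congestion={policy.congestion_response}, "
          f"failover after {policy.failover_ms} ms")

apply_policy(TenantRoutingPolicy(
    job_id="train-0042",
    routing="spray-all-planes",
    congestion_response="reroute",
    failover_ms=5,
))
```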
From an industry perspective, MRC and Spectrum-X underscore three trends.
First, AI is forcing Ethernet to specialize. Ten years ago, you could plausibly talk about “one Ethernet” dominating the data center. Today, we have a spectrum: shallow-buffer vs. deep-buffer switches, DCB vs. ECN-driven fabrics, a variety of RDMA variants, and now AI-specific transports such as MRC. Shainer’s line that “there is Ethernet, and there is Ethernet, and there is another Ethernet” isn’t just a joke — it’s the reality of the role the network plays in AI.
Second, open specifications with proprietary implementations are becoming the norm. By pushing MRC into OCP, alongside contributions from AMD, Broadcom and Intel, Nvidia gains ecosystem credibility while still betting that its Spectrum-X implementation will perform best. It’s the same playbook Nvidia has used in InfiniBand: standards on the wire, differentiation in silicon, and software.
Third, UEC is now one of several options, not the ordained future. With MRC in production on GB200-based clusters at Microsoft and in OpenAI environments, Nvidia can point to a working, large-scale, open Ethernet transport that doesn’t depend on the UEC kitchen to finish its meal. That doesn’t kill UEC, but it does make the future feel more pluralistic — one where hyperscalers, silicon vendors and model providers define and adopt the flavors that best match their economics and risk tolerance.
For enterprise buyers and service providers, the practical takeaway is this: When evaluating “AI networking,” don’t stop at port speeds and buffer sizes. Ask which transport protocols the fabric supports, how they’re implemented in NICs and switches, what telemetry and host-side control you get, and how quickly the system can respond to failure and congestion. In other words, treat the network as part of the AI architecture, not just a line item.
Nvidia’s MRC announcement, backed by OpenAI and Microsoft, is a strong reminder that in gigascale AI, Ethernet must mature and function as an AI-native fabric. With Spectrum-X, Nvidia is betting that the winning networks won’t just be fast — they’ll be intelligent, programmable and tailored to the unique demands of AI factories.
Zeus Kerravala is a principal analyst at ZK Research, a division of Kerravala Consulting. He wrote this article for SiliconANGLE.