INFRA
Observability is becoming the control system for the complex infrastructure powering modern digital services and AI workloads.
As enterprises move from AI experimentation to large-scale deployment, the operational challenge of keeping those systems reliable is intensifying. Companies such as Virtana are focusing on observability platforms that monitor entire environments rather than isolated components, particularly as organizations work with infrastructure partners such as Dell Technologies Inc. to build out AI factories, according to Paul Appleby, president and chief executive officer of Virtana. The company provides observability software that helps enterprises monitor, optimize and manage complex hybrid cloud and AI infrastructure.
“Every one of those organizations has digitized their services,” Appleby said. “We can tune them every day. We do transactions on our phones, whether we’re buying things, booking flights or moving money around in a bank account. That incredibly complex infrastructure that delivers up those services is fundamental to ensuring the continuity and resilience of those businesses. This is an incredibly expensive technology; you need a new class of observability. That’s really what Virtana does.”
Appleby spoke with theCUBE’s Gemma Allen for theCUBE + NYSE Wired: AI Factories – Data Centers of the Future interview series, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They explored how observability is evolving to help enterprises operate and optimize complex AI factory infrastructure.
As enterprises scale AI infrastructure, monitoring individual components is no longer enough. The architecture behind AI factories involves layers of compute, storage, networking and data pipelines that interact continuously. Observability platforms are increasingly designed to understand how these elements behave together as a system, Appleby noted.
“These AI factories are being scaled out all over the world,” he said. “The challenge, of course, with that is they come with their own set of complexities and their own sets of challenges. We think AI factories are just a bunch of GPUs. We hear GPUs have entered the parlance and we now talk about GPUs almost on a daily basis, but the reality is an AI factory is a hugely complex system that the GPUs are just a part of.”
The difficulty for enterprises is not simply detecting outages but identifying the source of failures quickly enough to avoid disruption. As AI systems become embedded in financial services, telecommunications, healthcare and other critical sectors, resilience becomes a business requirement rather than a technical preference, Appleby emphasized.
“If we’re going to be relying on AI as part of our core business, whether it’s financial services or telco or healthcare or whatever, having 25% of jobs fail is just not acceptable,” he added. “That’s really, again, where this new class of observability has to come into play.”
Operational efficiency is also becoming a major factor as companies deploy large GPU clusters to support AI workloads. Observability tools are now expected to help organizations monitor utilization levels across the entire infrastructure stack, ensuring that costly compute resources are used effectively, according to Appleby.
“Not only monitoring the availability and performance of the service, but actually the operational efficiency of the AI factory not only helps you maximize the value of your GPU investments, but it also helps you lower electricity usage, water usage, heating, HVAC costs and all the rest of it,” he said. “There’s also an environmental impact there as well, which is really important.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of theCUBE + NYSE Wired: AI Factories – Data Centers of the Future interview series: