

Data center infrastructure optimization startup Clockwork Systems Inc. says it’s expanding its clock synchronization technology with FleetIQ, a new offering designed to maximize the utilization of graphics processing units and boost the performance of artificial intelligence workloads.
The news came as Clockwork closed on $20.575 million in fresh funding and hired a new chief executive, former NetApp Inc. exec Suresh Vasudevan. Today’s round was led by existing investor New Enterprise Associates and saw participation from industry luminaries, including Intel Corp. CEO Lip-Bu Tan, former Cisco Systems Inc. CEO John Chambers and the venture capitalist Carl Ledbetter.
Clockwork’s technology was designed to solve latency headaches in distributed networks by providing accurate clock synchronization to enable servers to run more efficiently. With the launch of FleetIQ, Clockwork says it’s extending its highly accurate clock capabilities into the AI and GPU domain, where it brings the promise of sub-microsecond visibility and accelerated cluster performance to enhance AI training and inference and prevent workload disruption.
During an appearance on theCUBE’s AnalystANGLE segment last October, Clockwork’s co-founder and former CEO Balaji Prabhakar explained that there’s a big need for improved clock synchronization across the clusters of servers that make up modern data centers. Traditionally, distributed networks have relied on the internal timing mechanisms of hardware such as switches and routers, but because those clocks aren’t tightly synchronized, they can create bottlenecks for any workload.
With Clockwork’s edge-based synchronization, companies can measure latency accurately without depending on network hardware. Using this approach, networks can run at higher utilization rates while keeping latency low, which is particularly beneficial in applications such as high-frequency financial trading, where microseconds can mean millions of dollars.
“If you have accurate clocks, you can measure more accurately the time it takes for a packet to go through a network,” Prabhakar told theCUBE. “You can actually measure congestion from the edge, and then you can control that congestion. You can drive them at higher utilizations while maintaining very low latencies.”
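To illustrate Prabhakar’s point, here is a minimal Python sketch (not Clockwork’s implementation) of why clock accuracy matters when latency is measured from the edge. A one-way delay computed as the receiver’s timestamp minus the sender’s timestamp absorbs any offset between the two clocks directly, so the error is indistinguishable from real congestion. The function name and the numbers are hypothetical.

```python
def measured_one_way_delay_us(true_delay_us: float, clock_offset_us: float) -> float:
    """Delay a receiver would compute as (receive timestamp - send timestamp).

    clock_offset_us is how far the receiver's clock runs ahead of the sender's.
    The offset is added directly to the result, so it cannot be told apart
    from genuine queuing delay in the network.
    """
    send_ts = 1_000.0  # sender's clock reading at transmission (arbitrary value)
    receive_ts = send_ts + true_delay_us + clock_offset_us  # receiver's clock reading
    return receive_ts - send_ts


# A 5 us true delay measured with a 20 us clock offset looks like 25 us,
# while a 0.1 us offset leaves the measurement close to the truth.
for offset in (20.0, 0.1):
    print(f"offset {offset} us -> measured {measured_one_way_delay_us(5.0, offset):.1f} us")
```

The tighter the synchronization, the more of the measured delay reflects actual network conditions, which is what makes edge-based congestion control possible.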
It turns out that the same thing applies to AI workloads, where the emergence of more powerful GPUs has caused the bottleneck to shift from compute to communications. These days, most AI training and inference workloads are powered by GPUs from Nvidia Corp. and Advanced Micro Devices Inc.
The most demanding jobs employ enormous clusters, sometimes numbering thousands of GPUs all trying to work in sync to process AI tasks. This has created a real headache for data center operators, who struggle to keep those GPUs in sync. If even a single chip lags, the thousands of other GPUs in the cluster can stall while they wait for it to catch up.
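The straggler effect can be sketched in a few lines of Python. This assumes a simple bulk-synchronous model in which every GPU must finish its share of a training step before any can move on; the model, function names and timings are illustrative assumptions, not Clockwork data.

```python
def step_time_ms(per_gpu_times_ms: list[float]) -> float:
    """In a synchronous step, every GPU waits for the slowest one."""
    return max(per_gpu_times_ms)


def cluster_utilization(per_gpu_times_ms: list[float]) -> float:
    """Fraction of total GPU-time spent computing rather than waiting."""
    step = step_time_ms(per_gpu_times_ms)
    busy = sum(per_gpu_times_ms)
    return busy / (step * len(per_gpu_times_ms))


# Hypothetical: 1,000 GPUs that each need 100 ms, except one straggler needing 200 ms.
times = [100.0] * 999 + [200.0]
print(step_time_ms(times))                    # 200.0
print(round(cluster_utilization(times), 2))   # 0.5
```

In this toy case a single slow chip doubles the step time and leaves the cluster idle roughly half the time, even though 999 of the 1,000 GPUs are healthy.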
Clockwork refers to this problem as the “AI efficiency gap.” It says real-world GPU clusters typically achieve only 30% to 55% of their theoretical performance because of constant link failures and other issues that stem from the lack of reliable synchronization. That inefficiency can cost enterprises billions of dollars over the long term.
For instance, today’s most advanced large language models, such as OpenAI’s GPT-4o, run on clusters of more than 100,000 GPUs, which represent an investment of about $6 billion. So if such a cluster runs at only about half of its theoretical capacity, roughly $3 billion of that investment is effectively wasted.
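The back-of-the-envelope math can be made explicit. The Python sketch below assumes, as a simplification, that idle capacity is valued in proportion to the roughly $6 billion cluster cost cited above, and applies it across the 30% to 55% utilization range Clockwork reports. The function name is hypothetical.

```python
def wasted_investment_usd(cluster_cost_usd: float, utilization: float) -> float:
    """Share of the cluster investment that sits idle at a given utilization.

    Simplifying assumption: value lost scales linearly with unused capacity.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return cluster_cost_usd * (1.0 - utilization)


COST = 6e9  # about $6 billion for a 100,000-GPU cluster, per the figure above
for u in (0.30, 0.50, 0.55):
    print(f"{u:.0%} utilization -> ${wasted_investment_usd(COST, u) / 1e9:.1f}B idle")
```

Under this assumption, the idle share ranges from about $2.7 billion at 55% utilization to about $4.2 billion at 30%.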
FleetIQ is Clockwork’s answer to this challenge. The company describes it as a software-driven GPU fabric that uses its foundational clock synchronization technology to provide “microsecond-level” visibility into clusters and pinpoint bottlenecks immediately. By enabling stateful fault tolerance, which keeps AI workloads running when individual chips in a cluster fall behind, FleetIQ increases throughput and avoids the need for expensive restarts, boosting the cluster’s overall efficiency.
Clockwork said FleetIQ is a hardware-agnostic service that works with GPUs from Nvidia and AMD as well as custom AI accelerators such as Amazon Web Services Inc.’s Trainium chips. It’s also compatible with networking fabrics including Nvidia’s InfiniBand and Ethernet with RoCE, which the company says enables higher cluster utilization regardless of the setup. By improving GPU cluster efficiency, Clockwork says, FleetIQ enables companies to run AI training, inference and user-facing applications concurrently, with dramatic improvements in performance and economics, while simplifying networking operations.
Holger Mueller of Constellation Research Inc. told SiliconANGLE that Clockwork is tackling a key problem in AI operations. “The challenge of syncing the internal clocks of AI servers has not yet been addressed, and if anything, it has only gotten worse with the emergence of more powerful and varied processors and server types,” he said. “Flush with new funding, Clockwork is trying to address this issue and could become a pivotal player if it can make clock syncing challenges a thing of the past.”
There are signs that it can. One organization that can attest to this is the Danish Center for AI Innovation, which operates the country’s most powerful supercomputer, Gefion, and participates in a number of critical AI research initiatives around drug discovery, advanced weather forecasting and more.
“To succeed, we must deliver resilience, reliability and efficiency at an unprecedented scale — performance once reserved for hyperscalers,” said DCAI CEO Nadia Carlsten. “Partnering with Clockwork enables us to operate Gefion seamlessly and reliably, even as workloads and demands increase. The result is a compute-efficient, fault-tolerant infrastructure that researchers and industries can trust, lowering costs and eliminating wasted GPU cycles.”
The launch of FleetIQ coincides with Prabhakar’s decision to step down from his day-to-day leadership role in favor of new CEO Vasudevan.
Prabhakar said he’s handing over the reins to Vasudevan partly because he wants to focus more on the company’s technology strategy, and also because of his role as a professor at Stanford University. In addition, he believes Vasudevan has the skills to scale Clockwork’s business better than he could.
Vasudevan’s expertise was forged at a number of well-known data center infrastructure companies. Having made his name as chief product officer at NetApp, he left to become CEO of Nimble Storage Inc. in 2011, taking the company public and growing its revenue from almost nothing to over $500 million during his tenure. He then joined the runtime security startup Sysdig Inc. as CEO in 2018, helping to establish that company as a leader in container and cloud security before leaving last year.
“He brings an exceptional combination of go-to-market leadership and product building experience required to scale entirely new categories, positioning Clockwork to enter its next phase of hypergrowth,” Prabhakar said.
Vasudevan said he’s joining Clockwork because he believes that “communication is the new Moore’s Law” and that whoever masters it will dominate the AI infrastructure industry. “At Clockwork, we are pioneering an intelligent abstraction layer between workloads and infrastructure that observes, predicts and controls in real time, dynamically aligning application requirements and fabric behavior,” the new CEO said. “It enables organizations to achieve more with the same infrastructure and will make AI more economically viable for the decade ahead.”