UPDATED 09:00 EDT / MAY 28 2025

INFRA

Atlas Cloud optimizes AI inference service to boost GPU throughput

Cloud infrastructure startup Atlas Cloud today launched a highly optimized artificial intelligence inference service that it says dramatically reduces the computational requirements of even the most demanding AI workloads.

The new service, called Atlas Inference, is designed to provide companies with a more cost-effective and simpler environment in which they can deploy and run their large language models.

Atlas Cloud is the creator of a cloud-based infrastructure platform built specifically for AI workloads. It provides low-cost, on-demand access to clusters of up to 5,000 graphics processing units for both AI training and inference. Customers can choose from a selection of GPU types, and the platform is serverless, so they don’t have to configure their clusters or carry out maintenance work.

The new Atlas Inference service is based on the open-source SGLang inference engine. The company says it maximizes GPU efficiency by processing more tokens with fewer computational resources, and claims it can deliver 2.1 times the throughput of equivalent AI inference services from the likes of Amazon Web Services Inc. and Nvidia Corp.
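Because SGLang is open source, the general serving setup is easy to reproduce. Below is a minimal sketch using SGLang’s offline engine API; the model path, tensor-parallel degree and sampling parameters are illustrative assumptions, not Atlas Cloud’s configuration.

```python
# Minimal sketch of serving a model on the open-source SGLang engine,
# the project Atlas Inference is built on. The model path, tensor-parallel
# degree and sampling parameters are illustrative, not Atlas Cloud settings.
import sglang as sgl

if __name__ == "__main__":
    # Shard the model across 2 GPUs with tensor parallelism (assumption).
    engine = sgl.Engine(model_path="meta-llama/Llama-3.1-8B-Instruct", tp_size=2)

    prompts = ["Explain prefill/decode disaggregation in one sentence."]
    outputs = engine.generate(prompts, {"temperature": 0.7, "max_new_tokens": 128})
    for out in outputs:
        print(out["text"])

    engine.shutdown()
```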

When running heavyweight, tensor-parallel AI systems, Atlas Inference can reportedly deliver equal or superior throughput while using 50% fewer GPUs. It features real-time load balancing that evenly distributes tokens across nodes and reduces latency spikes on overloaded ones, which the company says keeps performance stable even under heavy load. In its tests, the service maintained sub-five-second first-token latency and 100-millisecond inter-token latency across more than 10,000 concurrent sessions.
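Those two figures correspond to the standard time-to-first-token (TTFT) and inter-token latency (ITL) metrics. The sketch below shows how a client might verify such numbers against a streaming, OpenAI-compatible endpoint of the kind SGLang exposes; the base URL and model name are placeholders, not Atlas Cloud endpoints.

```python
# Sketch: measuring time-to-first-token (TTFT) and inter-token latency (ITL)
# against a streaming OpenAI-compatible endpoint, such as the one SGLang
# serves. The base URL and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

start = time.perf_counter()
arrivals = []
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Summarize tensor parallelism."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        arrivals.append(time.perf_counter())  # timestamp each token chunk

ttft = arrivals[0] - start                            # first-token latency
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
itl = sum(gaps) / len(gaps) if gaps else 0.0          # mean inter-token gap
print(f"TTFT: {ttft:.2f}s, mean ITL: {itl * 1000:.0f}ms")
```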

The company adds that a 12-node Atlas Inference cluster outperformed DeepSeek Ltd.’s reference implementation of its DeepSeek V3 model while using only two-thirds of the computational capacity, cutting operational expenses by 80% in the process.

Atlas Cloud says this was made possible by four separate innovations. They include a “prefill/decode disaggregation” technique that separates compute-intensive operations from memory-bound processes to boost efficiency. There’s also “DeepExpert Parallelism,” which uses load balancing to increase GPU utilization across the entire cluster. Other innovations include Atlas Cloud’s proprietary two-batch overlap technology, which boosts throughput by enabling larger token batches, and the use of “DisposableTensor memory models,” which help prevent system crashes.
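Atlas Cloud hasn’t published implementation details, but the idea behind prefill/decode disaggregation is straightforward: route the compute-bound prompt-processing phase and the memory-bound token-generation phase to separate worker pools so neither starves the other. The following is a deliberately simplified, hypothetical sketch of that hand-off:

```python
# Deliberately simplified, hypothetical sketch of prefill/decode
# disaggregation: requests pass through a compute-bound prefill pool, then
# their KV-cache handle moves to a memory-bound decode pool. All names are
# illustrative; Atlas Cloud has not published its implementation.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class Request:
    prompt: str
    kv_cache: object = None                  # handle produced by prefill
    generated: list = field(default_factory=list)

prefill_queue: Queue = Queue()               # served by compute-optimized workers
decode_queue: Queue = Queue()                # served by memory-bandwidth-bound workers

def prefill_step(run_prefill):
    req = prefill_queue.get()
    req.kv_cache = run_prefill(req.prompt)   # one large, compute-heavy pass
    decode_queue.put(req)                    # hand off for token generation

def decode_step(next_token):
    req = decode_queue.get()
    tok = next_token(req.kv_cache)           # one small, memory-bound step
    req.generated.append(tok)
    if tok != "<eos>":
        decode_queue.put(req)                # re-queue until generation ends
```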

Another advantage of Atlas Inference is its linear scaling across nodes: the platform expands and contracts GPU clusters automatically in real time, which helps optimize infrastructure costs.
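The company hasn’t detailed its scaling logic, but a throughput-driven autoscaler of the kind it describes might look like the sketch below. The utilization thresholds are assumptions for illustration; the capacity constant borrows the per-node throughput figures the company cites later in this article.

```python
# Hypothetical sketch of throughput-driven scaling: grow or shrink the node
# pool so per-node token throughput stays in a target band. The thresholds
# are assumptions; the capacity constant uses the per-node figures the
# company cites (54,500 input + 22,500 output tokens/sec).
TOKENS_PER_NODE = 54_500 + 22_500            # assumed sustainable tokens/sec/node

def desired_nodes(current_nodes: int, cluster_tokens_per_sec: float) -> int:
    utilization = cluster_tokens_per_sec / (current_nodes * TOKENS_PER_NODE)
    if utilization > 0.85:                             # near saturation: add a node
        return current_nodes + 1
    if utilization < 0.40 and current_nodes > 1:       # underused: shed a node
        return current_nodes - 1
    return current_nodes

# A 12-node cluster pushing 700,000 tokens/sec stays at 12 nodes:
print(desired_nodes(12, 700_000))
```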

Atlas Cloud Chief Executive Jerry Tang said the company wants to change the economics of AI deployment to make it more profitable for enterprises. He explained that many companies can barely break even at the moment, while others run their AI applications and services at a loss, because of sky-high computational costs.

“Our platform’s ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable,” Tang said. “I believe this will have a significant ripple effect throughout the industry. We’re surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources.”
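Taken at face value, those per-node numbers make the unit economics easy to sanity-check. In the back-of-the-envelope sketch below, the hourly node cost and per-million-token prices are illustrative assumptions, not Atlas Cloud pricing:

```python
# Back-of-the-envelope check of the per-node figures Tang cites. The node
# cost and per-million-token prices are illustrative assumptions, not
# Atlas Cloud pricing.
INPUT_TPS, OUTPUT_TPS = 54_500, 22_500    # tokens/sec per node, per Atlas Cloud
NODE_COST_PER_HOUR = 20.0                 # assumed fully loaded cost, USD
PRICE_IN, PRICE_OUT = 0.50, 1.50          # assumed USD per million tokens

secs = 3600
revenue = (INPUT_TPS * secs / 1e6) * PRICE_IN + (OUTPUT_TPS * secs / 1e6) * PRICE_OUT
print(f"revenue per node-hour: ${revenue:.2f} vs. cost ${NODE_COST_PER_HOUR:.2f}")
# -> revenue per node-hour: $219.60 vs. cost $20.00 (at full, sustained load)
```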

The startup says Atlas Inference is compatible with any type of GPU hardware and supports any kind of AI model. It’s available starting today via the company’s cloud-based servers, and can also be run on customers’ on-premises servers.

Image: SiliconANGLE/Dreamina
