Inception AI Inc., a company building ultra-fast large language models based on the diffusion technique, said today it has raised $50 million in new funding led by Menlo Ventures.
Mayfield, Innovation Endeavors, Nvidia Corp.’s venture capital arm NVentures, Microsoft Corp.’s M12, Snowflake Ventures and Databricks Investment also participated in the round.
Inception builds LLMs with a novel approach that lets them produce text and code at significantly higher speeds than models built on current architectures. Modern LLMs rely on a technique called autoregression, which requires them to generate words one at a time. That's fast enough for short responses, but it slows them down on lengthy replies.
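To make that bottleneck concrete, here's a minimal sketch of autoregressive decoding. The `model.predict_next` call is a hypothetical stand-in for a full network forward pass, not any vendor's actual API:

```python
# Illustrative sketch of autoregressive decoding (hypothetical model API,
# not Inception's code). Every new token requires one full forward pass,
# so total generation time grows linearly with the length of the reply.
def autoregressive_generate(model, prompt_tokens, max_new_tokens, eos_token):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = model.predict_next(tokens)  # one forward pass per token
        tokens.append(next_token)
        if next_token == eos_token:
            break
    return tokens
```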
The company instead applies diffusion, the mechanism behind image and video generation. Diffusion lets media models such as DALL-E, Midjourney and Sora generate content by forming an entire image at once rather than pixel by pixel.
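Applied to text, the same idea looks roughly like the sketch below: the whole response is drafted at once and refined over a small, fixed number of denoising steps. This is a simplified illustration of the general diffusion approach with a hypothetical `model.denoise` method, not a description of Mercury's internals:

```python
# Illustrative sketch of diffusion-style text generation (hypothetical API).
# A full-length draft exists from the first step, and each denoising pass
# refines every position in parallel.
def diffusion_generate(model, prompt_tokens, response_length, num_steps, mask_token):
    draft = [mask_token] * response_length  # full-length draft up front
    for step in range(num_steps):
        draft = model.denoise(prompt_tokens, draft, step)  # refine all positions at once
    return draft
```

Because the step count is a small constant, generation cost no longer scales with the length of the reply, which is the intuition behind the claimed speedup.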
According to Inception, this technique enables text generation that is up to 10 times faster and more efficient than autoregressive models, while still delivering high-quality answers.
“Training and deploying large-scale AI models is becoming faster than ever, but as adoption scales, inefficient inference is becoming the primary barrier and cost driver to deployment,” said Inception co-founder and Chief Executive Stefano Ermon. “We believe diffusion is the path forward for making frontier model performance practical at scale.”
Inception’s model family is called Mercury, and the company added that it’s the only commercially available diffusion large language model, or dLLM. The technology behind it enables it to significantly outpace even speed-optimized models from major AI providers, including OpenAI Group PBC, Anthropic PBC and Google LLC.
To let users visualize how the model generates blocks of text, Inception added a feature to Mercury’s chat interface that shows an animation of the response gradually sharpening from noise into its final form.
The company said the diffusion process also reduces the total graphics processing unit footprint needed to run a model, allowing organizations to run larger models at the same latency and cost, or serve more users on the same infrastructure.
Mercury is offered in two versions: a general-purpose model for ultra-low-latency chat and Mercury Coder, which is optimized for generating computer code. Both have a 128,000-token context window, roughly equivalent to a 300-page novel, and are priced at 25 cents per million input tokens and $1 per million output tokens.
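At those rates, per-request costs are simple to estimate. Here's a back-of-the-envelope calculation, using an illustrative request that fills the full context window:

```python
# Cost at the published rates: $0.25 per million input tokens,
# $1.00 per million output tokens.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Filling the 128,000-token window and generating a 2,000-token reply:
print(f"${request_cost(128_000, 2_000):.4f}")  # prints $0.0340
```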
Both models are available through major cloud providers such as Amazon Web Services Inc. and unified model access services such as OpenRouter Inc., with support for Microsoft’s Azure Foundry coming soon.
Inception said the benefits of dLLMs extend beyond speed and cost efficiency. The company outlined a roadmap of new model features, including advanced reasoning capabilities that use built-in error correction to reduce hallucinations, as well as unified multimodal capabilities. The architecture also supports high-precision control over structured outputs, making it well suited to tool calling and data generation.
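To illustrate why precise structured output matters for tool calling, consider a host application dispatching a model-emitted JSON tool call. The schema and names here are hypothetical, not part of Mercury's API:

```python
# Hypothetical tool-call dispatch. The call only succeeds if the model
# emits JSON that matches the expected structure exactly; malformed JSON
# or missing fields raise an error instead of silently misfiring.
import json

def dispatch_tool_call(model_output: str) -> tuple[str, dict]:
    call = json.loads(model_output)          # rejects malformed JSON
    return call["name"], call["arguments"]   # rejects missing fields

name, args = dispatch_tool_call(
    '{"name": "get_weather", "arguments": {"city": "Oslo", "unit": "celsius"}}'
)
print(name, args)  # get_weather {'city': 'Oslo', 'unit': 'celsius'}
```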
Ultra-fast generation has been a holy grail for LLM providers because it unlocks opportunities for faster and smarter AI agents. Users don’t want to wait for answers, but at the same time they want accuracy and clarity, two things that are often at odds in AI development. Use cases such as voice-enabled agents also require super-fast reaction times to deliver what feels like natural human communication.