UPDATED 12:00 EDT / MAY 05 2026


Subquadratic launches with $29M to bring 12M-token context windows to AI

Subquadratic, a company developing a novel generative artificial intelligence model, launched today with $29 million in seed funding.

The new large language model, dubbed SubQ, uses what the company calls a subquadratic architecture that greatly increases the context window — how much information the AI can read at once — without significantly increasing the amount of compute it requires. The company also says it outperforms other state-of-the-art models on speed and accuracy.

Many AI systems today are built around the limits of context windows. The industry standard is 128,000 tokens for many AI models, rising to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. SubQ, the company says, can manage a context window of up to 12 million tokens while maintaining accuracy, increasing speed and reducing compute cost. A window that size works out to roughly 9 million words, or almost 120 books.

To reach this context size, Subquadratic needed to create a model that could handle that much data without breaking the “compute bank.” To do that, the company settled on a proprietary transformer architecture that implements sparse attention, co-founders Justin Dangel, who’s chief executive, and Chief Technology Officer Alexander Whedon told SiliconANGLE in an interview.

“We are very focused on the problem of how we transition from a dense attention, quadratic scaling architecture to a sparse attention linear architecture,” Dangel said. “Sparse attention is an effort to say, hey, let’s try to figure out how to not compare every token to every token to every token.”

The “T” in ChatGPT stands for “transformer,” the type of generative AI model architecture under the hood. It’s not necessary to understand the details, only that the transformer is essentially the engine of the LLM, the part that gives it the power to contextualize language.

Traditional transformer models use dense attention, meaning the model compares every token in a prompt with every other token. That becomes expensive very quickly: Doubling the input does not just double the work; it roughly quadruples the number of token-to-token relationships the model has to consider. That is the “quadratic” scaling problem Subquadratic is targeting.

Attention is what transformers use to “understand” how a prompt operates by comparing words (broken up into tokens) to one another. The same way humans know that, in the sentence “The cat is in the room,” the words “cat” and “room” relate to one another, an LLM compares words in a sentence to understand their relationships. With dense attention, every word is compared to every other word. The more words there are, the more comparisons the model needs to make.
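To make that growth concrete, here is a minimal Python sketch, purely illustrative and not Subquadratic’s code, that counts the token-to-token comparisons dense attention would make on the example sentence, and then on an input twice as long:

```python
# Illustration only: dense attention scores every token against every other
# token, so the number of comparisons grows with the square of the input length.

tokens = ["The", "cat", "is", "in", "the", "room"]

# Build the full token-to-token comparison grid, as dense attention does.
pairs = [(a, b) for a in tokens for b in tokens]
print(len(tokens), "tokens ->", len(pairs), "comparisons")  # 6 tokens -> 36 comparisons

# Double the input and the comparison count roughly quadruples.
doubled = tokens * 2
print(len(doubled), "tokens ->", len(doubled) ** 2, "comparisons")  # 12 tokens -> 144 comparisons
```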

“If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice,” Whedon explained.
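That math can be sanity-checked with a toy cost model. The sketch below is our own illustration, using a fixed 128-token attention window as a stand-in for a generic sparse scheme rather than anything Subquadratic has disclosed:

```python
# Rough cost model (illustration only): how work grows when the input
# length doubles under quadratic versus linear scaling.

def quadratic_cost(n: int) -> int:
    return n * n          # every token attends to every token

def linear_cost(n: int, window: int = 128) -> int:
    return n * window     # each token attends to a fixed-size sparse set

for n in (1_000, 2_000):
    print(f"n={n:>5}: quadratic={quadratic_cost(n):>9,}  linear={linear_cost(n):>9,}")

# n= 1000: quadratic=1,000,000  linear=  128,000
# n= 2000: quadratic=4,000,000  linear=  256,000
# Doubling the input quadruples the quadratic work but only doubles the linear work.
```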

According to Subquadratic, SubQ is more than 50 times faster and 50 times less expensive than leading frontier models at 1 million tokens, while maintaining higher accuracy. At its full 12 million-token context window, the company says, the model reduces compute requirements by almost 1,000 times compared with other frontier models. On the RULER 128K long-context benchmark, Subquadratic said SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus, representing roughly a 300-times reduction in cost.

Changing context window handling

Currently, most state-of-the-art LLMs can see at most 1 million tokens of data at once, and even that can be difficult to use in practice because of compute constraints.

To handle this, developers carefully curate the data that goes into the context window using systems such as retrieval-augmented generation, or RAG, and agentic retrieval systems to manage data flow. These systems necessarily add latency and computational overhead, and can bias the information fed to the LLM.
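For readers unfamiliar with that pattern, the sketch below is a deliberately simplified, hypothetical RAG-style retrieval step, not any particular vendor’s system, showing where the extra latency and the selection bias come from:

```python
# Toy RAG-style curation step (hypothetical example): because the model cannot
# read everything at once, a retriever scores document chunks against the query
# and forwards only the top few into the context window.

def retrieve(query_terms: set[str], chunks: list[str], k: int = 3) -> list[str]:
    # Toy relevance score: count how many query terms appear in each chunk.
    scored = sorted(
        chunks,
        key=lambda c: len(query_terms & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "the quarterly revenue grew 12 percent",
    "office plants were watered on friday",
    "revenue guidance for next quarter was raised",
]
context = retrieve({"revenue", "quarter"}, chunks, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
# Every retrieval hop adds latency, and anything the scorer drops never reaches the model.
```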

“I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows,” Whedon said. “And I think that that is kind of a waste of human intelligence and also limiting to the product quality.”

Subquadratic’s vision is that AI is being constrained by the cost curve of dense-attention transformers. The company argues that once the architecture moves from quadratic to linear scaling, developers can build products that were previously too slow, too expensive or too reliant on brittle data curation.

To tackle this, the company is launching the SubQ application programming interface, making it available to developers and enterprise teams that need access to the full 12 million-token context window. It is also launching SubQ Code, a command-line interface coding agent designed to load entire codebases into a single context window, so developers can plan, execute and review across a repository without coordinating multiple agents.

Dangel also described a search product that will initially be free, suggesting a land-and-expand strategy around long-context research, coding and enterprise workloads. He added that the model will not be open-weight or open-source in the near term, but will be trainable for customer-specific use cases.

The funding round was joined by investors including Javier Villamizar, former partner at SoftBank Vision Fund, and Justin Mateen, co-founder of Tinder and founder of JAM fund, alongside early investors in Anthropic PBC, OpenAI Group PBC, Stripe Inc. and Brex Inc.

“The fundamental scaling laws imposed by the transformer architecture and dense attention have been broken through,” Dangel concluded.

Image: SiliconANGLE/Microsoft Designer
