UPDATED 09:00 EDT / AUGUST 22 2024

AI21 Labs’ updated hybrid SSM-Transformer model Jamba gets longest context window yet

OpenAI rival AI21 Labs Ltd. today lifted the lid off its latest competitor to ChatGPT, unveiling the open-source large language models Jamba 1.5 Mini and Jamba 1.5 Large.

The new models are based on an alternative architecture that enables them to ingest much longer sequences of data, so they can better understand context than traditional LLMs. AI21 says Jamba 1.5 Mini and 1.5 Large stand out as the fastest and most efficient LLMs in their size classes, delivering superior performance to open-source alternatives such as Llama 8B and Llama 70B.

The models build on the success of the original Jamba foundation model, which combines the traditional Transformer architecture with a framework known as “Mamba” that’s based on the older structured state space technique for building artificial intelligence. SSM models, as they’re known, draw on older concepts such as recurrent neural networks and convolutional neural networks and are known to be more computationally efficient.

It’s this unique architecture that allows the Jamba models to ingest more data to deal with workloads where greater context can be more helpful, such as generative AI reasoning tasks.

AI21 Labs, which has raised more than $336 million in funding, is the creator of the Jurassic family of LLMs that compete with OpenAI’s GPT models. But rather than try to take on that company directly in a never-ending race to add computational power, the startup realized that it might be better off pursuing an alternative approach.

Its hybrid SSM-Transformer model is designed to address some of the main shortcomings of Transformer LLMs, in particular how they struggle with large context windows. When faced with a long context, even the best LLMs slow down as they process the information needed to provide a response.

The issue with Transformer models is that the cost of their attention mechanism scales quadratically with sequence length, as each token attends to the entire sequence that preceded it. This slows throughput and drives up response latency. Transformer models also require a much larger memory footprint to scale, which means they need vast amounts of computing power to deal with longer context windows.
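
To see why, consider a back-of-the-envelope sketch (not AI21’s code): scoring every token against every preceding token materializes an n-by-n matrix, so doubling the context quadruples the work.

```python
import numpy as np

def attention_scores(seq_len: int, d_model: int = 64) -> np.ndarray:
    """Score every token against every other token: full self-attention
    produces a (seq_len x seq_len) matrix."""
    q = np.random.randn(seq_len, d_model)
    k = np.random.randn(seq_len, d_model)
    return q @ k.T  # O(n^2) memory and compute in the sequence length

print(attention_scores(1_000).shape)  # (1000, 1000): a million scores

# The quadratic term is what bites at long context. At a 256k-token
# window, the score matrix alone holds ~65.5 billion entries per head
# (~262 GB in float32), before any other activations:
for n in (1_000, 32_000, 256_000):
    print(f"{n:>7} tokens -> {n * n:>18,} score entries")
```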

Context is king

AI21 Labs Vice President of Product Or Dagan told SiliconANGLE that context is important for AI, because it refers to the input data that a generative AI model considers before generating its response. He explained that an AI model that can effectively handle long context is crucial for many enterprise generative AI applications.

“First of all, analyzing long documents, meeting transcripts, internal policies — these have become very popular tasks for AI,” he said. “But in many cases, AI models that don’t really utilize their entire context hallucinate and miss important information.”

By having an AI that properly understands context, it’s possible to improve the responses they generate, Dagan said. “In addition, a long context model substantially improves the quality of RAG and agentic workflows, which are becoming the key part of many AI enterprise solutions,” he said. “Long context models reduce costs in these systems by eliminating the need for continuous chunking and repetitive retrievals. While it’s sometimes claimed that RAG is a substitute for long context, a successful system needs both.”
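
A minimal sketch of the chunking Dagan refers to (my own illustration, not AI21’s pipeline): a short-context model forces a RAG system to split documents and retrieve fragments repeatedly, while a 256K window can often hold a whole document at once.

```python
def chunk(text: str, max_tokens: int, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking of the kind short-context RAG pipelines
    need. Tokens are approximated by whitespace-separated words."""
    words = text.split()
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]

doc = "word " * 300_000  # stand-in for a ~300,000-word transcript

# An 8k-context model must juggle dozens of overlapping fragments:
print(len(chunk(doc, max_tokens=8_000)))    # 38 chunks

# A 256k-token window holds nearly all of it in one or two pieces,
# so retrieval can return whole documents instead of fragments:
print(len(chunk(doc, max_tokens=256_000)))  # 2 chunks
```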

The Mamba architecture, originally developed by researchers at Carnegie Mellon and Princeton University, operates with a much lower memory footprint and replaces attention with a more efficient state-update mechanism, enabling it to handle longer context windows with ease. However, Mamba models cannot match the output quality and breadth of knowledge possessed by Transformer LLMs. That’s why AI21 Labs opted to combine the two architectures, taking advantage of the best of both.

Dagan explained that the main difference between the architectures is that Transformer models always “look” at the entire context, which slows them down, whereas Mamba models maintain a smaller “state” that’s constantly updated throughout the context.

“This means Mamba doesn’t have the same huge memory and computational footprint of Transformers, so it can easily fit more context on the same hardware and process it faster,” he said. “Second, since Mamba works with this moving state, it can generalize better learnings from shorter contexts to larger ones.”
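
A toy linear state-space recurrence makes the contrast concrete. This is a simplification (Mamba’s matrices are input-dependent, or “selective”), but it shows the property Dagan describes: the state that carries context has a fixed size no matter how long the sequence grows.

```python
import numpy as np

def ssm_scan(x: np.ndarray, state_dim: int = 16) -> np.ndarray:
    """Toy linear SSM: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
    Real Mamba makes A, B and C depend on the input ("selection");
    this fixed version only demonstrates the memory behavior."""
    seq_len, d_in = x.shape
    A = np.eye(state_dim) * 0.9            # decay of the running state
    B = np.random.randn(state_dim, d_in) * 0.1
    C = np.random.randn(d_in, state_dim) * 0.1
    h = np.zeros(state_dim)                # the entire "context" lives here
    ys = np.empty_like(x)
    for t in range(seq_len):
        h = A @ h + B @ x[t]               # fold token t into the state
        ys[t] = C @ h                      # read the output from the state
    return ys

# Whether the input is 1,000 or 256,000 tokens long, the model only ever
# carries a state_dim-sized vector forward, unlike attention's KV cache,
# which grows with every token processed.
print(ssm_scan(np.random.randn(1_000, 8)).shape)  # (1000, 8)
```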

Advanced reasoning and agentic AI

Jamba 1.5 Large is the culmination of those efforts. According to AI21 Labs, it’s a sophisticated “mixture-of-experts” model with 398 billion total parameters and 94 billion active parameters. It’s designed to handle more complex reasoning tasks, the startup said.
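
That total-versus-active split is the signature of mixture-of-experts: a router picks a few expert subnetworks per token, so most of the parameters sit idle on any given forward pass. A minimal sketch with illustrative numbers (not Jamba’s actual layout):

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Route the input to its top_k experts; only their weights are used."""
    logits = x @ router_w                     # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the active experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                      # softmax over the chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

d, num_experts, top_k = 64, 16, 2
experts = [np.random.randn(d, d) for _ in range(num_experts)]
router_w = np.random.randn(d, num_experts)

y = moe_layer(np.random.randn(d), experts, router_w)
print(f"total {num_experts * d * d:,} vs active {top_k * d * d:,} expert weights")
# With 16 experts and 2 active, only 1/8 of the expert weights run per
# token: the same principle behind 398B total / 94B active in Jamba 1.5 Large.
```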

As for Jamba 1.5 Mini, it’s a smaller counterpart to Jamba 1.5 Large, refined and enhanced to deliver expanded capabilities and superior output quality. The company said both of the new models were designed with developer-friendliness in mind, and they have been optimized for creating “agentic AI” systems that can perform tasks on behalf of users. To do this, they support features such as function calling and tool use, JSON mode, citation mode and structured document objects.
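
The open weights are published on Hugging Face, so the basic developer path looks like standard transformers usage. A hedged sketch follows, assuming the model ID matches AI21’s release naming; the model card is the authority on exact usage and hardware requirements.

```python
# Minimal loading sketch via Hugging Face transformers. Even Jamba 1.5
# Mini is a large model, so device_map and dtype choices matter for
# fitting it on real hardware; consult the model card before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed repo name on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # shard layers across available GPUs
    torch_dtype=torch.bfloat16,  # half precision to shrink the footprint
)

messages = [{"role": "user", "content": "Summarize this meeting transcript: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```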

Jamba 1.5 Mini and Jamba 1.5 Large are both said to feature 256,000-token context windows, longer than that of any other open-source model available today. However, unlike other long-context models, the Jamba models are said to be able to use their declared context windows fully.

Evidence of that comes from their performance on the new RULER benchmark, which is specifically designed to evaluate such models on tasks such as multi-hop tracing, retrieval, aggregation and question-answering. According to AI21 Labs, the Jamba models excel at these tasks, demonstrating consistently superior outputs to competing models.
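
RULER-style retrieval checks are easy to reproduce in spirit: plant a fact deep inside long filler text and ask the model for it back. Below is a minimal probe in that vein (my own sketch of the task category, not the benchmark’s actual harness):

```python
import random

def make_needle_prompt(needle: str, filler_words: int, seed: int = 0) -> str:
    """Bury a 'needle' fact at a random depth in filler text, then ask
    for it back: the simplest of RULER's retrieval-style tasks."""
    random.seed(seed)
    sentence = "the quick brown fox jumps over the lazy dog"
    filler = [sentence] * (filler_words // 9)   # 9 words per sentence
    filler.insert(random.randrange(len(filler)), needle)
    context = " ".join(filler)
    return f"{context}\n\nQuestion: What is the magic number? Answer:"

prompt = make_needle_prompt(
    needle="The magic number is 48729.",
    filler_words=200_000,  # deep into a 256k-token window
)
# Feed `prompt` to the model under test: a model that truly uses its
# declared window answers 48729 no matter where the needle landed.
print(len(prompt.split()), "words in the probe prompt")
```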

The startup pitted Jamba 1.5 Large against similar models, such as Llama 3.1 70B, Llama 3.1 405B and Mistral Large 2, and it reportedly achieved the lowest response latency, proving twice as fast at the longest context windows.

Constellation Research Inc. analyst Holger Mueller said the main advantage of the new Jamba models is that they should reduce the cost of running AI models without hurting overall performance. “This is a key strategy for AI model makers, and AI21 Labs is going about it in a novel way by supporting larger context windows, which deliver better results without increasing the computational load,” he said.

Dagan said LLMs that can utilize extensive context windows represent the future of AI, as they’re better suited for handling complex and data-heavy tasks.

“Our breakthrough architecture allows Jamba to process vast amounts of information with lightning-fast efficiency,” he said. “Jamba’s combination of optimized architecture, unprecedented speed, and the largest available context window make it the optimal foundation model for developers and enterprises building RAG and agentic workflows.”

Image: SiliconANGLE/Microsoft Designer
