Generative artificial intelligence developer AI21 Labs Inc. says it wants to bring agentic AI workloads out of the data center and onto users' devices with its newest model, Jamba Reasoning 3B.
Launched today, Jamba Reasoning 3B is one of the smallest models the company has ever released and the latest addition to its Jamba family of open-source models, available under an Apache 2.0 license. It's a small language model, or SLM, built atop AI21 Labs' own hybrid SSM-transformer architecture, setting it apart from most large language models, which rely on transformer-only frameworks.
SSM stands for "state space model," a class of highly efficient algorithms for sequential modeling that track a current state and predict what the next state will be.
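At its simplest, that idea is a linear recurrence: each new input nudges a hidden state, and outputs are read off that state. The sketch below illustrates the concept with toy NumPy matrices; the dimensions and parameters are illustrative, not Jamba's actual internals.

```python
import numpy as np

# Toy linear state space model: x_t = A @ x_{t-1} + B @ u_t, y_t = C @ x_t
# A, B, C are illustrative matrices, not Jamba Reasoning 3B's parameters.
rng = np.random.default_rng(0)
d_state, d_in = 4, 2

A = np.eye(d_state) * 0.9            # state transition (slowly decaying memory)
B = rng.normal(size=(d_state, d_in))  # how inputs enter the state
C = rng.normal(size=(1, d_state))     # how outputs are read from the state

def ssm_scan(inputs):
    """Run the recurrence over a sequence of input vectors."""
    x = np.zeros(d_state)
    outputs = []
    for u in inputs:                  # O(n) in sequence length, vs. attention's O(n^2)
        x = A @ x + B @ u             # update the current state
        outputs.append(C @ x)         # predict from the new state
    return np.array(outputs)

seq = rng.normal(size=(6, d_in))
print(ssm_scan(seq).shape)            # one output per input step: (6, 1)
```

The per-step update touches only a fixed-size state, which is why SSM layers scale so well to long sequences compared with full attention.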
Jamba Reasoning 3B combines the transformer architecture with the Mamba neural network architecture and boasts a context window of 256,000 tokens, with the ability to handle up to 1 million. It demonstrates efficiency gains of two to five times over similar lightweight models.
In a blog post, the company explained that Jamba Reasoning 3B uses RoPE (rotary position embedding) scaling to stretch its attention mechanism, allowing it to handle long-context tasks with much less compute than larger models.
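RoPE encodes a token's position by rotating pairs of vector dimensions, and one common scaling trick (position interpolation) divides positions by a factor so a model trained on short contexts can address longer ones. The sketch below is a generic illustration of that idea, assuming toy dimensions and a hypothetical scale factor; it is not AI21's implementation.

```python
import numpy as np

def rope(x, position, base=10000.0, scale=1.0):
    """Rotate consecutive dimension pairs of x by position-dependent angles.

    Dividing the position by `scale` (position interpolation) is one common
    "RoPE scaling" technique for extending a model's usable context window.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = (position / scale) * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.ones(8)
# With scale=4, position 1000 is rotated as if it were position 250,
# squeezing a long context into the angle range the model saw in training.
print(np.allclose(rope(q, 1000, scale=4.0), rope(q, 250)))  # True
```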
AI21 Labs highlighted its impressive performance, with a "combined intelligence" and "output tokens per second" ratio that surpasses similarly sized LLMs such as Alibaba Cloud's Qwen3 4B, Google LLC's Gemma 3 4B, Meta Platforms Inc.'s Llama 3.2 3B, IBM Corp.'s Granite 4.0 Micro and Microsoft Corp.'s Phi-4 Mini. That evaluation was based on a series of benchmarks, including IFBench, MMLU-Pro and Humanity's Last Exam.

AI21 Labs believes there will be a big market for tiny language models such as Jamba Reasoning 3B, which is designed to be customized using retrieval-augmented generation techniques that provide it with more contextual knowledge.
The company cites research showing that 40% to 70% of enterprise AI tasks can be handled efficiently by smaller models, and that routing those tasks to SLMs can cut costs by a factor of 10 to 30. "On-device SLMs like Jamba Reasoning 3B enable cost-effective, heterogeneous compute allocation — processing simple tasks locally while reserving cloud resources for complex reasoning," the company explained.
SLMs can also power most AI agents, which perform tasks autonomously on behalf of human workers, with a high degree of efficiency, the company said. In agentic workflows, Jamba Reasoning 3B can act as an "on-device controller" orchestrating the agents' operations, activating cloud-based LLMs only when extra compute power is needed for more sophisticated tasks. That means SLMs can potentially power much lower-latency agentic workflows, with additional benefits such as offline resilience and enhanced data privacy.
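The controller pattern described above can be sketched as a simple router: the local model answers what it can and escalates the rest. Everything here (`local_slm`, `cloud_llm`, the difficulty heuristic) is hypothetical illustration, not AI21's API.

```python
# Sketch of an "on-device controller": a local SLM handles simple requests
# itself and escalates harder ones to a cloud LLM. The model functions and
# the difficulty heuristic below are hypothetical stand-ins.

def local_slm(prompt: str) -> str:
    return f"[local answer] {prompt}"

def cloud_llm(prompt: str) -> str:
    return f"[cloud answer] {prompt}"

def needs_cloud(prompt: str, threshold: int = 60) -> bool:
    # Toy heuristic: long or explicitly multi-step prompts go to the cloud.
    return len(prompt) > threshold or "step by step" in prompt.lower()

def controller(prompt: str) -> str:
    if needs_cloud(prompt):
        return cloud_llm(prompt)   # invoke remote compute only when needed
    return local_slm(prompt)       # low latency, private, works offline

print(controller("What time is it?"))                    # handled locally
print(controller("Walk me through the proof step by step."))  # escalated
```

A production router would likely score difficulty with the SLM itself rather than a string heuristic, but the control flow is the same.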
“This ushers in a decentralized AI era, akin to the 1980s shift from mainframes to personal computers, empowering local computation while seamlessly integrating cloud capabilities for greater scalability,” the company wrote.
AI21 Labs co-Chief Executive Ori Goshen told VentureBeat in an interview that SLMs like Jamba Reasoning 3B can free up data centers to focus only on the hardest AI problems and help to solve economic challenges faced by the industry. “What we’re seeing right now in the industry is an economics issue, where there are very expensive data center buildouts, and the revenue that is generated [from them] versus the depreciation rate of all their chips shows that the math doesn’t add up,” he said.
The company provided several examples of workloads better processed locally by SLMs. Contact centers, for instance, can run customer service agents on small devices that handle customer calls and decide whether to resolve an issue themselves, escalate it to a more powerful model, or hand it off to a human agent.
Futurum Group analyst Brad Shimmin told AI Business that the theory behind state space models is an old one, but until recently the technology to build them didn't exist. "Now you can use this state space model idea because it scales really well and is extremely fast," he said.
Holger Mueller of Constellation Research Inc. said SLMs certainly have their place and so it’s good to see AI21 Labs improving on them with Jamba Reasoning 3B, but he pointed out that the company is not telling the whole story here. “What is often forgotten is that there’s a need for SLMs to be updated more regularly than LLMs, and fine-tuned more frequently for specific tasks,” he said. “This challenge is often overlooked when weighing up the reduced power and system requirements of SLMs.”
Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.